cjcodeproj / medialibrary

Python code to read XML media files
MIT License

Search/compare between two place objects #155

Open cjcodeproj opened 10 months ago

cjcodeproj commented 10 months ago

The class media.data.nouns.Place should have a standard matching function that compares two Place values and reports whether they are an exact match or a partial match.

The Place object contains what is considered a major value, which is the most specific portion of the place, and a minor value, which is supplemental data that enhances the major value. The keywordlist tool displays a place keyword by showing the major value and then appending the minor value in parentheses.

properNoun/place/Tampa (Florida)              The Punisher
properNoun/place/Griffith Park Observatory (Los Angeles) Devil In A Blue Dress
                                              La La Land
                                              Rebel Without A Cause
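
The major/minor split and the display format above could be sketched roughly as follows. PlaceSketch and its keyword() method are hypothetical stand-ins for illustration only; they are not the real media.data.nouns.Place API, and storing the minor value as a tuple of strings is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PlaceSketch:
    """Hypothetical stand-in, NOT the real media.data.nouns.Place."""
    major: str                    # most specific portion of the place
    minor: Tuple[str, ...] = ()   # supplemental data (assumed: tuple of strings)

    def keyword(self) -> str:
        """Render the major value, then the minor value in parentheses."""
        if self.minor:
            return f"{self.major} ({', '.join(self.minor)})"
        return self.major


tampa = PlaceSketch("Tampa", ("Florida",))
print("properNoun/place/" + tampa.keyword())
```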

When a user searches for a place keyword, the code should ideally create a Place object from the search parameter and then compare it against each target Place object being searched.

The three entries below refer to the same place, but they are not exactly identical.

properNoun/place/Los Angeles Airport          Collateral
properNoun/place/Los Angeles Airport (Los Angeles) Once Upon A Time In Hollywood
                                              Speed
properNoun/place/Los Angeles Airport (Los Angeles, California) Into The Night

All three reference the same place, but the minor details are either missing or different. A match between these values could be considered a possible_exact_match, whereas if all three records were 100% identical, the match would be considered an exact_match.

However, if the search parameter had a wider scope, like "Los Angeles", consider the following values:
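
A minimal sketch of that distinction, assuming a place can be reduced to a major string plus a tuple of minor strings. PlaceSketch, MatchKind, and compare_places are hypothetical names invented for this example, and using plain string equality is an assumption.

```python
from enum import Enum
from typing import NamedTuple, Tuple


class MatchKind(Enum):
    EXACT_MATCH = "exact_match"
    POSSIBLE_EXACT_MATCH = "possible_exact_match"
    NO_MATCH = "no_match"


class PlaceSketch(NamedTuple):
    """Hypothetical stand-in, NOT the real media.data.nouns.Place."""
    major: str
    minor: Tuple[str, ...] = ()


def compare_places(search: PlaceSketch, target: PlaceSketch) -> MatchKind:
    """Same major and same minor is exact; same major only is possible."""
    if search.major != target.major:
        return MatchKind.NO_MATCH
    if search.minor == target.minor:
        return MatchKind.EXACT_MATCH
    return MatchKind.POSSIBLE_EXACT_MATCH


bare = PlaceSketch("Los Angeles Airport")
city = PlaceSketch("Los Angeles Airport", ("Los Angeles",))
print(compare_places(bare, city).value)
print(compare_places(city, city).value)
```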

properNoun/place/Hollywood (Los Angeles)      The Aviator
properNoun/place/Hollywood (Los Angeles, California) Babylon
properNoun/place/Hollywood Sign (Los Angeles) Once Upon A Time In Hollywood
properNoun/place/Los Angeles                  Annie Hall
properNoun/place/Los Angeles (CA)             L.A. Confidential
properNoun/place/Los Angeles (California)     52 Pickup
properNoun/place/Los Angeles Airport          Collateral

The first three entries could be considered a related match: they are locations within "Los Angeles" and might be relevant, but they do not meet the definition of a match. The next three entries reference the same city, expressed in three different forms (which would be a possible_exact_match), and the seventh entry could be considered a string_pattern_match.
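
One way the categories above might be told apart is sketched below. The rules are assumptions made for illustration: a related match is taken to mean the search major appears among the target's minor values, and a string pattern match is taken to mean the search major is a substring of the target major. PlaceSketch and classify are hypothetical names.

```python
from typing import NamedTuple, Tuple


class PlaceSketch(NamedTuple):
    """Hypothetical stand-in, NOT the real media.data.nouns.Place."""
    major: str
    minor: Tuple[str, ...] = ()


def classify(search: PlaceSketch, target: PlaceSketch) -> str:
    """Return a match category name using the assumed rules above."""
    if search.major == target.major:
        if search.minor == target.minor:
            return "exact_match"
        return "possible_exact_match"
    if search.major in target.minor:      # e.g. Hollywood (Los Angeles)
        return "related_match"
    if search.major in target.major:      # substring of the major value
        return "string_pattern_match"
    return "no_match"


search = PlaceSketch("Los Angeles")
targets = [
    PlaceSketch("Hollywood", ("Los Angeles",)),
    PlaceSketch("Hollywood Sign", ("Los Angeles",)),
    PlaceSketch("Los Angeles", ("CA",)),
    PlaceSketch("Los Angeles Airport"),
]
for target in targets:
    print(classify(search, target), target.major, target.minor)
```

Under these assumed rules, a bare "Los Angeles" search against the bare "Los Angeles" entry comes out as exact_match, since nothing differs between the two values.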

Technically, the 5th and 6th entries in the above example should be considered a stronger match to each other, as long as the code understands that CA is the abbreviation for California.

The search algorithm should return matches in the following order:
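
As a rough illustration of the abbreviation point, minor values could be expanded through a lookup table before they are compared. STATE_ABBREVIATIONS, expand_minor, and minor_equivalent are hypothetical names, and the hard-coded table is an assumption; the project may well source abbreviations from the XML data instead.

```python
# Assumed lookup table for illustration only.
STATE_ABBREVIATIONS = {"CA": "California", "FL": "Florida"}


def expand_minor(value: str) -> str:
    """Replace a known abbreviation with its full name, else return as-is."""
    return STATE_ABBREVIATIONS.get(value, value)


def minor_equivalent(first: tuple, second: tuple) -> bool:
    """Compare two minor value tuples after expanding abbreviations."""
    return tuple(map(expand_minor, first)) == tuple(map(expand_minor, second))


print(minor_equivalent(("CA",), ("California",)))
print(minor_equivalent(("CA",), ("Florida",)))
```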

  1. Exact matches
  2. Possible matches
  3. Related matches
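
The ordering above could be sketched with a stable sort keyed on the match category. This assumes each result is a (keyword, category) pair and that "Possible matches" corresponds to possible_exact_match; placing any other category (such as string_pattern_match) last is also an assumption, since the list does not rank it.

```python
# Assumed ranking, taken from the three-item list above.
MATCH_ORDER = ("exact_match", "possible_exact_match", "related_match")


def order_results(results):
    """Sort (keyword, category) pairs by category rank, keeping input order within a rank."""
    def rank(item):
        kind = item[1]
        if kind in MATCH_ORDER:
            return MATCH_ORDER.index(kind)
        return len(MATCH_ORDER)   # unranked categories go last (assumption)
    return sorted(results, key=rank)


results = [
    ("Hollywood (Los Angeles)", "related_match"),
    ("Los Angeles (CA)", "possible_exact_match"),
    ("Los Angeles Airport", "string_pattern_match"),
    ("Los Angeles", "exact_match"),
]
for keyword, kind in order_results(results):
    print(kind, keyword)
```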

Also, most of the location data is expressed as string values. Considering that XML elements like st and cn now support attribute values to handle abbreviation helpers, states and countries should probably be promoted to actual Python objects. (That work could be handled in a different ticket).
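
If states and countries were promoted to objects, the shape might look something like the sketch below. StateSketch and its fields are hypothetical; the actual attribute names on the st and cn elements are not shown here, and the case-insensitive comparison is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateSketch:
    """Hypothetical object form of a state; not part of the existing code."""
    name: str
    abbreviation: str = ""

    def matches(self, text: str) -> bool:
        """True if text equals the name or the abbreviation, ignoring case."""
        candidates = {self.name.casefold()}
        if self.abbreviation:
            candidates.add(self.abbreviation.casefold())
        return text.casefold() in candidates


california = StateSketch("California", "CA")
print(california.matches("CA"), california.matches("Florida"))
```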

This enhancement does not require a proof-of-concept command-line tool to facilitate search operations for the end user; that could be handled in a different ticket.

This work isn't necessarily related to https://github.com/cjcodeproj/medialibrary/issues/153. This ticket is probably a blocker for the Settings comparison work.