cjcodeproj / medialibrary

Python code to read XML media files
MIT License

Derivative content title objects may benefit from refactoring. #145

Closed cjcodeproj closed 10 months ago

cjcodeproj commented 11 months ago

The title of a content object is manipulated in multiple ways.

| Film Title | Sort Title | Unique ID Title | Catalog Title |
| --- | --- | --- | --- |
| V. I. Warshawski | vi_warshawski | vi_warshawski-1991-1 | V.I. Warshawski (1991) |
| The Whole Nine Yards | whole_nineyards+the | whole_nineyards+the-2000-1 | The Whole Nine Yards (2000) |
| 3:10 To Yuma | three_ten_to_yuma | three_ten_to_yuma-1957-1 | 3:10 To Yuma (1957) |

It may make sense to refactor and streamline this code after the completion of https://github.com/cjcodeproj/medialibrary/issues/144

cjcodeproj commented 11 months ago

Note: Ticket https://github.com/cjcodeproj/medialibrary/issues/144 is closed.

cjcodeproj commented 10 months ago

Coding Notes:

There is sorting code within the Title object code and within the Catalog object code. The title object may need the data for internal sorting, but most sort operations actually take place at the content object level with a respective index object.

To be clear, the MovieIndexEntry object is an object that contains sorting info on a Movie, but it is not an attribute variable of the Movie object; it is created by the build_index_object method, then maintained externally and populated with data from within the Movie object (mostly title and catalog info).
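A rough sketch of that relationship, for my own reference. Only the names Movie, MovieIndexEntry, and build_index_object come from the code base; the attributes, the constructor signatures, and the simplified sort value are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Movie:
    """Hypothetical stand-in for the real Movie content class."""
    title: str
    year: int

    def build_index_object(self) -> "MovieIndexEntry":
        # The entry is handed back to the caller; it is never stored on self.
        return MovieIndexEntry(self)


class MovieIndexEntry:
    """Hypothetical index entry holding sort data derived from a Movie."""

    def __init__(self, movie: Movie):
        self.movie = movie
        # Simplified sort value; the real sort title rules are more involved.
        self.sort_title = movie.title.lower()
        self.year = movie.year

    def __lt__(self, other: "MovieIndexEntry") -> bool:
        return (self.sort_title, self.year) < (other.sort_title, other.year)


movies = [Movie("Quicksand", 2023), Movie("3:10 To Yuma", 1957)]

# Sorting happens on the externally held entries, not on the Movie objects.
index = sorted(m.build_index_object() for m in movies)
sorted_titles = [entry.movie.title for entry in index]
```

The point of the sketch is that the Movie class itself needs no comparison logic as long as the index entries carry it.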

The sorting code inherent in the Title object class probably isn't even being used anywhere.

Gonna be careful to make sure I'm not coding in circles.

cjcodeproj commented 10 months ago

Commit:

https://github.com/cjcodeproj/medialibrary/commit/24131e4c1030687d93fe8b6927c8938bfd35f33c

cjcodeproj commented 10 months ago

Closing Notes

Code Cleanup

Cleaned up a bunch of code that resided directly under the Movie object class. Some of it was moved to the parent object, AbstractContent. Most of the manipulation code has been put into a different class called TitleMunger.

Weirdness with the Unique Key

In the opening note for this ticket, I had this example.

| Film Title | Sort Title | Unique ID Title | Catalog Title |
| --- | --- | --- | --- |
| The Whole Nine Yards | whole_nineyards+the | whole_nineyards+the-2000-1 | The Whole Nine Yards (2000) |

For the purpose of the unique_key value, it doesn't make sense to shift the "the" article in the title over to the end. We don't need to sort on the value; we just need a simple guarantee of uniqueness, which comes from combining the title, the year, and an integer value in case there is the rare occurrence of identical titles in the same year. If anything, keeping the article in front of the title probably improves readability of the value (not that end users will be regularly reading the sort_title or unique_key values directly).
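A minimal sketch of what a key without the article shift could look like. The function name build_unique_key and the normalization rule (lowercase, non-alphanumerics collapsed to underscores) are assumptions for illustration, not what TitleMunger currently does.

```python
import re


def build_unique_key(title: str, year: int, index: int = 1) -> str:
    """Join a normalized title, the year, and a disambiguating integer."""
    # Lowercase, then collapse every run of non-alphanumerics into "_".
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return f"{slug}-{year}-{index}"


key = build_unique_key("The Whole Nine Yards", 2000)
# key == "the_whole_nine_yards-2000-1"
```

The article stays at the front, so the value reads like the title it came from.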

There should be a future effort/ticket to consider changing the value of unique_key.

Weird Sorting Code in Content Object

There's legacy code in the Movie content object that actually allows sorting Movie objects based on the unique_id. (I'm talking about the __gt__ and __lt__ methods). There are even unit tests to verify that these work.

This code isn't doing anything, and it should probably be removed.
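For context, this is roughly the shape of the code in question, next to the alternative that makes it unnecessary. The class below is a hypothetical stand-in with a single assumed attribute, not the real Movie class.

```python
class Movie:
    """Hypothetical stand-in; only the comparison behaviour is shown."""

    def __init__(self, unique_key: str):
        self.unique_key = unique_key

    def __lt__(self, other: "Movie") -> bool:
        return self.unique_key < other.unique_key

    def __gt__(self, other: "Movie") -> bool:
        return self.unique_key > other.unique_key


a = Movie("three_ten_to_yuma-1957-1")
b = Movie("vi_warshawski-1991-1")

# With the methods present, the objects themselves can be compared.
via_methods = [m.unique_key for m in sorted([b, a])]

# The same ordering without relying on any comparison methods.
via_key = [m.unique_key for m in sorted([b, a], key=lambda m: m.unique_key)]
```

Both lists come out in the same order, so any caller that still needs this ordering could pass a key function instead.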

Extra attributes in the Title object

The media.data.media.content.generic.catalog.Title object has extra attributes for an internal sort title, and a filename title. I'm not sure if those values are even referenced anywhere else in the code.

Should the unique index value be padded?

Does it make sense to add padding to the index portion of the unique_key? I.e., the_whole_nine_yards-2000-1 becomes the_whole_nine_yards-2000-001? Technically, it only adds memory, and so far I haven't seen a case of ten movies in a single year with the same title; but it could be useful down the road when more content types (like music) come into play.
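One place padding would matter is plain string sorting of the keys once an index reaches two digits. A small sketch, assuming a hypothetical build_unique_key helper and made-up index values of 1, 2, and 10:

```python
def build_unique_key(slug: str, year: int, index: int, width: int = 1) -> str:
    """Format the key; a width above 1 zero-pads the trailing index."""
    return f"{slug}-{year}-{index:0{width}d}"


indexes = (1, 2, 10)  # hypothetical; no title has reached 10 so far

unpadded = sorted(
    build_unique_key("the_whole_nine_yards", 2000, i) for i in indexes
)
padded = sorted(
    build_unique_key("the_whole_nine_yards", 2000, i, width=3) for i in indexes
)
```

Without padding, the key ending in -10 sorts before the one ending in -2; with three digits the string order matches the numeric order.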

Post XML reading processing

The AbstractContent() class now has a _post_load_process() method designed to perform operations after the XML content has been read.

It may make sense to develop formal methods for object initialization in this order.

  1. Processing before XML data is read
  2. Reading of XML data
  3. Processing after XML data is read

Will investigate further.
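A possible shape for that three-step order, as a sketch only. _post_load_process is the one method that exists today; _pre_load_process, _load_xml, the steps list, and the XML element names are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET


class AbstractContent:
    """Hypothetical skeleton; only _post_load_process is named in this ticket."""

    def __init__(self, xml_text: str):
        self.title = None
        self.year = None
        self.unique_key = None
        self._pre_load_process()
        self._load_xml(ET.fromstring(xml_text))
        self._post_load_process()

    def _pre_load_process(self) -> None:
        # 1. Set up state before any XML is read.
        self.steps = ["pre"]

    def _load_xml(self, root: ET.Element) -> None:
        # 2. Copy raw values out of the XML (element names are assumed).
        self.title = root.findtext("title")
        self.year = int(root.findtext("year"))
        self.steps.append("load")

    def _post_load_process(self) -> None:
        # 3. Derive values that depend on the loaded data.
        slug = self.title.lower().replace(" ", "_")
        self.unique_key = f"{slug}-{self.year}-1"
        self.steps.append("post")


movie = AbstractContent("<movie><title>Quicksand</title><year>2023</year></movie>")
```

Subclasses would override individual steps while the constructor keeps the order fixed.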

cjcodeproj commented 4 months ago

Update

Should the unique index value be padded?

I still haven't seen a case where there are ten movies in the same given year with the exact same title.

However, there are five movies released in 2023 that are named Quicksand.

| Movie | Year | Cast |
| --- | --- | --- |
| Quicksand | 2023 | Carolina Gaitan, Allan Hawco |
| Quicksand | 2023 | Tanner Presswood, Simon Elias |
| Quicksand | 2023 | Mallory Adams, Kayode Akinyemi |
| Quicksand | 2023 | Ron Bottitta, Clay Boulware |
| Quicksand | 2023 | Torrey Gilchrist, Beccy Myers |

Three of these films are technically shorts, but the unique_key doesn't differentiate. If we keep unique_key around, it may make sense to add more data to it, like country of origin, which might make the keys more distinctive.
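To illustrate how the integer suffix handles a case like this, here is a sketch that numbers duplicates in the order they are seen. The function name assign_unique_keys and the (slug, year) input format are assumptions, not existing code.

```python
from collections import defaultdict


def assign_unique_keys(entries):
    """Return one key per (slug, year) pair, numbering duplicates from 1."""
    seen = defaultdict(int)
    keys = []
    for slug, year in entries:
        seen[(slug, year)] += 1
        keys.append(f"{slug}-{year}-{seen[(slug, year)]}")
    return keys


# Five films sharing the same title and year.
keys = assign_unique_keys([("quicksand", 2023)] * 5)
```

In this sketch the suffix depends purely on the order in which the entries are processed, which is one reason extra data in the key might be worth considering.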