The next major feature of the comparator framework should be an extension of the code with more classes, each dedicated to examining a specific data element.
Right now the comparator for movies compares six aspects between films: directors, writers, cinematographers, editors, cast, and keywords. All of that code resides in a single class.
The next major version of the comparator code should be an extensible framework that makes it easy to develop tests between common traits and that loads those tests dynamically based on the type of content being compared.
The framework should allow multiple tests to run on the same piece of data. Each test should return a score that can be clearly interpreted as a perfect match, a partial match, or no match. With certain debugging options enabled, the code should be able to identify each test by a distinct name along with its score value.
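A rough sketch of how named, scored tests could be registered per content type. Everything here is hypothetical: the `comparison_test` decorator, the `ComparisonResult` class, the 0.0 to 1.0 score range, and the use of set overlap as the scoring rule are assumptions for illustration, not how the current comparator works.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed score convention: 1.0 = perfect match, 0.0 = no match,
# anything in between = partial match.
PERFECT_MATCH = 1.0
NO_MATCH = 0.0


@dataclass
class ComparisonResult:
    name: str     # distinct test name, shown in debug output
    score: float  # value between NO_MATCH and PERFECT_MATCH

    def label(self) -> str:
        if self.score >= PERFECT_MATCH:
            return "perfect"
        if self.score <= NO_MATCH:
            return "none"
        return "partial"


TestFunc = Callable[[dict, dict], float]

# Hypothetical registry: content type -> list of (test name, test function).
_REGISTRY: Dict[str, List[Tuple[str, TestFunc]]] = {}


def comparison_test(content_type: str, name: str):
    """Register a test function under a content type and a distinct name."""
    def decorator(func: TestFunc) -> TestFunc:
        _REGISTRY.setdefault(content_type, []).append((name, func))
        return func
    return decorator


def overlap_score(a: Iterable[str], b: Iterable[str]) -> float:
    """Shared items divided by all items (Jaccard index); 0.0 if both are empty."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return NO_MATCH
    return len(set_a & set_b) / len(set_a | set_b)


@comparison_test("movie", "directors")
def compare_directors(left: dict, right: dict) -> float:
    return overlap_score(left.get("directors", []), right.get("directors", []))


@comparison_test("movie", "keywords")
def compare_keywords(left: dict, right: dict) -> float:
    return overlap_score(left.get("keywords", []), right.get("keywords", []))


def run_tests(content_type: str, left: dict, right: dict,
              debug: bool = False) -> List[ComparisonResult]:
    """Run every test registered for the content type, in registration order."""
    results = []
    for name, func in _REGISTRY.get(content_type, []):
        result = ComparisonResult(name, func(left, right))
        if debug:
            print(f"{name}: {result.score:.2f} ({result.label()})")
        results.append(result)
    return results
```

With this shape, adding a new test is one decorated function, and a content type with no registered tests simply produces an empty result list.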
Consider two movies: Scream and Halloween. The current comparison code examines and compares the keywords of the two films. However, Scream has a keyword that directly references Halloween, so there could be a comparison test that checks whether the title of a film in the library is explicitly referenced as a keyword on another film. Also, many essays in the internal library contain a lot of film references, so comparisons between films and essays should be considered when building the framework.
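The title-as-keyword test could look something like the sketch below. The dict layout (`title`, `keywords`), the function name, and the normalization rule are assumptions, and the sample keyword lists are made up apart from the Halloween reference on Scream described above.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so 'Halloween ' matches 'halloween'."""
    return " ".join(text.lower().split())


def title_referenced_as_keyword(left: dict, right: dict) -> float:
    """Return 1.0 if either film's title appears as a keyword on the other film,
    otherwise 0.0. The check is symmetric, so argument order does not matter."""
    left_title = normalize(left["title"])
    right_title = normalize(right["title"])
    left_keywords = {normalize(k) for k in left.get("keywords", [])}
    right_keywords = {normalize(k) for k in right.get("keywords", [])}
    if right_title in left_keywords or left_title in right_keywords:
        return 1.0
    return 0.0


# Illustrative sample data only; the real keyword lists live in the library.
scream = {"title": "Scream", "keywords": ["slasher", "Halloween"]}
halloween = {"title": "Halloween", "keywords": ["slasher", "babysitter"]}

print(title_referenced_as_keyword(scream, halloween))
```

The same idea might extend to essays by checking film titles against whatever reference data an essay carries, but that depends on how essays are stored.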
The command-line comparison tool should have filtering options that require a minimum score before a film is considered a match.
I'm thinking of a code model similar to the validator code, though it isn't necessarily a one-to-one comparison.
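A minimal sketch of the filtering option using `argparse`. The `--min-score` flag name, its default, and the assumption that each candidate film ends up with one combined score are all hypothetical; the existing tool's options may look different.

```python
import argparse
from typing import List, Tuple


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Compare a film against the library.")
    # Hypothetical flag: candidates scoring below this value are dropped.
    parser.add_argument("--min-score", type=float, default=0.0,
                        help="minimum score a film needs to count as a match")
    return parser


def filter_matches(scored: List[Tuple[str, float]],
                   min_score: float) -> List[Tuple[str, float]]:
    """Keep only (title, score) pairs whose score meets the threshold."""
    return [(title, score) for title, score in scored if score >= min_score]
```

A caller would parse the arguments once and pass `args.min_score` to `filter_matches` before printing results.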
See also: #80, #153