duck7000 / imdbGraphQLPHP


Advanced search = genres #41

Closed GeorgeFive closed 3 months ago

GeorgeFive commented 5 months ago

You mentioned advanced search in the caching thread... this is one thing I would love to see from that: a list of the top X movies by genre, e.g.,

https://www.imdb.com/search/title/?genres=horror

duck7000 commented 5 months ago

In #31 I did ask if there was any interest; you probably missed that.

First I have to implement advanced search; after that we can discuss what it should do. IMDb does have an advanced search in GraphQL, so I have to figure out how that works.

I think it is possible though

duck7000 commented 5 months ago

I started to figure out how advanced search works in GraphQL. It is a lot!

Here is a list of all possible options, and yes, they can all be combined, so you can imagine what the query would look like if we used all of them! Supporting every option would be an unforgiving task, so we have to narrow it down.

All options:

awardConstraint
certificateConstraint
colorationConstraint (based on color info)
creditedCompanyConstraint
creditedNameConstraint
episodicConstraint
explicitContentConstraint
filmingLocationConstraint
genreConstraint
inTheatersConstraint
keywordConstraint
languageConstraint
listConstraint
myRatingConstraint
originCountryConstraint
plotMatchingConstraint
rankedTitleListConstraint
releaseDateConstraint
runtimeConstraint
soundMixConstraint
titleTextConstraint (match titles based on their name, i.e. title text)
titleTypeConstraint
userRatingsConstraint
watchOptionsConstraint
withTitleDataConstraint

Options that I think would be useful:

titleTypeConstraint (match TV series or movie)
creditedNameConstraint (match cast or crew member; works on ID only!)
releaseDateConstraint (match from start year to end year)
originCountryConstraint (match specific country). This works with ISO 3166 codes.
languageConstraint (match specific language). This works with ISO 639 codes.
filmingLocationConstraint (match specific filming location). This works with cities like Paris.
genreConstraint (match specific genre). Genres list needed.
awardConstraint (match award specifics like award name or wins, etc.). Complicated though!

Besides this, the results can contain anything you want (title, imdbid, etc.), as in example 3 on this page: https://developer.imdb.com/documentation/api-documentation/sample-queries/search/

@GeorgeFive @All What are your thoughts about this?

duck7000 commented 5 months ago

And all of the above can be sorted as well.

GeorgeFive commented 5 months ago

watchOptionsConstraint could possibly tie into #40... honestly, that would help a lot with what I would want to do. Get a list of all movies available to stream on a given service, save it to my database, and then have a nice full list per provider while also showing the info on the individual movie page. The only possible issue I'm seeing is that the search options are very incomplete compared to what you can find on title pages...

I would definitely use genreConstraint... what are the sort options available? I'm not sure what they call it, but I like how they sort by currently popular... it's not an overall "total number of ratings" or anything, it's based on what is trending. Does that look to be possible?

As for the other stuff... more options is always great, but personally, those are the only two I'd likely use for my own use case.

duck7000 commented 5 months ago

Here are the possible sort options:

BOX_OFFICE_GROSS_DOMESTIC
Gross revenue pulled in via box-office in Domestic market for entire lifetime of title.
Domestic refers to North America (U.S., Canada, and Puerto Rico)
ASC: Lower numbers means the title has pulled in less box-office revenue, so poorer performing titles will be first.

METACRITIC_SCORE
Overall Metascore based on critic reviews. Titles without a metascore are
placed at the end when using ASC sort order.
ASC: Lower Metacritic score means the title is rated more poorly, so titles with worse scores will be first.

MY_RATING
Star Rating given by the requesting user.
ASC: Lower star rating means the user rated the title more poorly, so most disliked titles will be first.

MY_RATING_DATE
Date when customer rated a title.
ASC: Earlier (older) ratings will be first.

POPULARITY
TitleMeterType.TITLE_METER (aka Pro MOVIEMeter). Score given to non-episodic title types.
ASC: Lower popularity score means that the title is more popular, so the most popular titles will be first.

RANKING
Sort results based on specified ranking algorithm. For the advancedTitleSearch query, exactly one ranked title list
constraint must be specified for using this sort option.
ASC: Higher ranks will be first.

RELEASE_DATE
Earliest wide release date of a title. Titles without a release date are
placed at the end when using ASC sort order.
ASC: Earlier (older) released title will be first.

RUNTIME
The length of the title in terms of runtime.
ASC: Lower runtime means the title is shorter, so shortest titles will be first.

TITLE_REGIONAL
Alphabetical sorting based on regional title text as determined by user language preferences.
Language preference is determined by x-imdb-user-country and x-imdb-user-language headers.
Only supports the languages/regions we support for localized search. Defaults to original title otherwise.
ASC: Lower numbers and letters near the top of the alphabet will be returned first.

USER_RATING
Weighted IMDb Star Rating as determined by users
Note: IMDb maintains a threshold to a minimum number of ratings before it is considered.
ASC: Lower star rating means the title is rated more poorly, so titles with worse ratings will be first.

USER_RATING_COUNT
Count of ratings given by users
Note: IMDb maintains a threshold to a minimum number of ratings before it is considered.
ASC: Lower count of ratings means the title has been rated a fewer number of
times, so titles with least ratings will be first.

YEAR
The recognized year of the title. Typically, the release year, but guidelines are here:
https://help.imdb.com/article/contribution/titles/title-formatting/G56U5ERK7YY47CQB
ASC: Earlier (older) titles will be first.

sortOrder is ASC or DESC for all of the above options.

duck7000 commented 5 months ago

The main concern is that the number of input parameters for this function is massive (at least 10 and counting), so I'm thinking about an array as the input parameter.

duck7000 commented 5 months ago

I will start with genreConstraint. For that there are multiple options:

How many results?
Any or all genre IDs? "Any" finds titles with at least one of the provided genres; "All" finds titles that have all of the provided genres.
Input genre

So this would be at least 3 parameters, or 3 config options. Question: parameters or config options?

If we (or I) want more constraints, the number of parameters or config options grows fast, so we have to find a way to handle that. Passing the parameters or config options as an array is an option.
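A minimal sketch of the "array as input" idea, assuming a hypothetical helper named normalizeSearchOptions() and illustrative option names; none of these come from the library itself. It merges whatever the caller supplies over a fixed set of defaults, so new constraints can be added later without changing the method signature. The syntax is kept compatible with PHP 5.6.

```php
<?php
// Hypothetical helper: merges caller-supplied options over defaults.
// The option names are illustrative, not the library's actual API.
function normalizeSearchOptions(array $options)
{
    $defaults = array(
        'searchTerm' => '',
        'genres'     => '',
        'titleTypes' => '',
        'startDate'  => '',
        'endDate'    => '',
    );
    // Drop unknown keys so a typo does not silently end up in the query.
    $known = array_intersect_key($options, $defaults);
    // For string keys, later values in array_merge() override earlier ones.
    return array_merge($defaults, $known);
}

$opts = normalizeSearchOptions(array('genres' => 'Horror', 'bogus' => 1));
print_r($opts);
```

With this shape the caller names only the constraints they care about, and the order of arguments no longer matters.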

GeorgeFive commented 5 months ago

I'd think parameters would be the way to go for this for maximum flexibility. I'd say "All" would be a pretty good default, I'm not sure how much use "Any" would get. If I want horror OR comedy, I provide only one of those genre ids... if I want a horror+comedy hybrid, I provide both of those ids and only get movies containing both genres.

duck7000 commented 5 months ago

Okay, I can try that. That will at least limit the number of parameters; I will use "all" with no parameter then.

But you have to remember that a full list of ALL movies with genre Horror would be such a large results array that PHP would run out of memory, so that is not possible.

GeorgeFive commented 5 months ago

Oh yeah, definitely. I'm mainly interested in getting the top trending 50-100 movies by genre, similar to the link in the initial post.

duck7000 commented 5 months ago

Well, I think that 250 titles is doable, but above 1000 would be complicated. So yes, the max should be 250; you can override that with anything less through the parameter. Sorting the results would be more interesting, I guess.

duck7000 commented 5 months ago

What do you want in the results?

For now there is: Title, OriginalTitle, yearRange, TitleType

But IMDb lists more, like image, rating, plot outline, etc.?

Would you mind if these parameters were fixed values? I doubt that anyone else is going to use them; if they do, I can add them again: $maxResults = 250, $sortBy = "POPULARITY", $sortOrder = "ASC"

It will save a lot of parameters, and for future extensions the parameters would then be actual input data like year, titleType, genres, etc.

GeorgeFive commented 5 months ago

In my use case, the only thing I need is the movie imdb id and maybe the title's ranking in the list (#1, #2, etc). I will be using these results along with my existing data to display a list, I just need to know which movies they currently consider to be popular and what ranks where in the chart.

However, I'd say other people using this may want title, year, type, image.... maybe plot.

duck7000 commented 4 months ago

Okay, I will add ranking, rating, plot, etc., like IMDb lists on their search page; everybody can choose what they need or want to use.

I will make those 3 parameters fixed, so for now there is only one parameter, but it is easy to extend in the future.

duck7000 commented 4 months ago

I'm still working on this.

I've got it working with genres and title types, so that is great news. If there is not at least one constraint, the results will be empty, as a search has no point then. I added explicitContentConstraint: { explicitContentFilter: INCLUDE_ADULT } without a parameter, so the results include adult titles as well.

I'm working on adding more constraints; as it is now it is very limited in use, since only the above 2 parameters are usable.

I will add releaseDateConstraint and creditedNameConstraint (you can search for a specific actor, for example). I'm thinking about adding awardConstraint and locationConstraint (search for a specific place, country, etc.), but I'm not convinced that is useful.

duck7000 commented 4 months ago

@GeorgeFive Are you still using PHP 5.x?

If you still do, I have to take that into account in my code.

GeorgeFive commented 4 months ago

Everything is sounding good so far!

As of right now, yeah, I'm still using php 5.6. I had one huge chunk of vital code that was preventing me from upgrading, but I actually completely rewrote that this past weekend and it should theoretically be ok to upgrade. Once I get everything else rewritten (mainly a lot of mysql = mysqli changes), I should be using a more modern version. I think there was at least one other person using an early version though...?

duck7000 commented 4 months ago

Only thomas douscha uses PHP 7 as I recall, so I think you are the only one who uses a lower version. For now that's fine, but if I add releaseDateConstraint I have to somehow check whether the provided parameter is a valid date. Since you use 5.6, I found a solution that should work there.

But good work upgrading your code!
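The thread does not say which date check was chosen. As one possible approach that also runs on PHP 5.6, the sketch below uses DateTime::createFromFormat() and compares the round-tripped string with the input. The function name isValidDate() and the YYYY-MM-DD format are assumptions made for illustration.

```php
<?php
// Hypothetical helper: true only for a real calendar date in strict
// YYYY-MM-DD form. The expected format is an assumption.
function isValidDate($value)
{
    if (!is_string($value)) {
        return false;
    }
    $date = DateTime::createFromFormat('Y-m-d', $value);
    // createFromFormat() rolls impossible dates over (2023-02-30 becomes
    // 2023-03-02), so compare the re-formatted date with the input.
    return $date !== false && $date->format('Y-m-d') === $value;
}

var_dump(isValidDate('2023-11-17')); // bool(true)
var_dump(isValidDate('2023-02-30')); // bool(false)
var_dump(isValidDate('not a date')); // bool(false)
```

The round-trip comparison also rejects unpadded input such as 2023-1-5, which keeps the value that ends up in the query in one predictable format.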

GeorgeFive commented 4 months ago

I am now running php 7.1. 8 should run in theory, but I can't test as it's not installed on my server. My os (centos) is horribly out of date and reaching end of life status, which is what started this whole rabbit hole of upgrading everything, hah.

duck7000 commented 4 months ago

It keeps you off the streets (as they say here in the Netherlands).

I'm running Ubuntu Server 22.04. When a new version comes out, I usually reinstall after the first few point releases. Upgrading never seems to work for me; I always end up with a mess.

duck7000 commented 4 months ago

In the meantime I finished the advancedTitleSearch class. I ironed out most of the bugs (I think...); the releaseDateConstraint was a real pain. I think it is ready for release.

I will first make a wiki page about the use case and all the options. There are 5 different constraints; one (the adult filter) has no parameter, all the others do.

It needs to be tested, so I hope that you will test all functions?

I will let you know when it is released.

GeorgeFive commented 4 months ago

I used to partner up with a guy on the coding side of things, and he is the one that implemented the login / user system on my site... in oo style (which I'm not very good with), using shmop (which I'm not familiar with at all). It worked great, but compatibility broke after php 5.6, and he had moved on to other things, so I was kind of stuck, haha.

And I will check it out tonight / tomorrow, about to get some sleep before the real world job... yay.

duck7000 commented 4 months ago

Advanced title search is now published (latest git commit), so if you have the time, check it out. All info is in the wiki page.

There might be bugs though, so I didn't make a release version yet.

I moved those 3 fixed options to config so future version upgrades are easier.

duck7000 commented 4 months ago

I just fixed another bug, so there might be more, I guess.

If you call advancedSearch without any parameters, the user will get a random list of titles based on those 3 config options. That is not useful, but I don't know how to avoid it. This situation shouldn't occur if the user uses the method as intended, but you never know.

There are so many things to consider, even with the constraints I have used so far. Others might want even more constraints, so we will see how that goes.

All info is in the (adjusted) wiki as well as in the doc blocks.
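One way the "no parameters" case could be avoided is to return early when every input is empty. The sketch below assumes a hypothetical helper hasAnyConstraint() that receives the raw input strings; it is not part of the library.

```php
<?php
// Hypothetical guard: true when at least one input holds a non-empty,
// non-whitespace string.
function hasAnyConstraint(array $inputs)
{
    foreach ($inputs as $value) {
        if (is_string($value) && trim($value) !== '') {
            return true;
        }
    }
    return false;
}

// Called with no usable input, a search method could return array() here
// instead of sending a query that only contains the fixed config options.
var_dump(hasAnyConstraint(array('', '  ', '')));   // bool(false)
var_dump(hasAnyConstraint(array('', 'Horror')));   // bool(true)
```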

duck7000 commented 4 months ago

@GeorgeFive

Did you have any time to put this to the test?

GeorgeFive commented 4 months ago

I actually haven't had time to play with this yet, I'm currently in the process of building a new server for my site. It's never as simple as it should be.....

duck7000 commented 4 months ago

Well, there can be a lot of hurdles on the road, especially as your current server is relatively old (software-wise). A new install can be a lot of work, and after that you have to test that everything works as it should.

Good luck!

I might make minor changes to this method, so if you test, use the latest commit. If it works as it should I might add more constraints, but which ones I'm not sure yet.

duck7000 commented 4 months ago

I added titleTextConstraint (it was missing but might be essential), so it is now possible to search on specific title text alongside the other parameters.

The order of the parameters has changed: I put searchTerms as the first parameter.

duck7000 commented 4 months ago

Question:

If I add language and country constraints, would it be necessary to check in this library whether the provided country or language code (like DE for Germany) is valid? Or should users check that in their own application?

The point is that if the provided code is not valid, the search function will error out because the query doesn't work in that case. Edit: this does not seem to be the case. If the provided language or country code is invalid, IMDb simply returns an empty object, so it is not a problem.

GeorgeFive commented 4 months ago

Might be a good idea, default to EN if nothing / invalid input is given...

duck7000 commented 4 months ago

I added country and language constraints. The default is an empty string, so the search uses all countries and languages.

If the provided country or language is invalid, IMDb GraphQL returns an empty object, so there is no need to check that after all.

Defaulting the country to US is not useful, I guess; better to use all countries in that case.

For now I rest my case and will not add any more constraints. If anyone wants an extra constraint that is not yet added, let me know.

There is one thing left to do, I guess: currently this method uses 8 parameters, and that is a lot, so it might be a good idea to convert them to an array of input values? The user would have to build that array and pass it as the parameter in the method call. Ping @GeorgeFive

GeorgeFive commented 4 months ago

Ok, I hate to sound like a dunce, but I'm not making sense of how to call this properly (or I'm doing something terribly wrong). How do you call the search if you want to find all movies with genre = horror (don't care about other parameters)? Example please?

I've read the wiki page and tried numerous things, but everything I've tried returns an empty array. I've even tried filling out every possible parameter with known data to find a specific movie.

duck7000 commented 4 months ago

This is how I call this method; it is just like the normal search. All parameters default to an empty string, and PHP expects parameters in order. So if I want only genre, I have to provide the first parameter as well to keep the parameters in order. I could change the order, but the normal search starts with the searchTerm, so I did the same with this method for consistency.

$imdb = new \Imdb\TitleSearchAdvanced();
// First argument is the search term (empty here); second is the genre ID.
$results = $imdb->advancedSearch("", "Horror");
echo '<pre>';
print_r($results);
echo '</pre>';

This is part of my result with genre "Horror" (remember that this has to be a genre ID):

Array
(
    [0] => Array
        (
            [imdbid] => 1448754
            [originalTitle] => Thanksgiving
            [title] => Thanksgiving
            [year] => 2023
            [movietype] => Movie
            [rank] => 19
            [rating] => 6.3
            [plot] => After a Black Friday riot ends in tragedy, a mysterious Thanksgiving-inspired killer terrorizes Plymouth, Massachusetts - the birthplace of the infamous holiday.
            [imgUrl] => https://m.media-amazon.com/images/M/MV5BOGZhOGJjZTAtOTJmYS00ZTk2LTgxYWEtNjM3ZmUxMjY2NWFiXkEyXkFqcGdeQXVyNjU2NTI4MjE@._V1_QL75_SY207_.jpg
        )

    [1] => Array
        (
            [imdbid] => 1520211
            [originalTitle] => The Walking Dead
            [title] => The Walking Dead
            [year] => 2010-2022
            [movietype] => TV Series
            [rank] => 26
            [rating] => 8.1
            [plot] => Sheriff Deputy Rick Grimes wakes up from a coma to learn the world is in ruins and must lead a group of survivors to stay alive.
            [imgUrl] => https://m.media-amazon.com/images/M/MV5BNzI5MjUyYTEtMTljZC00NGI5LWFhNWYtYjY0ZTQ5YmEzMWRjXkEyXkFqcGdeQXVyMTY3MDE5MDY1._V1_QL75_SY207_.jpg
        )

    [2] => Array
        (
            [imdbid] => 7216636
            [originalTitle] => Hazbin Hotel
            [title] => Hazbin Hotel
            [year] => 2019
            [movietype] => TV Series
            [rank] => 34
            [rating] => 7.8
            [plot] => In an attempt to find a non-violent alternative for reducing Hell's overpopulation, the daughter of Lucifer opens a rehabilitation hotel that offers a group of misfit demons a chance at redemption.
            [imgUrl] => https://m.media-amazon.com/images/M/MV5BYzNkMzc5OTYtNDk2MS00NGI0LThjZGYtYjdmNWI4OTExZWFjXkEyXkFqcGdeQXVyMjkwOTAyMDU@._V1_QL75_SY207_.jpg
        )

    [3] => Array
        (
            [imdbid] => 9859436
            [originalTitle] => The Walking Dead: The Ones Who Live
            [title] => The Walking Dead: The Ones Who Live
            [year] => 2024-2024
            [movietype] => TV Series
            [rank] => 43
            [rating] => 9
            [plot] => The love story between Rick and Michonne. Changed by a world that is constantly changing, will they find themselves in a war against the living or will they discover that they too are The Walking Dead?
            [imgUrl] => https://m.media-amazon.com/images/M/MV5BYjc3YWM3MjctZDAzNy00OWY4LTkwNjMtMWM2YTg1ZWRlMDAwXkEyXkFqcGdeQXVyMTY3MDE5MDY1._V1_QL75_SY207_.jpg
        )

If you think that there is room for improvement, let me know! I'm just an amateur, but together we can make it work.

duck7000 commented 4 months ago

In your application you have to create something like IMDb did on their page, so users can enter the search query:

https://www.imdb.com/search/title/?genres=horror

GeorgeFive commented 4 months ago

$imdb = new \Imdb\TitleSearchAdvanced(); $results = $imdb->advancedSearch("", "Horror");

Yep, that's what I thought. I tried that and got an empty array.... I even copied and pasted this and tried it just to make sure I wasn't overlooking a typo, and got the same empty array. Checked to make sure I have the latest version, and same thing.

I did notice it's throwing an error though... PHP Warning: Invalid argument supplied for foreach() in TitleSearchAdvanced.php on line 153

duck7000 commented 4 months ago

I just copied the TitleSearchAdvanced class from GitHub to be sure, and it works fine.

That warning comes from $data not being an array, I guess, so the query is wrong and thus your input is wrong?

Are you using the latest git version? The release version might be behind.

GeorgeFive commented 4 months ago

Yep, using the latest version (I downloaded again just to be sure), same thing. I copied and pasted your exact example, print_r the results.... and nothing except Array ( )

duck7000 commented 4 months ago

Weird, I don't know what is going on.

Did you use the latest config file? There are 3 config options in there related to TitleSearchAdvanced.

GeorgeFive commented 4 months ago

Yep, config is up to date, and the only changes are related to caching. I did disable caching just to see if that was an issue, and no changes.

duck7000 commented 4 months ago

Try to debug what is wrong; as it works here, it must be something at your end.

var_dump the input parameters and check that they are correct. Then var_dump the query and check that the input parameters are filled in correctly.

Your problem must be in that part.

Does the normal search work? It is similar to advanced search.

GeorgeFive commented 4 months ago

I've even tried removing the genre and searching by a name or a keyword.... still nothing. Everything is filled in right, I don't get this, haha. It has to be something stupid, but I'm not seeing it.

Normal search works fine, no issues.

    $search = new \Imdb\TitleSearch();
    $results = $search->search($data['main']['moviename']);

I get the expected results.

    $movies = new \Imdb\TitleSearchAdvanced();
    $results = $movies->advancedSearch("", "Horror");

I get nothing.

duck7000 commented 4 months ago

What if you comment out use \DateTime;?

Maybe your PHP is not configured with DateTime? I'm guessing here.

GeorgeFive commented 4 months ago

DateTime had no effect, still the same thing. I went into the class and did a dump on $data, seems ok, but the foreach isn't doing anything.

string(1152) "query advancedSearch{ advancedTitleSearch( first: 200, sort: {sortBy: POPULARITY sortOrder: ASC} constraints: { titleTextConstraint: {searchTerm: null} genreConstraint: {allGenreIds: ["Horror"]} titleTypeConstraint: {anyTitleTypeIds: []} releaseDateConstraint: {releaseDateRange: {start: null end: null}} creditedNameConstraint: {anyNameIds: []} originCountryConstraint: {anyCountries: []} languageConstraint: {anyLanguages: []} explicitContentConstraint: {explicitContentFilter: INCLUDE_ADULT} } ) { edges { node{ title { id originalTitleText { text } titleText { text } titleType { text } releaseYear { year endYear } meterRanking { currentRank } ratingsSummary { aggregateRating } plot { plotText { plainText } } primaryImage { url } } } } } }"

Going to try an older version and see if that helps, could at least narrow something down....

duck7000 commented 4 months ago

my var_dump($data) starts like this:

object(stdClass)#2405 (1) { ["advancedTitleSearch"]=> object(stdClass)#2404 (1) { ["edges"]=> array(200) { [0]=> object(stdClass)#17 (1) { ["node"]=> object(stdClass)#16 (1) { ["title"]=> object(stdClass)#6 (9) { ["id"]=> string(9) "tt1448754" ["originalTitleText"]=> object(stdClass)#7 (1) { ["text"]=> string(12) "Thanksgiving" } ["titleText"]=> object(stdClass)#8 (1) { ["text"]=> string(12) "Thanksgiving" } ["titleType"]=> object(stdClass)#9 (1) { ["text"]=> string(5) "Movie" } ["releaseYear"]=> object(stdClass)#10 (2) { ["year"]=> int(2023) ["endYear"]=> NULL } ["meterRanking"]=> object(stdClass)#11 (1) { ["currentRank"]=> int(19) } ["ratingsSummary"]=> object(stdClass)#12 (1) { ["aggregateRating"]=> float(6.3) } ["plot"]=> object(stdClass)#14 (1) { ["plotText"]=> object(stdClass)#13 (1) { ["plainText"]=> string(161) "After a Black Friday riot ends in tragedy, a mysterious Thanksgiving-inspired killer terrorizes Plymouth, Massachusetts - the birthplace of the infamous holiday." } } ["primaryImage"]=> object(stdClass)#15 (1) { ["url"]=> string(125) "https://m.media-amazon.com/images/M/MV5BOGZhOGJjZTAtOTJmYS00ZTk2LTgxYWEtNjM3ZmUxMjY2NWFiXkEyXkFqcGdeQXVyNjU2NTI4MjE@._V1_.jpg" } } } }

Note that edges is an array with 200 elements inside an object. If yours looks the same, then the foreach should work.
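The "Invalid argument supplied for foreach()" warning reported earlier appears whenever the decoded response lacks that edges array. A defensive sketch, assuming a hypothetical helper extractEdges() (not the library's code) and the response shape from the dump above, could look like this:

```php
<?php
// Hypothetical helper: returns the edges list from a decoded response, or
// an empty array when the expected structure is missing (for example when
// the API answers with an empty object).
function extractEdges($data)
{
    if (!is_object($data)
        || !isset($data->advancedTitleSearch->edges)
        || !is_array($data->advancedTitleSearch->edges)
    ) {
        return array();
    }
    return $data->advancedTitleSearch->edges;
}

// Stand-ins for the two responses seen in this thread.
$empty = json_decode('{}');
$good  = json_decode(
    '{"advancedTitleSearch":{"edges":[{"node":{"title":{"id":"tt1448754"}}}]}}'
);

foreach (extractEdges($empty) as $edge) {
    echo "never reached\n";
}
foreach (extractEdges($good) as $edge) {
    echo $edge->node->title->id, "\n"; // tt1448754
}
```

This only silences the warning; an empty result still means the query or its input needs checking.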

GeorgeFive commented 4 months ago

My mistake, my last post was a dump on $query, not $data.

$data is empty on my end....

object(stdClass)#5 (0) { }

I even tried it in php 8.2 just to see if it was a version issue.... nothing. I feel like this is something incredibly stupid and I'm going to scream when I figure it out, but for now.... UGH

duck7000 commented 4 months ago

I don't use cache and localization; maybe that has an influence?

Okay, an empty object, so there is probably a problem with your input. var_dump() your input.

What if you fill in the genre ID directly in the method (so $genre = "Horror") and call the method without parameters?

GeorgeFive commented 4 months ago

Cache and localization turned off... no effect.

$movies = new \Imdb\TitleSearchAdvanced(); $results = $movies->advancedSearch();

Changed in class.... $inputGenres = "Horror";

Nothing.

duck7000 commented 4 months ago

Cache and localization turned off... no effect.

$movies = new \Imdb\TitleSearchAdvanced(); $results = $movies->advancedSearch();

Changed in class.... $inputGenres = "Horror";

Nothing.

That ; does not belong after Horror?

GeorgeFive commented 4 months ago

I placed that directly in TitleSearchAdvanced

line 74 changed to $inputGenres = "Horror";

Also tried line 106 genreConstraint: {allGenreIds: ["Horror"]}

Also tried moving use \DateTime; over to TitleSearch (original), still worked properly.

duck7000 commented 4 months ago

It must have something to do with your input being different from mine, I guess.

I meant in the method parameter.

$inputGenres = "Horror"; this does not work; inserted like this, it is not accepted in the query.

var_dump($inputGenres); (after the function check!) It should output ""Horror"" (the API query is very picky about how it is formatted). Maybe it should be escaped; I didn't try.
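To illustrate the quoting issue: the query dumped earlier in this thread contains allGenreIds: ["Horror"], so each ID has to reach the query as a double-quoted string inside a list. The sketch below assumes a hypothetical helper formatIdList() and uses json_encode() to do the quoting and escaping; how the library formats its input internally is not shown in this thread.

```php
<?php
// Hypothetical helper: turns "Horror, Comedy" (or an array of IDs) into a
// quoted list literal such as ["Horror","Comedy"].
function formatIdList($input)
{
    $items = is_array($input) ? $input : explode(',', (string) $input);
    // Trim whitespace and drop empty entries; reindex so json_encode()
    // emits a list rather than an object.
    $items = array_values(array_filter(array_map('trim', $items), 'strlen'));
    return json_encode($items);
}

echo 'genreConstraint: {allGenreIds: ' . formatIdList('Horror, Comedy') . '}', "\n";
// genreConstraint: {allGenreIds: ["Horror","Comedy"]}

echo 'titleTypeConstraint: {anyTitleTypeIds: ' . formatIdList('') . '}', "\n";
// titleTypeConstraint: {anyTitleTypeIds: []}
```

An empty input yields [], which matches how the unused constraints appear in the dumped query.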