RSS-Bridge / rss-bridge

The RSS feed for websites missing it
https://rss-bridge.org/bridge01/
The Unlicense

JustWatch Bridge misses some titles #3950

Open evilsh3ll opened 5 months ago

evilsh3ll commented 5 months ago

Describe the bug: Several hours after a title is posted on the JustWatch website, it is still missing from the JustWatch bridge RSS feed.

To Reproduce Steps to reproduce the behavior:

  1. Go to JustWatch Bridge
  2. Select Country: "Italy" and Type: "All"
  3. Go to the JustWatch website
  4. Find a title among the latest titles that is missing from the feed

Expected behavior: All titles on the JustWatch website should appear in the JustWatch RSS feed.

Screenshots image

Desktop (please complete the following information):

Additional context n/a

dvikan commented 5 months ago

@Bockiii

Bockiii commented 5 months ago

I just checked and there seem to be some size or amount limitations. The problem is that I didn't add a filter for all the providers. I didn't bother to add another 70 selection options to the filter, and I also think the list fluctuates a lot and could produce unwanted behavior if you select a country/provider combination that doesn't exist (because BBC isn't available in guana or something like that).

This leads to the problem that even the "Today" field contains hundreds and hundreds of entries.

The second problem seems to be the "scroll to right" function on providers. If Amazon releases 140 entries in a day, I only get 10, because the others only load once you scroll. @dvikan do you know of a way to deal with this in PHP?

At this point, I don't really know how to fix this because of the limitations above.

Bockiii commented 5 months ago

image

Example of the side-scrolling problem: the page says 18 new titles but only displays 10, and thus only 10 are in the page source.

evilsh3ll commented 4 months ago

Maybe a filter with just the handful of major providers would be useful: Netflix, Prime, Disney, Apple, Paramount, Crunchyroll. Those are what ~99% of users need.

Bockiii commented 4 months ago

An alternative would be a free-text field with an explanation of what to put in there. You can define the providers and then copy-paste them from the URL.
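A rough sketch of the copy-paste idea, in Python for illustration only (the bridge itself is PHP). It assumes the selected providers show up in the pasted URL as a comma-separated `providers` query parameter; that parameter name, the example path and the codes `aaa`/`bbb` are assumptions made up for the sketch, not confirmed JustWatch behavior.

```python
from typing import List
from urllib.parse import parse_qs, urlparse


def providers_from_url(url: str) -> List[str]:
    """Return the provider codes found in the URL's `providers` query parameter."""
    query = parse_qs(urlparse(url).query)
    codes: List[str] = []
    for value in query.get("providers", []):
        # One parameter value may carry several comma-separated codes.
        codes.extend(c.strip() for c in value.split(",") if c.strip())
    return codes


# Hypothetical URL a user might paste into the free-text field.
example = "https://www.justwatch.com/it/new?providers=aaa,bbb"
print(providers_from_url(example))
```

A URL without the parameter yields an empty list, which the bridge could treat as "no provider filter".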

The full alternative would be to change the whole bridge so that it just picks up a provided link and scrapes from there, although that would also not help with people who just paste the link to the full "new" page.

Not sure how to proceed.

Bockiii commented 4 months ago

I found that JustWatch has a GraphQL API, but it's not really documented. I found these sources for some information, if someone wants to take a swing:

https://www.reddit.com/r/webscraping/comments/wacb8a/how_to_scrape_graphql_endpoint_with_requests/

Bockiii commented 4 months ago

I've been looking at this again and getting "all" just doesn't make any sense. Just today there were these additions:

  "Amazon Vault History Channel": 341
  "History Vault Apple TV": 84
  "Acorn TV Apple TV": 95
  "Microsoft Store": 99
  "Eventive": 170

Those 5 alone mean 789 rss feed entries, just for today.

How about this: I'll add the top 10 or top 15 providers, and if someone actually requests "Eventive", we can check this again?
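To make the numbers concrete, here is a small Python sketch of what an allowlist would do to today's counts. The counts are the ones quoted above; the `TOP_PROVIDERS` set is a made-up placeholder, not an actual top-10 list.

```python
from typing import Dict, Set

# Counts quoted above for a single day.
todays_additions: Dict[str, int] = {
    "Amazon Vault History Channel": 341,
    "History Vault Apple TV": 84,
    "Acorn TV Apple TV": 95,
    "Microsoft Store": 99,
    "Eventive": 170,
}

# Hypothetical allowlist; the real top 10/15 would have to be chosen by hand.
TOP_PROVIDERS: Set[str] = {"Netflix", "Microsoft Store"}


def keep_allowed(additions: Dict[str, int], allowed: Set[str]) -> Dict[str, int]:
    """Drop every provider that is not on the allowlist."""
    return {name: count for name, count in additions.items() if name in allowed}


total = sum(todays_additions.values())
kept = keep_allowed(todays_additions, TOP_PROVIDERS)
print(total, sum(kept.values()))
```

With this placeholder allowlist, only the "Microsoft Store" entries would survive and the other four providers would be dropped from the feed.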

Bockiii commented 4 months ago

Ah forget it, even that is annoying af. Paramount+ is available as "Paramount+", "Amazon Channels Paramount+", "Apple TV Channels Paramount+" and so on. And that's just for the US page, so for Italy there could be either none of those or a fourth "Channel" type etc.
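For what it's worth, folding those variants together could look roughly like this (Python, illustration only). It assumes every variant name ends with the base provider name, which holds for the three Paramount+ examples above but is not guaranteed for other providers or countries, so treat it as a brittle sketch. `group_by_base` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_base(names: Iterable[str], bases: Iterable[str]) -> Dict[str, List[str]]:
    """Group listing names under the base provider name they end with."""
    # Try longer base names first so the most specific base wins.
    ordered = sorted(bases, key=len, reverse=True)
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        # Names matching no base stay in a group of their own.
        base = next((b for b in ordered if name.endswith(b)), name)
        groups[base].append(name)
    return dict(groups)


listings = [
    "Paramount+",
    "Amazon Channels Paramount+",
    "Apple TV Channels Paramount+",
    "Eventive",
]
grouped = group_by_base(listings, bases=["Paramount+"])
print(grouped)
```

Someone would still have to maintain the list of base names per country, which is the annoying part.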

This page just doesn't really fit the "select from a list" type of bridge. Maybe a "paste the link you want" bridge would be better, or maybe even removing it completely and replacing it with an XPath bridge how-to or so...

dvikan commented 3 months ago

The solution here is unclear to me.