anime-skip / player

Monorepo for the Anime Skip Player (web extension & embedded video player)
GNU General Public License v3.0

Spike out inferring intros from subtitle tracks #201

Open aklinker1 opened 2 years ago

aklinker1 commented 2 years ago

Reducing the number of clicks to contribute is important, and if you're not watching a show that has timestamps yet, it would be nice if the extension could just find and skip the intro for you.

One place we could potentially find the intro location is the text tracks/subtitles. When there's over 90s of silence (a stretch with no subtitle cues), the intro could be hidden there. To improve accuracy, we'll have to use other context clues to narrow that prediction down (previous intro duration, 3rd party services, looking for the show title to mark the end, etc).
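A minimal sketch of the gap search described above, assuming the subtitle cues have already been parsed into start/end times in seconds. The `Cue` shape, the `findSubtitleGaps` name, and the `episodeDuration` parameter are illustrative and not part of the extension's code; the 90s threshold comes from the description above.

```typescript
// Hypothetical cue shape: start and end times in seconds.
interface Cue {
  start: number;
  end: number;
}

// A stretch of the episode with no subtitle cues.
interface Gap {
  start: number;
  end: number;
}

// Returns every gap strictly longer than minGapSeconds, including a gap
// before the first cue and one after the last cue.
function findSubtitleGaps(
  cues: Cue[],
  episodeDuration: number,
  minGapSeconds: number = 90,
): Gap[] {
  const sorted = cues.slice().sort((a, b) => a.start - b.start);
  const gaps: Gap[] = [];
  let cursor = 0; // end of the latest cue seen so far

  for (const cue of sorted) {
    if (cue.start - cursor > minGapSeconds) {
      gaps.push({ start: cursor, end: cue.start });
    }
    // Overlapping cues must not move the cursor backwards.
    cursor = Math.max(cursor, cue.end);
  }

  if (episodeDuration - cursor > minGapSeconds) {
    gaps.push({ start: cursor, end: episodeDuration });
  }
  return gaps;
}
```

Each returned gap is only a candidate range. Credits, long action scenes, and untranslated songs would all show up the same way, which is why the extra context clues are still needed.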

For now though, let's just add a setting titled Experimental: Predict Intro Timestamps. When toggled on, show the possible time ranges for the intro somehow, whichever way is easiest (notification, text overlay, etc).

Once we've confirmed the potential for finding intros and we have an idea of its accuracy and how many false positives it produces, we can move on to improving the accuracy. If it doesn't seem promising, we'll scrap it.

logiczsniper commented 2 years ago

Research update 0

Research update 1

It is a bit surprising to me that we cannot get subtitles directly from the platforms; this has the obvious downside that we need to maintain and support each platform individually, but the upside that we do not rely on a 3rd party API that may (or may not) have the subtitle file that we need. So I looked a bit further into how we might get access to the subtitles directly, starting with CR Beta.

The only other anime streaming service I pay for is Funimation, so I moved on to see if I could find subtitles in its network requests and...

image

This is really great news - I suspect that unless the anime is hardsubbed, we might be able to sniff the network requests and find what we are looking for! Regardless, I moved on to a fallback option: opensubtitles.

opensubtitles.org

opensubtitles.com

aklinker1 commented 2 years ago

Interesting. I had a feeling getting subtitles wasn't gonna be easy. Microservices aren't out of the question, but it seems like it would be a pain to maintain, just like a JS implementation per site. As of now, the third party APIs seem like the way to go, is that right?

What are your thoughts at this point? Still think this is a path worth going down, or is it still too early? I'll defer judgment to you since you've done all the research.

logiczsniper commented 2 years ago

Microservices aren't out of the question

image

I'd say we can consider committing to that once we have confirmed that we can deliver accurate timestamps from the subtitles...

third party APIs seem like the way to go, is that right?

Yeah I intend on investigating open subtitles more while also trying to find other possibilities.

What are your thoughts at this point? Still think this is a path worth going down, or is it still too early?

I am still hopeful that we can make a great feature out of this with not too much effort, but I'll keep you updated!

Note: I just updated the research comment and am really excited to get your input on it!

aklinker1 commented 2 years ago

Research update 1

Hmm, good to know that they're available at some endpoint via a network request. In Chrome extensions, there are 2 ways to read network requests:

  1. The web request APIs. However, these have been severely limited by Manifest V3's restrictions. Specifically, they can't read responses. That might not be a problem: if we see a request that ends in .vtt, we can make the request ourselves and get the captions. https://developer.chrome.com/docs/extensions/reference/webRequest/

  2. Using a script block to inject JS into the page's context and override the fetch and XHR calls. With that we can read the response for every request. That approach doesn't require additional permissions, and if we're already injecting code into the pages that make the request, we don't need to request more host permissions. The downside is that this is difficult to do and can be blocked by a page's CSP headers. Not to mention the privacy issues with us accessing all the responses for a page's network requests.
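For reference, the first option could look roughly like the sketch below. It only observes request URLs, never response bodies, and hands any URL ending in `.vtt` to a callback that can re-request it. `WebRequestLike`, `isCaptionUrl`, and `watchForCaptions` are hypothetical names; the interface mirrors only the part of `chrome.webRequest.onBeforeRequest.addListener` used here, so the logic can be exercised outside a browser.

```typescript
// Minimal slice of the chrome.webRequest surface this sketch relies on.
interface RequestDetails {
  url: string;
}

interface WebRequestLike {
  onBeforeRequest: {
    addListener(
      callback: (details: RequestDetails) => void,
      filter: { urls: string[] },
    ): void;
  };
}

// True when the URL path ends in .vtt, ignoring any query string or hash.
function isCaptionUrl(url: string): boolean {
  return /\.vtt(?:[?#]|$)/i.test(url);
}

// Registers a non-blocking listener and reports caption-looking URLs.
function watchForCaptions(
  webRequest: WebRequestLike,
  onFound: (url: string) => void,
): void {
  webRequest.onBeforeRequest.addListener(
    (details) => {
      if (isCaptionUrl(details.url)) {
        onFound(details.url);
      }
    },
    { urls: ["<all_urls>"] },
  );
}
```

In a background script this would presumably be wired up as `watchForCaptions(chrome.webRequest, (url) => { /* fetch(url) and parse */ })`, which needs the `webRequest` permission plus host permissions for the streaming sites.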

Since the first option requires additional permissions, the second approach would be preferred. That said, I'd like to avoid dealing with/intercepting the network requests - it will be difficult to do, hard to maintain, and it will not last long as sites adopt better CSP practices.

I'm not going to maintain that or accept PRs for that, so I'd recommend you stop going down that route. That doesn't mean this can't still work. Is the URL that loads the captions consistent? If so, can we just look that URL up and make the request ourselves?

aklinker1 commented 2 years ago

opensubtitles.org

I'm curious how many anime they have translations for. Would it be possible to compare a sample of episodes that are available on anime skip and see if the data we store on episodes will work with their API and return actual results? So we can get an idea of what percent of episodes this approach would work for?
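One way to report the result of such a sample, as a rough sketch: record for each sampled episode whether the lookup returned any subtitles, then summarize. The `LookupOutcome` shape and `summarizeCoverage` name are made up for illustration; how the lookups themselves are performed is left open.

```typescript
// Hypothetical record of one lookup against the subtitle API.
interface LookupOutcome {
  episodeId: string;
  foundSubtitles: boolean;
}

interface CoverageSummary {
  total: number;
  found: number;
  percent: number;
}

// Summarizes how many sampled episodes had at least one subtitle result.
function summarizeCoverage(outcomes: LookupOutcome[]): CoverageSummary {
  const total = outcomes.length;
  const found = outcomes.filter((o) => o.foundSubtitles).length;
  // An empty sample reports 0% rather than dividing by zero.
  const percent = total === 0 ? 0 : (found / total) * 100;
  return { total, found, percent };
}
```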

They request that you contact them before using this feature to write your own software

If it's viable, I'll front the costs and reach out to them. They might be a useful API for my future plans for Anime Skip as well... 😏

logiczsniper commented 2 years ago

  1. The web request APIs.

Is the biggest drawback to this the web request permissions? If so, could we use optional permissions and only ask for the additional permissions if the user enables Experimental: Predict Intro Timestamps?
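A sketch of what the optional-permission flow could look like, using the callback form of `chrome.permissions.request`. It assumes `webRequest` is declared under `optional_permissions` in the manifest and that the host patterns are declared as optional too; both are assumptions to verify for each target browser. `PermissionsLike`, `buildPredictionPermissions`, and `requestPredictionPermissions` are hypothetical names.

```typescript
// Shape of the object passed to chrome.permissions.request.
interface PermissionRequest {
  permissions: string[];
  origins: string[];
}

// Minimal slice of chrome.permissions used here (callback form).
interface PermissionsLike {
  request(
    permissions: PermissionRequest,
    callback: (granted: boolean) => void,
  ): void;
}

// Builds the request for the experimental setting. The host match
// patterns are supplied by the caller, one per supported platform.
function buildPredictionPermissions(hostPatterns: string[]): PermissionRequest {
  return { permissions: ["webRequest"], origins: hostPatterns.slice() };
}

// Asks for the extra permissions and reports whether the user granted them.
function requestPredictionPermissions(
  api: PermissionsLike,
  hostPatterns: string[],
  onResult: (granted: boolean) => void,
): void {
  api.request(buildPredictionPermissions(hostPatterns), onResult);
}
```

Chrome only allows `permissions.request` from inside a user gesture, so the call would have to happen directly in the click handler for the Experimental: Predict Intro Timestamps toggle.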

I ask because this seems to be the easiest way to get the subtitles for every anime that has subtitles available. Tools like yt-dlp try to build the URLs, while we, as the browser extension, can access them. If I am missing some complexity please call me out for it, but it seems like we can use one regex per supported platform to spot the subtitle request, then make that request ourselves. We won't have access to a lot of headers via the web request API, but we only need the URL; it looks like authorization-related IDs are passed as query params, e.g. this Crunchyroll Beta URL:

https://v.vrv.co/evs/46e6d2eb853a0ce66af36372c5d1b1a5/assets/55eptj8r3io0rbh_114153.txt?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cCo6Ly92LnZydi5jby9ldnMvNDZlNmQyZWI4NTNhMGNlNjZhZjM2MzcyYzVkMWIxYTUvYXNzZXRzLzU1ZXB0ajhyM2lvMHJiaF8xMTQxNTMudHh0IiwiQ29uZGl0aW9uIjp7IkRhdGVMZXNzVGhhbiI6eyJBV1M6RXBvY2hUaW1lIjoxNjQxOTg1NDUwfX19XX0_&Signature=qY6m45FImtZBK8xqFJHPmP28CUEkkxtpXnS8g6SzDhkgIAnvAwy3kpaIyJ5NPNgOHomadisu2DBpqWV2AlcUqcf~VbjH8Z4kMLbmQ7nLKlCWHOVurZu5Wx8Mm9vTWLohfbC7JH-p0r-NBmp2VxC-erIWet4PDIbUZv2De9v~fLcJUhVUnhFomhxqVe20fPxZIsjeDpP69xFU9GRCeMJey2LhevdTHj7vB~l2fk9pcg9As17FMJExcbbFg6s5jcEnkBe082ZJKd~hIo2M4um8ShriHURTY1o20uD6fHhoezegks~7gZ7YtIJaefhS2tjGNbRhXbEmVRkAMuUHVQ6qfw__&Key-Pair-Id=APKAJMWSQ5S7ZB3MF5VA
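The "one regex per platform" idea could be sketched as a small lookup table. The Crunchyroll Beta pattern below is inferred from the single URL quoted above and may not hold for other episodes; the second entry is a made-up placeholder that only shows how another platform would be added. `SUBTITLE_PATTERNS` and `matchSubtitleRequest` are hypothetical names.

```typescript
interface PlatformPattern {
  platform: string;
  pattern: RegExp;
}

const SUBTITLE_PATTERNS: PlatformPattern[] = [
  {
    // Assumption: inferred from one example URL in this thread.
    platform: "crunchyroll-beta",
    pattern: /^https:\/\/v\.vrv\.co\/evs\/[^\/]+\/assets\/[^\/?]+\.txt\?/,
  },
  {
    // Hypothetical placeholder, not a real service.
    platform: "example-platform",
    pattern: /^https:\/\/subs\.example\.com\/.+\.vtt(?:[?#]|$)/,
  },
];

// Returns the platform whose pattern matches the request URL, or null.
function matchSubtitleRequest(url: string): string | null {
  for (const entry of SUBTITLE_PATTERNS) {
    if (entry.pattern.test(url)) {
      return entry.platform;
    }
  }
  return null;
}
```

Because the signed query params travel with the URL, a matched URL can be re-requested as-is while the signature is still valid.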

  2. Using a script block

I totally see your points as to why this is less than ideal, thanks for outlining all this!

I'd recommend you stop going down that route

Just to clarify, does "that route" mean any solution that involves listening to network requests? I still have some hope for option 1, but other than that I am perfectly happy to ditch the network request ideas entirely!

Can we just look that URL up and make the request ourselves?

Yes, they are consistent, but building them is tricky... as you can see from the CR Beta example above. Funimation is a bit easier than CR, but there are still a couple of IDs that I don't know how to get. Even if we figure it all out and are able to build them, they could change in the future and that could be super annoying. In a nutshell, we probably could with a fair amount of effort, but with this option we also get the upside that we could get subtitles for any episode of anime that has them.

see if the data we store on episodes will work with their API and return actual results?

If by data we store you mean stuff like anime name, season & episode numbers, I totally get what you mean and my initial response is yes, we can get results! We can make a request like the one below:

https://www.opensubtitles.org/en/search/sublanguageid-eng/season-1/episode-2/moviename-attack+on+titan/xml

Then parse that response, pick a subtitle file, and download it.
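Building that search URL from stored episode data might look like the sketch below. The path segments are copied from the example URL above; lowercasing the name and joining words with `+` are assumptions based on that one example, and `EpisodeQuery` and `buildOpenSubtitlesSearchUrl` are hypothetical names.

```typescript
// Hypothetical view of the episode data needed for a search.
interface EpisodeQuery {
  showName: string;
  season: number;
  episode: number;
  language?: string; // three-letter code, "eng" when omitted
}

// Builds an opensubtitles.org XML search URL in the same shape as the
// example quoted in this thread.
function buildOpenSubtitlesSearchUrl(query: EpisodeQuery): string {
  const language = query.language || "eng";
  const movieName = query.showName
    .trim()
    .toLowerCase()
    .split(/\s+/)
    .map((word) => encodeURIComponent(word))
    .join("+");

  return (
    "https://www.opensubtitles.org/en/search" +
    "/sublanguageid-" + language +
    "/season-" + query.season +
    "/episode-" + query.episode +
    "/moviename-" + movieName +
    "/xml"
  );
}
```

For example, `buildOpenSubtitlesSearchUrl({ showName: "Attack on Titan", season: 1, episode: 2 })` reproduces the URL above.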

what percent of episodes this approach would work for?

Interesting, I see how this is useful information and I will try and get some figures for you!

😏

😮