algolia / youtube-captions-scraper

Fetch youtube user submitted or fallback to auto-generated captions

Add alternative captions retrieval method #8

Open bel0v opened 4 years ago

bel0v commented 4 years ago

Hi! Thanks for the lib. The captionTracks method doesn't work for some videos, though. After some investigation I found an alternative solution, so I added code to fall back to that.

Tested on this video: https://www.youtube.com/watch?v=62xdACKITrE

It seems to have something to do with modern HTML5 captions, or something like that. Cheers!

bel0v commented 4 years ago

I also used an XML parser lib instead of scraping the XML by hand. Please check it out.

I'm not sure about the striptags and decode calls that were used on the text strings; I've never come across encoded symbols or tags in YT captions. Can you confirm those are needed, please?
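To make the question above concrete, here is a minimal sketch of the two steps being discussed: parsing caption XML with a real parser, then optionally decoding leftover entities and stripping tags. It is written in Python with the standard library only and is not this library's JavaScript code. The sample XML, the `parse_captions` helper, and the assumption that a caption body can be double-encoded are all invented for illustration; whether real YT captions look like this is exactly what needs confirming.

```python
import html
import re
import xml.etree.ElementTree as ET

# Invented sample in the rough shape of a caption transcript (an assumption,
# not real YouTube output). The first line is deliberately double-encoded.
SAMPLE_XML = (
    "<transcript>"
    '<text start="0.5" dur="1.2">it&amp;#39;s a &lt;i&gt;test&lt;/i&gt;</text>'
    '<text start="1.7" dur="2.0">plain line</text>'
    "</transcript>"
)

# Naive tag stripper, standing in for a striptags-style helper.
TAG_RE = re.compile(r"<[^>]+>")


def parse_captions(xml_text):
    """Hypothetical helper: return a list of {start, dur, text} dicts."""
    root = ET.fromstring(xml_text)
    captions = []
    for node in root.iter("text"):
        raw = node.text or ""            # the XML parser decodes one entity level
        decoded = html.unescape(raw)     # second pass handles double-encoded text
        clean = TAG_RE.sub("", decoded).strip()  # drop any inline markup
        captions.append(
            {
                "start": float(node.get("start", "0")),
                "dur": float(node.get("dur", "0")),
                "text": clean,
            }
        )
    return captions


if __name__ == "__main__":
    for caption in parse_captions(SAMPLE_XML):
        print(caption)
```

If caption bodies never contain such entities or tags, the second decode pass and the tag stripping are no-ops and could be dropped; if they sometimes do, the parser alone would leave them in the text.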

Haroenv commented 4 years ago

Are you interested in maintaining this library? We are not currently using it in production, so it is easy to lose track of whether it's working correctly.

bel0v commented 4 years ago

@Haroenv do you mind if I refactor the whole thing if I do? :)

Haroenv commented 4 years ago

That's fine, knowing that I'm not too sure I'll find time to properly review this.