algolia / youtube-captions-scraper

Fetch youtube user submitted or fallback to auto-generated captions
249 stars 67 forks source link

remove HTML striptags #7

Open tomByrer opened 5 years ago

tomByrer commented 5 years ago

I removed striptags since I wanted to keep the <b><i><u> formatting. But, I did remove all the <font> tags, since I've usually only seen spammy auto-generated colorization and almost never YouTube caption uploaders bother with font colors. I haven't seen other formatting in captions from the API, so don't have to worry about other markup, including malicious.

BTW, YouTube doesn't use their API nor most formatting from other formats, they seem to use use their own formatting. Any uploads & in-app edits are heavily filtered, unless you use their magic format.

I updated the test to reflect my desires. Also made minor version updates in package.json. API remains the same, but slightly different outupt, so v1.1.0

Haroenv commented 5 years ago

I think this could be considered a breaking change?

tomByrer commented 5 years ago

Breaking, how so? Seems YT already strips all tags in their API except <font> <b> <i> <u>. Maybe you guys need all the tags stripped, I'd like to keep them.

I think it was only bumped to minor version .0.0.1 when HTML stripping was added.

This is not a request that should be merged, but just sharing what I did, like a good open-source user. :)

iam4x commented 5 years ago

Hey 👋

Thanks for the contribution,

I think we should expose this with an option so it doesn't break the existing applications using this module.

tomByrer commented 5 years ago

Do you mean leaving in <b> <i> <u>. as an option? I guess I could, but it would be after I finish my current project.

tomByrer commented 5 years ago

I'm still not sure how to strip all tags while keeping the codebase small as possible.... Maybe I'll make a 'friendly fork' & call it: YouTube-captions-scraper-WTF

Haroenv commented 5 years ago

That makes sense too @tomByrer, keep us updated!

Haroenv commented 4 years ago

@tomByrer, what do you think of taking over the project? I think the most straightforward would be that we deprecate ours and you publish your own?

tomByrer commented 4 years ago

Since Maybe I should just make a PR to remove lodash, you can have a last release for the people who still use this, then archive it with a link to mine?

Haroenv commented 4 years ago

If you make a simple PR just to remove lodash we can definitely review that, but a whole rewrite will take a bit of time, in which case having a totally different fork is easier I think