codelucas / newspaper

newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs: https://goo.gl/VX41yK
MIT License
13.89k stars, 2.1k forks

Added new language 'Tamil' (non-Latin) #979

Closed: pj8912 closed this 3 months ago

pj8912 commented 5 months ago

Added new language 'Tamil'

AndyTheFactory commented 3 months ago

Hi @pj8912, this project seems to be abandoned. I forked it last year as https://github.com/AndyTheFactory/newspaper4k, and in the latest version we included Tamil too. Can you check whether the stopwords list is comparable to yours?
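One way to answer the "is the stopwords list comparable?" question is to measure how much two lists overlap. The sketch below assumes each list is a UTF-8 text file with one word per line; the helper names `load_stopwords` and `compare_stopwords` are hypothetical and not part of newspaper3k or newspaper4k. It reports the shared count, the words unique to each list, and the Jaccard similarity (shared words divided by all distinct words).

```python
from pathlib import Path


def load_stopwords(path: str) -> set[str]:
    """Read a stopword file, assumed to hold one UTF-8 word per line."""
    text = Path(path).read_text(encoding="utf-8")
    # Strip surrounding whitespace and skip blank lines.
    return {line.strip() for line in text.splitlines() if line.strip()}


def compare_stopwords(ours: set[str], theirs: set[str]) -> dict:
    """Summarize how two stopword sets overlap."""
    shared = ours & theirs
    union = ours | theirs
    return {
        "shared": len(shared),
        "only_ours": sorted(ours - theirs),
        "only_theirs": sorted(theirs - ours),
        # Jaccard similarity |A & B| / |A | B|; 1.0 means identical sets.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }
```

For example, with two hypothetical file names, `compare_stopwords(load_stopwords("stopwords-ta-pr.txt"), load_stopwords("stopwords-ta-4k.txt"))` would show which Tamil words each list is missing. A high Jaccard score alone does not prove the lists behave the same, so the `only_ours` and `only_theirs` entries are worth reading directly.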

pj8912 commented 3 months ago

Hey @AndyTheFactory, it seems comparable. My code worked great for scraping Tamil news articles; I will check out newspaper4k.

pj8912 commented 3 months ago

newspaper4k works perfectly fine. [Screenshot from 2024-03-24 22-18-43]