codelucas / newspaper

newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:
https://goo.gl/VX41yK
MIT License

Is it possible to use newspaper3k on files? #918

Open IKetchup opened 2 years ago

IKetchup commented 2 years ago

I am currently working on a project that consists of extracting information for reports. I would like to use newspaper for this, but so far I have only seen newspaper used with online articles. Is it possible to use newspaper3k on HTML or txt files?

Thanks in advance for any information on this topic.

johnbumgarner commented 2 years ago

Yes. Please refer to the section "Extraction from offline HTML files" in my Newspaper3K usage document.
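
For readers who do not have that document at hand, here is a minimal sketch of one way to parse a saved HTML page, assuming newspaper3k 0.2.x, where `Article.download()` accepts an `input_html` argument. The helper names `load_html` and `parse_local_article` are hypothetical and are not part of the library. The sketch covers HTML files only; plain txt files are not addressed here.

```python
from pathlib import Path


def load_html(path, encoding="utf-8"):
    """Read a saved HTML page from disk and return it as a string."""
    return Path(path).read_text(encoding=encoding)


def parse_local_article(path, language="en"):
    """Parse a local HTML file with newspaper3k and return the Article.

    Hypothetical helper for illustration; not part of newspaper3k itself.
    """
    # Imported here so load_html stays usable without newspaper3k installed.
    from newspaper import Article

    # The URL is left empty because nothing is fetched over the network.
    article = Article("", language=language)
    # With input_html set, download() uses the supplied markup
    # instead of making an HTTP request.
    article.download(input_html=load_html(path))
    article.parse()
    return article
```

Calling `parse_local_article("saved_article.html")` (a placeholder file name) should return an `Article` whose `title`, `text`, and `authors` attributes can be read as usual. How much is extracted depends on how closely the saved markup resembles a typical news page.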