Open EHTaylor12 opened 4 years ago
Yeah, I seem to be having a similar problem as well. I am trying to extract the text from the example link they gave, http://fox13now.com/2013/12/30/new-year-new-laws-obamacare-pot-guns-and-drones/, but it's not working.
I have a possible workaround: downloading and saving the desired webpage with Chrono Download Manager (a Chrome extension) and then parsing it with newspaper3k.
I am struggling with how to parse locally stored HTML files with newspaper3k, as the Read the Docs documentation is vague and incomplete on the topic. Does anyone have detailed guidance they could offer, or could you point me towards a good resource on this subject?
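For parsing a saved page, here is a minimal sketch rather than a definitive recipe. It relies on `Article.download()` accepting an `input_html` argument, which makes newspaper3k use the markup you pass in instead of fetching the URL. The helper names `read_saved_html` and `parse_saved_page` and the file path are made up for illustration, and the original URL is passed only so the article has something to resolve relative links against.

```python
from pathlib import Path


def read_saved_html(path):
    """Return the contents of a saved HTML file as text."""
    return Path(path).read_text(encoding="utf-8", errors="replace")


def parse_saved_page(path, original_url):
    """Parse a locally saved page with newspaper3k without fetching the URL."""
    # Imported here so read_saved_html() stays usable without newspaper3k installed.
    from newspaper import Article, Config

    config = Config()
    config.fetch_images = False  # skip image requests while parsing

    article = Article(original_url, config=config)
    # With input_html set, download() stores this markup instead of requesting the page.
    article.download(input_html=read_saved_html(path))
    article.parse()
    return article


# Hypothetical usage (the file name is an assumption):
# article = parse_saved_page("saved_page.html", "http://fox13now.com/2013/12/30/new-year-new-laws-obamacare-pot-guns-and-drones/")
# print(article.title)
# print(article.text)
```

If the saved file is not UTF-8, the `errors="replace"` argument keeps the read from failing but may garble some characters, so adjust the encoding to match how the page was saved.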
I am running into an issue in which Newspaper3k does not download an article entirely if the article has an image (usually a chart or graph) embedded in the middle of the text.
This is an example article (Sample Article) in which Newspaper3k stops at the beginning of the graph embedded in the middle of the article.
Has anyone else experienced this or found a solution? I don’t need the images, so if there were a way to disable loading images in the settings, that would be even better.
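On the settings question, newspaper3k's `Config` object has a `fetch_images` attribute, and the sketch below switches it off. Whether this also stops the text from being cut off at the embedded graph is an assumption that has not been verified here, since the truncation may come from the text extraction step rather than from image loading. The function names `make_text_only` and `fetch_article_text` are hypothetical.

```python
def make_text_only(config):
    """Turn off image fetching on a newspaper3k Config-like object and return it."""
    config.fetch_images = False
    return config


def fetch_article_text(url):
    """Download and parse an article with image fetching disabled (needs network access)."""
    # Imported here so make_text_only() can be used without newspaper3k installed.
    from newspaper import Article, Config

    article = Article(url, config=make_text_only(Config()))
    article.download()
    article.parse()
    return article.text


# Hypothetical usage:
# print(fetch_article_text("http://fox13now.com/2013/12/30/new-year-new-laws-obamacare-pot-guns-and-drones/"))
```

If the text still stops at the graph with images disabled, comparing `article.text` against the raw `article.html` should show whether the missing paragraphs were downloaded at all.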