fhamborg / news-please

news-please - an integrated web crawler and information extractor for news that just works
Apache License 2.0
2.09k stars, 429 forks

Photo in the article stops from reading all text #115

Closed: 16egong closed this issue 5 years ago

16egong commented 5 years ago

Hi,

I've been looking for a great news scraper and crawler, but I keep finding that certain articles do not return all of their text. The article I used is below. I had the same issue with newspaper3k. I was wondering if you knew of any workarounds. Thanks!

I'm using Python 3.6

url = 'https://www.cnn.com/2019/07/30/middleeast/yemen-market-explosion-saada-intl/index.html'

The above URL returns: (CNN) Four children were among 14 civilians killed in an airstrike on Al-Thabet market in Yemen's northern Saada province on Monday, according to Houthi authorities, amid conflicting accounts of what happened. A Houthi-run hospital report, released by spokesman Mohammed Abdul Salam, held the Saudi-led coalition responsible for the incident and said it also wounded 26, including 14 children. In response, coalition spokesman Col. Turki al-Malki told CNN that: "The targeting of Al-Thabet market by the terrorist, Iran-backed Houthi militia is a deliberate attack against innocent civilians." The Saudi-backed Yemeni government's information minister, Moammar al-Eryani, also blamed the explosion on the Houthis in a tweet Monday, and said that the rebels used Katyusha rockets. The United Nations International Children's Emergency Fund (UNICEF) in Yemen said it was "disheartened with reports of the killing and injury of children" in a Twitter post. People injured by an explosion in a market in Yemen's Saada province receive medical attention at the local Al Jomhouri hospital. Read More
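For anyone trying to reproduce this, here is a minimal sketch of how the output above can be obtained and checked. It assumes a recent news-please release, where `NewsPlease.from_url` returns an article object with a `maintext` attribute. The `looks_truncated` helper and its "Read More" marker are hypothetical, based only on how the output above ends. It is a rough heuristic, not part of news-please or newspaper3k.

```python
def extract_main_text(url):
    """Download `url` and return the main text news-please extracts (may be None)."""
    # Imported lazily so the heuristic below can be used without news-please installed.
    from newsplease import NewsPlease

    article = NewsPlease.from_url(url)
    return article.maintext


def looks_truncated(text, marker="Read More"):
    """Hypothetical heuristic: flag output that is empty or ends with the teaser marker."""
    # The output in this report stops at a photo caption followed by "Read More",
    # so a trailing marker suggests the extractor stopped at the embedded photo.
    return not text or text.rstrip().endswith(marker)


# Example usage (requires network access and `pip install news-please`):
# url = "https://www.cnn.com/2019/07/30/middleeast/yemen-market-explosion-saada-intl/index.html"
# text = extract_main_text(url)
# print(len(text or ""), "characters extracted; truncated:", looks_truncated(text))
```

A check like this only detects the symptom. It does not recover the missing paragraphs.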

fhamborg commented 5 years ago

Unfortunately not. Regarding this functionality, news-please is mostly based on newspaper3k, so it's better if you open an issue on their side.