Open aserlich opened 10 years ago
I noticed these. It looks like there are some minor variations in the HTML produced for these sites. I've updated the scrapers and will go back and re-download those articles.
Adi
On 13 April 2014 18:04, aserlich notifications@github.com wrote:
Looks like a .mobi problem. Could you investigate?
2014-04-12 01:26:28,921 - __main__ - WARNING - Missing text from http://springsadvertiser.co.za/89335/da-gives-solar-units/ [in scraper.py:27]
publish_date = date_parser.parse(divs[3].text)
2014-04-12 05:26:32,883 - __main__ - WARNING - Missing text from http://www.kougaexpress.co.za/137421/news-details/ [in scraper.py:27]
2014-04-12 05:26:32,939 - __main__ - WARNING - Missing text from http://www.kougaexpress.co.za/137687/news-details/ [in scraper.py:27]
return json.loads(self.text, **kwargs)
return json.loads(self.text, **kwargs)
2014-04-12 09:21:20,386 - __main__ - WARNING - Missing text from http://24.com.feedsportal.com/c/33816/f/607927/s/39421f5c/sc/40/l/0L0Schannel240Bco0Bza0CTV0CNews0COscar0Etrial0ETV0Echannel0Elifts0EDStv0E20A140A411/story01.htm [in scraper.py:27]
2014-04-12 09:21:20,414 - __main__ - WARNING - Missing text from http://24.com.feedsportal.com/c/33816/f/607927/s/39421f5c/sc/40/l/0L0Schannel240Bco0Bza0CTV0CNews0COscar0Etrial0ETV0Echannel0Elifts0EDStv0E20A140A411/story01.htm [in scraper.py:27]
2014-04-12 09:21:47,497 - __main__ - WARNING - Missing text from http://ballitofever.mobi/news/read/556/woman-shot-in-taxi-rank-violence [in scraper.py:27]
2014-04-12 09:21:47,663 - __main__ - WARNING - Missing text from http://ballitofever.mobi/news/read/556/woman-shot-in-taxi-rank-violence [in scraper.py:27]
2014-04-12 09:24:27,446 - __main__ - WARNING - Missing text from http://www.carletonville.mobi/news/read/3238/scubi-forester-revisited [in scraper.py:27]
2014-04-12 09:25:36,336 - __main__ - WARNING - Missing text from http://www.coastalweekly.mobi/news/read/751/woman-shot-in-taxi-rank-violence [in scraper.py:27]
2014-04-12 09:31:18,163 - __main__ - WARNING - Missing text from http://www.eastlondonexpress.mobi/news/read/762/i-try-not-to-lie-oscar [in scraper.py:27]
2014-04-12 09:31:26,976 - __main__ - WARNING - Missing text from http://www.eastlondonexpress.mobi/news/read/760/yes-dewani-s-only-word-to-court [in scraper.py:27]
2014-04-12 09:32:14,568 - __main__ - WARNING - Missing text from http://www.edenexpress.mobi/news/read/2481/scubi-forester-revisited [in scraper.py:27]
2014-04-12 09:32:15,365 - __main__ - WARNING - Missing text from http://www.edenexpress.mobi/news/read/2481/scubi-forester-revisited [in scraper.py:27]
2014-04-12 09:32:23,068 - __main__ - WARNING - Missing text from http://www.edenexpress.mobi/news/read/2511/i-try-not-to-lie-oscar [in scraper.py:27]
2014-04-12 09:32:28,667 - __main__ - WARNING - Missing text from http://www.edenexpress.mobi/news/read/2509/yes-dewani-s-only-word-to-court [in scraper.py:27]
2014-04-12 09:32:28,753 - __main__ - WARNING - Missing text from http://www.edenexpress.mobi/news/read/2509/yes-dewani-s-only-word-to-court [in scraper.py:27]
2014-04-12 09:34:34,934 - __main__ - WARNING - Missing text from http://www.hermanustimes.mobi/news/read/2842/bejaarde-beseer-in-motorongeluk [in scraper.py:27]
2014-04-12 09:35:21,545 - __main__ - WARNING - Missing text from http://www.isoexpress.mobi/news/read/236/i-try-not-to-lie-oscar [in scraper.py:27]
2014-04-12 09:35:25,671 - __main__ - WARNING - Missing text from http://www.isoexpress.mobi/news/read/235/yes-dewani-s-only-word-to-court [in scraper.py:27]
2014-04-12 09:35:26,984 - __main__ - WARNING - Missing text from http://www.isoexpress.mobi/news/read/235/yes-dewani-s-only-word-to-court [in scraper.py:27]
2014-04-12 09:36:32,571 - __main__ - WARNING - Missing text from http://www.kroonnuus.mobi/news/read/1543/having-fun-at-creche [in scraper.py:27]
2014-04-12 09:36:37,729 - __main__ - WARNING - Missing text from http://www.kroonnuus.mobi/news/read/1538/neem-deel-aan-wedstryd [in scraper.py:27]
2014-04-12 09:36:38,111 - __main__ - WARNING - Missing text from http://www.kroonnuus.mobi/news/read/1538/neem-deel-aan-wedstryd [in scraper.py:27]
2014-04-12 09:36:38,283 - __main__ - WARNING - Missing text from http://www.kroonnuus.mobi/news/read/1537/stel-ondersoek-na-lekkende-pyp-in [in scraper.py:27]
2014-04-12 09:37:07,851 - __main__ - WARNING - Missing text from http://www.pmbfever.mobi/news/read/1161/giving-hope-to-patients [in scraper.py:27]
2014-04-12 09:37:18,277 - __main__ - WARNING - Missing text from http://www.pmbfever.mobi/news/read/1146/twc-tennis-achievers [in scraper.py:27]
2014-04-12 09:37:37,989 - __main__ - WARNING - Missing text from http://www.mthathaexpress.mobi/news/read/344/i-try-not-to-lie-oscar [in scraper.py:27]
2014-04-12 09:37:40,211 - __main__ - WARNING - Missing text from http://www.mthathaexpress.mobi/news/read/343/yes-dewani-s-only-word-to-court [in scraper.py:27]
2014-04-12 09:38:05,475 - __main__ - WARNING - Missing text from http://www.lesedingnews.mobi/news/read/355/cycling-for-a-worthy-cause [in scraper.py:27]
2014-04-12 09:38:06,538 - __main__ - WARNING - Missing text from http://www.lesedingnews.mobi/news/read/356/football-academy-for-mogwase [in scraper.py:27]
2014-04-12 09:38:11,245 - __main__ - WARNING - Missing text from http://www.lesedingnews.mobi/news/read/343/strydom-grooming-young-boxer [in scraper.py:27]
2014-04-12 09:38:12,172 - __main__ - WARNING - Missing text from http://www.lesedingnews.mobi/news/read/342/football-club-transforming-maboloka [in scraper.py:27]
2014-04-12 09:39:05,918 - __main__ - WARNING - Missing text from http://www.paarlpost.mobi/news/read/6290/bakkie-tol-op-n1 [in scraper.py:27]
2014-04-12 09:39:13,487 - __main__ - WARNING - Missing text from http://www.paarlpost.mobi/news/read/6292/talle-gesteelde-goedere-gevind [in scraper.py:27]
2014-04-12 09:39:17,208 - __main__ - WARNING - Missing text from http://www.paarlpost.mobi/news/read/6297/riana-nel-by-simonsvlei [in scraper.py:27]
2014-04-12 09:39:17,378 - __main__ - WARNING - Missing text from http://www.paarlpost.mobi/news/read/6296/ub40-on-stage [in scraper.py:27]
2014-04-12 09:39:21,204 - __main__ - WARNING - Missing text from http://www.paarlpost.mobi/news/read/6299/chris-chameleon-by-ou-meul [in scraper.py:27]
2014-04-12 09:39:24,636 - __main__ - WARNING - Missing text from http://www.paarlpost.mobi/news/read/6301/gim-se-hokkie-leerders-skitter [in scraper.py:27]
2014-04-12 09:40:24,041 - __main__ - WARNING - Missing text from http://www.parysgazette.mobi/news/read/2349/scubi-forester-revisited [in scraper.py:27]
2014-04-12 09:41:12,646 - __main__ - WARNING - Missing text from http://www.peoplespost.mobi/news/read/8143/teen-shot-in-manenberg [in scraper.py:27]
2014-04-12 09:41:23,731 - __main__ - WARNING - Missing text from http://www.peoplespost.mobi/news/read/8133/casper-de-vries-paintings-no-laughing-matter [in scraper.py:27]
2014-04-12 09:41:24,810 - __main__ - WARNING - Missing text from http://www.peoplespost.mobi/news/read/8133/casper-de-vries-paintings-no-laughing-matter [in scraper.py:27]
2014-04-12 09:42:24,551 - __main__ - WARNING - Missing text from http://www.peexpress.mobi/news/read/3801/i-try-not-to-lie-oscar [in scraper.py:27]
2014-04-12 09:42:30,593 - __main__ - WARNING - Missing text from http://www.peexpress.mobi/news/read/3796/yes-dewani-s-only-word-to-court [in scraper.py:27]
2014-04-12 09:43:37,791 - __main__ - WARNING - Missing text from http://www.potchherald.mobi/news/read/4641/scubi-forester-revisited [in scraper.py:27]
2014-04-12 09:45:21,890 - __main__ - WARNING - Missing text from http://www.sedibengster.mobi/news/read/2294/scubi-forester-revisited [in scraper.py:27]
2014-04-12 09:46:21,874 - __main__ - WARNING - Missing text from http://stangerweekly.mobi/news/read/1021/woman-shot-in-taxi-rank-violence [in scraper.py:27]
2014-04-12 09:46:57,297 - __main__ - WARNING - Missing text from http://www.sunshinecoastexpress.mobi/news/read/483/i-try-not-to-lie-oscar [in scraper.py:27]
2014-04-12 09:47:01,596 - __main__ - WARNING - Missing text from http://www.sunshinecoastexpress.mobi/news/read/482/yes-dewani-s-only-word-to-court [in scraper.py:27]
2014-04-12 09:47:11,160 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/3793/demi-is-kwaai [in scraper.py:27]
2014-04-12 09:47:11,161 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/3792/onse-charlize-en-berk-is-in-die-kaap [in scraper.py:27]
2014-04-12 09:47:11,301 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/3792/onse-charlize-en-berk-is-in-die-kaap [in scraper.py:27]
2014-04-12 09:47:11,307 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/3793/demi-is-kwaai [in scraper.py:27]
2014-04-12 09:47:13,964 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/3794/jokes-10-apr-2014 [in scraper.py:27]
2014-04-12 09:47:13,993 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/3794/jokes-10-apr-2014 [in scraper.py:27]
2014-04-12 09:51:26,606 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/2558/my-man-wil-n-threesome-he [in scraper.py:27]
2014-04-12 09:51:26,799 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/2558/my-man-wil-n-threesome-he [in scraper.py:27]
2014-04-12 10:02:55,082 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/2559/my-man-se-my-moemfie-smaak-bitter [in scraper.py:27]
2014-04-12 10:02:55,186 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/2559/my-man-se-my-moemfie-smaak-bitter [in scraper.py:27]
2014-04-12 10:02:55,209 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/2559/my-man-se-my-moemfie-smaak-bitter [in scraper.py:27]
2014-04-12 10:02:55,731 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/2559/my-man-se-my-moemfie-smaak-bitter [in scraper.py:27]
2014-04-12 10:05:44,086 - __main__ - WARNING - Missing text from http://www.tygerburger.mobi/news/read/4612/welgemoedsafe-has-new-ops-room-and-ops-manager [in scraper.py:27]
2014-04-12 10:05:44,387 - __main__ - WARNING - Missing text from http://www.tygerburger.mobi/news/read/4612/welgemoedsafe-has-new-ops-room-and-ops-manager [in scraper.py:27]
2014-04-12 10:05:44,952 - __main__ - WARNING - Missing text from http://www.tygerburger.mobi/news/read/4614/measles-polio-vaccine-stock-shortfall [in scraper.py:27]
2014-04-12 10:05:45,270 - __main__ - WARNING - Missing text from http://www.tygerburger.mobi/news/read/4613/centres-for-city-s-homeless-making-inroads [in scraper.py:27]
2014-04-12 10:05:46,944 - __main__ - WARNING - Missing text from http://www.tygerburger.mobi/news/read/4613/centres-for-city-s-homeless-making-inroads [in scraper.py:27]
2014-04-12 10:06:01,897 - __main__ - WARNING - Missing text from http://www.tygerburger.mobi/news/read/4586/casper-de-vries-paintings-no-laughing-matter [in scraper.py:27]
2014-04-12 10:06:02,354 - __main__ - WARNING - Missing text from http://www.tygerburger.mobi/news/read/4586/casper-de-vries-paintings-no-laughing-matter [in scraper.py:27]
2014-04-12 10:06:02,706 - __main__ - WARNING - Missing text from http://www.tygerburger.mobi/news/read/4586/casper-de-vries-paintings-no-laughing-matter [in scraper.py:27]
2014-04-12 10:06:29,662 - __main__ - WARNING - Missing text from http://www.udnews.mobi/news/read/1838/i-try-not-to-lie-oscar [in scraper.py:27]
2014-04-12 10:06:31,416 - __main__ - WARNING - Missing text from http://www.udnews.mobi/news/read/1838/i-try-not-to-lie-oscar [in scraper.py:27]
2014-04-12 10:06:33,396 - __main__ - WARNING - Missing text from http://www.udnews.mobi/news/read/1829/yes-dewani-s-only-word-to-court [in scraper.py:27]
2014-04-12 10:06:34,549 - __main__ - WARNING - Missing text from http://www.udnews.mobi/news/read/1829/yes-dewani-s-only-word-to-court [in scraper.py:27]
2014-04-12 10:07:15,598 - __main__ - WARNING - Missing text from http://www.uvoexpress.mobi/news/read/209/i-try-not-to-lie-oscar [in scraper.py:27]
2014-04-12 10:07:16,577 - __main__ - WARNING - Missing text from http://www.uvoexpress.mobi/news/read/209/i-try-not-to-lie-oscar [in scraper.py:27]
2014-04-12 10:07:17,982 - __main__ - WARNING - Missing text from http://www.uvoexpress.mobi/news/read/208/yes-dewani-s-only-word-to-court [in scraper.py:27]
2014-04-12 10:08:23,496 - __main__ - WARNING - Missing text from http://www.vaalweekblad.mobi/news/read/9045/scubi-forester-revisited [in scraper.py:27]
2014-04-12 10:09:33,400 - __main__ - WARNING - Missing text from http://www.vrystaat.mobi/news/read/1937/spog-met-nuwe-hemde [in scraper.py:27]
2014-04-12 10:09:33,399 - __main__ - WARNING - Missing text from http://www.vrystaat.mobi/news/read/1937/spog-met-nuwe-hemde [in scraper.py:27]
2014-04-12 10:09:35,132 - __main__ - WARNING - Missing text from http://www.vrystaat.mobi/news/read/1937/spog-met-nuwe-hemde [in scraper.py:27]
2014-04-12 10:09:35,136 - __main__ - WARNING - Missing text from http://www.vrystaat.mobi/news/read/1937/spog-met-nuwe-hemde [in scraper.py:27]
2014-04-12 10:11:45,531 - __main__ - WARNING - Missing text from http://udnews.mobi/news/read/1838/i-try-not-to-lie-oscar [in scraper.py:27]
2014-04-12 10:11:47,228 - __main__ - WARNING - Missing text from http://udnews.mobi/news/read/1829/yes-dewani-s-only-word-to-court [in scraper.py:27]
return json.loads(self.text, **kwargs)
2014-04-12 19:27:33,714 - __main__ - WARNING - Missing text from http://ladysmithgazette.co.za/17439/n3tc-arthur-cresswell-memorial-marathon/ [in scraper.py:27]
2014-04-12 21:27:08,702 - __main__ - WARNING - Missing text from http://mpumalanganews.co.za/141709/steval-pumas-still-tops-log/ [in scraper.py:27]
2014-04-12 21:27:11,246 - __main__ - WARNING - Missing text from http://lowvelder.co.za/157569/steval-pumas-still-tops-log/ [in scraper.py:27]
2014-04-12 22:26:52,697 - __main__ - WARNING - Missing text from http://corridorgazette.co.za/125205/steval-pumas-still-tops-log/ [in scraper.py:27]
2014-04-12 22:27:00,731 - __main__ - WARNING - Missing text from http://nelspruitpost.co.za/131021/steval-pumas-still-tops-log/ [in scraper.py:27]
2014-04-12 22:27:07,832 - __main__ - WARNING - Missing text from http://whiteriverpost.co.za/115981/steval-pumas-still-tops-log/ [in scraper.py:27]
2014-04-12 23:28:24,726 - __main__ - WARNING - Missing text from http://www.news24.com/World/News/3-charged-over-Hollande-affair-actress-photo-20140412 [in scraper.py:27]
2014-04-13 01:27:03,594 - __main__ - WARNING - Missing text from http://springsadvertiser.co.za/89337/hulle-speel-juksei-by-sas/ [in scraper.py:27]
Reply to this email directly or view it on GitHub: https://github.com/Code4SA/various-scrapers/issues/9
Adi Eyal Director Code for South Africa Promoting informed decision-making
phone: +27 78 014 2469 skype: adieyalcas linkedin: http://za.linkedin.com/pub/dir/Adi/Eyal web: http://www.code4sa.org twitter: @soapsudtycoon
For more information on how to participate in the open data community in South Africa, go to: http://www.code4sa.org/#community
Is this still a problem? Seeing more of these coming through the error logs
2014-04-16 12:32:15,093 - __main__ - WARNING - Missing text from http://southcoastherald.co.za/35109/hibberdene-crash-injures-seven/ [in scraper.py:27]
2014-04-16 13:26:52,694 - __main__ - WARNING - Missing text from http://benonicitytimes.co.za/173425/meet-the-missing-snake/ [in scraper.py:27]
2014-04-16 13:48:49,115 - __main__ - WARNING - Missing text from http://www.udnews.mobi/news/read/1855/one-day-without-shoes-on-april-29 [in scraper.py:27]
2014-04-16 14:09:09,729 - __main__ - WARNING - Missing text from http://udnews.mobi/news/read/1855/one-day-without-shoes-on-april-29 [in scraper.py:27]
2014-04-16 14:19:12,680 - __main__ - WARNING - Missing text from http://www.peexpress.mobi/news/read/3847/one-day-without-shoes-on-april-29 [in scraper.py:27]
2014-04-16 16:26:47,647 - __main__ - WARNING - Missing text from http://alexnews.co.za/24414/oscar-trial-five-things-need-know-day-24/ [in scraper.py:27]
2014-04-16 16:26:52,330 - __main__ - WARNING - Missing text from http://citybuzz.co.za/13536/oscar-trial-five-things-need-know-day-24/ [in scraper.py:27]
2014-04-16 16:27:07,992 - __main__ - WARNING - Missing text from http://fourwaysreview.co.za/172361/oscar-trial-five-things-need-know-day-24/ [in scraper.py:27]
2014-04-16 16:27:19,389 - __main__ - WARNING - Missing text from http://midrandreporter.co.za/107181/oscar-trial-five-things-need-know-day-24/ [in scraper.py:27]
2014-04-16 16:27:21,046 - __main__ - WARNING - Missing text from http://nelspruitpost.co.za/155908/bergies-en-primary-se-die-stryd-aan/ [in scraper.py:27]
2014-04-16 16:27:41,821 - __main__ - WARNING - Missing text from http://northeasterntribune.co.za/134239/oscar-trial-five-things-need-know-day-24/ [in scraper.py:27]
2014-04-16 16:28:04,711 - __main__ - WARNING - Missing text from http://roodepoortrecord.co.za/2014/04/16/roodies/ [in scraper.py:27]
2014-04-16 16:28:07,016 - __main__ - WARNING - Missing text from http://roodepoortrecord.co.za/2014/04/16/soccer/ [in scraper.py:27]
2014-04-16 16:28:17,238 - __main__ - WARNING - Missing text from http://roodepoortrecord.co.za/2014/04/16/florries-5/ [in scraper.py:27]
2014-04-16 16:28:18,069 - __main__ - WARNING - Missing text from http://roodepoortrecord.co.za/2014/04/16/havies/ [in scraper.py:27]
2014-04-16 16:28:20,881 - __main__ - WARNING - Missing text from http://roodepoortrecord.co.za/2014/04/16/die-ruiter/ [in scraper.py:27]
2014-04-16 16:28:37,657 - __main__ - WARNING - Missing text from http://rosebankkillarneygazette.co.za/134572/oscar-trial-five-things-need-know-day-24/ [in scraper.py:27]
2014-04-16 16:28:40,838 - __main__ - WARNING - Missing text from http://sandtonchronicle.co.za/90139/oscar-trial-five-things-need-know-day-24-2/ [in scraper.py:27]
2014-04-16 17:28:24,968 - __main__ - WARNING - Missing text from http://northcoastcourier.co.za/16820/curro-excels-hockey/ [in scraper.py:27]
2014-04-16 21:28:32,469 - __main__ - WARNING - Missing text from http://www.franco-sa.co.za/news/486 [in scraper.py:27]
2014-04-17 01:54:42,846 - scrapers.naspers.parser - ERROR - Could not download: http://www.kalaharibulletin.mobi/news/read/2994/video-snotkop-wys-gesig [in /var/www/scrapers/scrapers/naspers/parser.py:122]
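One way to check whether these warnings really cluster on the .mobi sites is to tally the affected URLs by hostname. The following is only a minimal sketch, assuming the log format shown above; `missing_text_by_host` is a hypothetical helper name and is not part of the scrapers repository.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Matches the "Missing text" warnings in the format quoted in this thread.
WARNING_RE = re.compile(r"WARNING - Missing text from (\S+) \[in scraper\.py:\d+\]")


def missing_text_by_host(log_text):
    """Count distinct URLs with a 'Missing text' warning, grouped by hostname."""
    # The same URL is often logged several times, so de-duplicate first.
    urls = set(WARNING_RE.findall(log_text))
    hosts = Counter()
    for url in urls:
        host = urlparse(url).hostname or ""
        if host.startswith("www."):
            host = host[4:]  # treat www.udnews.mobi and udnews.mobi as one site
        hosts[host] += 1
    return hosts
```

Calling `missing_text_by_host(log_text).most_common()` on a day's log lists the worst-affected sites first, which makes it easier to see whether .co.za sites are affected as well.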
I'm working on it. There seem to be two issues. Firstly, stories with only an image caption. That is now fixed. Secondly, stories which are just bullet points. I'm working on that today.
Adi
On 17 April 2014 08:05, aserlich notifications@github.com wrote:
Is this still a problem? Seeing more of these coming through the error logs
We seem to have stories with text that isn't being grabbed properly again.
2014-04-22 12:31:01,164 - __main__ - WARNING - Missing text from http://newcastleadvertiser.co.za/21002/easter-sunrise-service-hilldrop/ [in scraper.py:27]
2014-04-22 12:31:47,534 - __main__ - WARNING - Missing text from http://southcoastherald.co.za/35604/hibberdene-couta-classic-2014-2/ [in scraper.py:27]
2014-04-22 12:32:05,724 - __main__ - WARNING - Missing text from http://tembisan.co.za/14894/family-celebrates-tumi/ [in scraper.py:27]
2014-04-22 15:28:07,170 - __main__ - WARNING - Missing text from http://citizen.co.za/164055/allow-people-judge-malema/ [in scraper.py:27]
2014-04-22 16:28:46,316 - __main__ - WARNING - Missing text from http://southcoastherald.co.za/35693/heres-big-easter-winner/ [in scraper.py:27]
Quick report back on the URLs above.
The first two don't actually have bodies.
The third and fifth examples don't have bodies but do have image captions. I have now incorporated the captions into the body.
The fourth example was fixed.
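The fallback described in this thread (use paragraph text, otherwise bullet points, otherwise image captions) could be sketched roughly as follows. This is not the actual scraper code: the tag names `p`, `li` and `figcaption`, the fallback order, and the helper `extract_body` are all assumptions made for illustration, and the real sites may mark up captions differently.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text separately for paragraphs, list items and captions."""

    TAGS = ("p", "li", "figcaption")  # assumed markup, not taken from the real sites

    def __init__(self):
        super().__init__()
        self.chunks = {tag: [] for tag in self.TAGS}
        self._stack = []  # currently open tags of interest

    def handle_starttag(self, tag, attrs):
        if tag in self.TAGS:
            self._stack.append(tag)
            self.chunks[tag].append("")

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        if self._stack:
            # Attach text to the innermost open tag of interest.
            self.chunks[self._stack[-1]][-1] += data


def extract_body(html):
    """Return paragraph text, else bullet points, else captions, else None."""
    collector = _TextCollector()
    collector.feed(html)
    for tag in _TextCollector.TAGS:
        # Normalise whitespace and drop empty chunks before deciding.
        parts = [" ".join(chunk.split()) for chunk in collector.chunks[tag]]
        parts = [part for part in parts if part]
        if parts:
            return "\n".join(parts)
    return None
```

Returning `None` only when all three sources are empty would keep the "Missing text" warning for stories that genuinely have no body, such as the first two examples above.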