Code4SA / various-scrapers

Apache License 2.0
Missing News Stories | April 12 #9

Open aserlich opened 10 years ago

aserlich commented 10 years ago

Looks like a .mobi problem. Could you investigate?

2014-04-12 01:26:28,921 - __main__ - WARNING - Missing text from http://springsadvertiser.co.za/89335/da-gives-solar-units/ [in scraper.py:27]
    publish_date = date_parser.parse(divs[3].text)
2014-04-12 05:26:32,883 - __main__ - WARNING - Missing text from http://www.kougaexpress.co.za/137421/news-details/ [in scraper.py:27]
2014-04-12 05:26:32,939 - __main__ - WARNING - Missing text from http://www.kougaexpress.co.za/137687/news-details/ [in scraper.py:27]
    return json.loads(self.text, **kwargs)
    return json.loads(self.text, **kwargs)
2014-04-12 09:21:20,386 - __main__ - WARNING - Missing text from http://24.com.feedsportal.com/c/33816/f/607927/s/39421f5c/sc/40/l/0L0Schannel240Bco0Bza0CTV0CNews0COscar0Etrial0ETV0Echannel0Elifts0EDStv0E20A140A411/story01.htm [in scraper.py:27]
2014-04-12 09:21:20,414 - __main__ - WARNING - Missing text from http://24.com.feedsportal.com/c/33816/f/607927/s/39421f5c/sc/40/l/0L0Schannel240Bco0Bza0CTV0CNews0COscar0Etrial0ETV0Echannel0Elifts0EDStv0E20A140A411/story01.htm [in scraper.py:27]
2014-04-12 09:21:47,497 - __main__ - WARNING - Missing text from http://ballitofever.mobi/news/read/556/woman-shot-in-taxi-rank-violence [in scraper.py:27]
2014-04-12 09:21:47,663 - __main__ - WARNING - Missing text from http://ballitofever.mobi/news/read/556/woman-shot-in-taxi-rank-violence [in scraper.py:27]
2014-04-12 09:24:27,446 - __main__ - WARNING - Missing text from http://www.carletonville.mobi/news/read/3238/scubi-forester-revisited [in scraper.py:27]
2014-04-12 09:25:36,336 - __main__ - WARNING - Missing text from http://www.coastalweekly.mobi/news/read/751/woman-shot-in-taxi-rank-violence [in scraper.py:27]
2014-04-12 09:31:18,163 - __main__ - WARNING - Missing text from http://www.eastlondonexpress.mobi/news/read/762/i-try-not-to-lie-oscar [in scraper.py:27]
2014-04-12 09:31:26,976 - __main__ - WARNING - Missing text from http://www.eastlondonexpress.mobi/news/read/760/yes-dewani-s-only-word-to-court [in scraper.py:27]
2014-04-12 09:32:14,568 - __main__ - WARNING - Missing text from http://www.edenexpress.mobi/news/read/2481/scubi-forester-revisited [in scraper.py:27]
2014-04-12 09:32:15,365 - __main__ - WARNING - Missing text from http://www.edenexpress.mobi/news/read/2481/scubi-forester-revisited [in scraper.py:27]
2014-04-12 09:32:23,068 - __main__ - WARNING - Missing text from http://www.edenexpress.mobi/news/read/2511/i-try-not-to-lie-oscar [in scraper.py:27]
2014-04-12 09:32:28,667 - __main__ - WARNING - Missing text from http://www.edenexpress.mobi/news/read/2509/yes-dewani-s-only-word-to-court [in scraper.py:27]
2014-04-12 09:32:28,753 - __main__ - WARNING - Missing text from http://www.edenexpress.mobi/news/read/2509/yes-dewani-s-only-word-to-court [in scraper.py:27]
2014-04-12 09:34:34,934 - __main__ - WARNING - Missing text from http://www.hermanustimes.mobi/news/read/2842/bejaarde-beseer-in-motorongeluk [in scraper.py:27]
2014-04-12 09:35:21,545 - __main__ - WARNING - Missing text from http://www.isoexpress.mobi/news/read/236/i-try-not-to-lie-oscar [in scraper.py:27]
2014-04-12 09:35:25,671 - __main__ - WARNING - Missing text from http://www.isoexpress.mobi/news/read/235/yes-dewani-s-only-word-to-court [in scraper.py:27]
2014-04-12 09:35:26,984 - __main__ - WARNING - Missing text from http://www.isoexpress.mobi/news/read/235/yes-dewani-s-only-word-to-court [in scraper.py:27]
2014-04-12 09:36:32,571 - __main__ - WARNING - Missing text from http://www.kroonnuus.mobi/news/read/1543/having-fun-at-creche [in scraper.py:27]
2014-04-12 09:36:37,729 - __main__ - WARNING - Missing text from http://www.kroonnuus.mobi/news/read/1538/neem-deel-aan-wedstryd [in scraper.py:27]
2014-04-12 09:36:38,111 - __main__ - WARNING - Missing text from http://www.kroonnuus.mobi/news/read/1538/neem-deel-aan-wedstryd [in scraper.py:27]
2014-04-12 09:36:38,283 - __main__ - WARNING - Missing text from http://www.kroonnuus.mobi/news/read/1537/stel-ondersoek-na-lekkende-pyp-in [in scraper.py:27]
2014-04-12 09:37:07,851 - __main__ - WARNING - Missing text from http://www.pmbfever.mobi/news/read/1161/giving-hope-to-patients [in scraper.py:27]
2014-04-12 09:37:18,277 - __main__ - WARNING - Missing text from http://www.pmbfever.mobi/news/read/1146/twc-tennis-achievers [in scraper.py:27]
2014-04-12 09:37:37,989 - __main__ - WARNING - Missing text from http://www.mthathaexpress.mobi/news/read/344/i-try-not-to-lie-oscar [in scraper.py:27]
2014-04-12 09:37:40,211 - __main__ - WARNING - Missing text from http://www.mthathaexpress.mobi/news/read/343/yes-dewani-s-only-word-to-court [in scraper.py:27]
2014-04-12 09:38:05,475 - __main__ - WARNING - Missing text from http://www.lesedingnews.mobi/news/read/355/cycling-for-a-worthy-cause [in scraper.py:27]
2014-04-12 09:38:06,538 - __main__ - WARNING - Missing text from http://www.lesedingnews.mobi/news/read/356/football-academy-for-mogwase [in scraper.py:27]
2014-04-12 09:38:11,245 - __main__ - WARNING - Missing text from http://www.lesedingnews.mobi/news/read/343/strydom-grooming-young-boxer [in scraper.py:27]
2014-04-12 09:38:12,172 - __main__ - WARNING - Missing text from http://www.lesedingnews.mobi/news/read/342/football-club-transforming-maboloka [in scraper.py:27]
2014-04-12 09:39:05,918 - __main__ - WARNING - Missing text from http://www.paarlpost.mobi/news/read/6290/bakkie-tol-op-n1 [in scraper.py:27]
2014-04-12 09:39:13,487 - __main__ - WARNING - Missing text from http://www.paarlpost.mobi/news/read/6292/talle-gesteelde-goedere-gevind [in scraper.py:27]
2014-04-12 09:39:17,208 - __main__ - WARNING - Missing text from http://www.paarlpost.mobi/news/read/6297/riana-nel-by-simonsvlei [in scraper.py:27]
2014-04-12 09:39:17,378 - __main__ - WARNING - Missing text from http://www.paarlpost.mobi/news/read/6296/ub40-on-stage [in scraper.py:27]
2014-04-12 09:39:21,204 - __main__ - WARNING - Missing text from http://www.paarlpost.mobi/news/read/6299/chris-chameleon-by-ou-meul [in scraper.py:27]
2014-04-12 09:39:24,636 - __main__ - WARNING - Missing text from http://www.paarlpost.mobi/news/read/6301/gim-se-hokkie-leerders-skitter [in scraper.py:27]
2014-04-12 09:40:24,041 - __main__ - WARNING - Missing text from http://www.parysgazette.mobi/news/read/2349/scubi-forester-revisited [in scraper.py:27]
2014-04-12 09:41:12,646 - __main__ - WARNING - Missing text from http://www.peoplespost.mobi/news/read/8143/teen-shot-in-manenberg [in scraper.py:27]
2014-04-12 09:41:23,731 - __main__ - WARNING - Missing text from http://www.peoplespost.mobi/news/read/8133/casper-de-vries-paintings-no-laughing-matter [in scraper.py:27]
2014-04-12 09:41:24,810 - __main__ - WARNING - Missing text from http://www.peoplespost.mobi/news/read/8133/casper-de-vries-paintings-no-laughing-matter [in scraper.py:27]
2014-04-12 09:42:24,551 - __main__ - WARNING - Missing text from http://www.peexpress.mobi/news/read/3801/i-try-not-to-lie-oscar [in scraper.py:27]
2014-04-12 09:42:30,593 - __main__ - WARNING - Missing text from http://www.peexpress.mobi/news/read/3796/yes-dewani-s-only-word-to-court [in scraper.py:27]
2014-04-12 09:43:37,791 - __main__ - WARNING - Missing text from http://www.potchherald.mobi/news/read/4641/scubi-forester-revisited [in scraper.py:27]
2014-04-12 09:45:21,890 - __main__ - WARNING - Missing text from http://www.sedibengster.mobi/news/read/2294/scubi-forester-revisited [in scraper.py:27]
2014-04-12 09:46:21,874 - __main__ - WARNING - Missing text from http://stangerweekly.mobi/news/read/1021/woman-shot-in-taxi-rank-violence [in scraper.py:27]
2014-04-12 09:46:57,297 - __main__ - WARNING - Missing text from http://www.sunshinecoastexpress.mobi/news/read/483/i-try-not-to-lie-oscar [in scraper.py:27]
2014-04-12 09:47:01,596 - __main__ - WARNING - Missing text from http://www.sunshinecoastexpress.mobi/news/read/482/yes-dewani-s-only-word-to-court [in scraper.py:27]
2014-04-12 09:47:11,160 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/3793/demi-is-kwaai [in scraper.py:27]
2014-04-12 09:47:11,161 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/3792/onse-charlize-en-berk-is-in-die-kaap [in scraper.py:27]
2014-04-12 09:47:11,301 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/3792/onse-charlize-en-berk-is-in-die-kaap [in scraper.py:27]
2014-04-12 09:47:11,307 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/3793/demi-is-kwaai [in scraper.py:27]
2014-04-12 09:47:13,964 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/3794/jokes-10-apr-2014 [in scraper.py:27]
2014-04-12 09:47:13,993 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/3794/jokes-10-apr-2014 [in scraper.py:27]
2014-04-12 09:51:26,606 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/2558/my-man-wil-n-threesome-he [in scraper.py:27]
2014-04-12 09:51:26,799 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/2558/my-man-wil-n-threesome-he [in scraper.py:27]
2014-04-12 10:02:55,082 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/2559/my-man-se-my-moemfie-smaak-bitter [in scraper.py:27]
2014-04-12 10:02:55,186 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/2559/my-man-se-my-moemfie-smaak-bitter [in scraper.py:27]
2014-04-12 10:02:55,209 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/2559/my-man-se-my-moemfie-smaak-bitter [in scraper.py:27]
2014-04-12 10:02:55,731 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/2559/my-man-se-my-moemfie-smaak-bitter [in scraper.py:27]
2014-04-12 10:05:44,086 - __main__ - WARNING - Missing text from http://www.tygerburger.mobi/news/read/4612/welgemoedsafe-has-new-ops-room-and-ops-manager [in scraper.py:27]
2014-04-12 10:05:44,387 - __main__ - WARNING - Missing text from http://www.tygerburger.mobi/news/read/4612/welgemoedsafe-has-new-ops-room-and-ops-manager [in scraper.py:27]
2014-04-12 10:05:44,952 - __main__ - WARNING - Missing text from http://www.tygerburger.mobi/news/read/4614/measles-polio-vaccine-stock-shortfall [in scraper.py:27]
2014-04-12 10:05:45,270 - __main__ - WARNING - Missing text from http://www.tygerburger.mobi/news/read/4613/centres-for-city-s-homeless-making-inroads [in scraper.py:27]
2014-04-12 10:05:46,944 - __main__ - WARNING - Missing text from http://www.tygerburger.mobi/news/read/4613/centres-for-city-s-homeless-making-inroads [in scraper.py:27]
2014-04-12 10:06:01,897 - __main__ - WARNING - Missing text from http://www.tygerburger.mobi/news/read/4586/casper-de-vries-paintings-no-laughing-matter [in scraper.py:27]
2014-04-12 10:06:02,354 - __main__ - WARNING - Missing text from http://www.tygerburger.mobi/news/read/4586/casper-de-vries-paintings-no-laughing-matter [in scraper.py:27]
2014-04-12 10:06:02,706 - __main__ - WARNING - Missing text from http://www.tygerburger.mobi/news/read/4586/casper-de-vries-paintings-no-laughing-matter [in scraper.py:27]
2014-04-12 10:06:29,662 - __main__ - WARNING - Missing text from http://www.udnews.mobi/news/read/1838/i-try-not-to-lie-oscar [in scraper.py:27]
2014-04-12 10:06:31,416 - __main__ - WARNING - Missing text from http://www.udnews.mobi/news/read/1838/i-try-not-to-lie-oscar [in scraper.py:27]
2014-04-12 10:06:33,396 - __main__ - WARNING - Missing text from http://www.udnews.mobi/news/read/1829/yes-dewani-s-only-word-to-court [in scraper.py:27]
2014-04-12 10:06:34,549 - __main__ - WARNING - Missing text from http://www.udnews.mobi/news/read/1829/yes-dewani-s-only-word-to-court [in scraper.py:27]
2014-04-12 10:07:15,598 - __main__ - WARNING - Missing text from http://www.uvoexpress.mobi/news/read/209/i-try-not-to-lie-oscar [in scraper.py:27]
2014-04-12 10:07:16,577 - __main__ - WARNING - Missing text from http://www.uvoexpress.mobi/news/read/209/i-try-not-to-lie-oscar [in scraper.py:27]
2014-04-12 10:07:17,982 - __main__ - WARNING - Missing text from http://www.uvoexpress.mobi/news/read/208/yes-dewani-s-only-word-to-court [in scraper.py:27]
2014-04-12 10:08:23,496 - __main__ - WARNING - Missing text from http://www.vaalweekblad.mobi/news/read/9045/scubi-forester-revisited [in scraper.py:27]
2014-04-12 10:09:33,400 - __main__ - WARNING - Missing text from http://www.vrystaat.mobi/news/read/1937/spog-met-nuwe-hemde [in scraper.py:27]
2014-04-12 10:09:33,399 - __main__ - WARNING - Missing text from http://www.vrystaat.mobi/news/read/1937/spog-met-nuwe-hemde [in scraper.py:27]
2014-04-12 10:09:35,132 - __main__ - WARNING - Missing text from http://www.vrystaat.mobi/news/read/1937/spog-met-nuwe-hemde [in scraper.py:27]
2014-04-12 10:09:35,136 - __main__ - WARNING - Missing text from http://www.vrystaat.mobi/news/read/1937/spog-met-nuwe-hemde [in scraper.py:27]
2014-04-12 10:11:45,531 - __main__ - WARNING - Missing text from http://udnews.mobi/news/read/1838/i-try-not-to-lie-oscar [in scraper.py:27]
2014-04-12 10:11:47,228 - __main__ - WARNING - Missing text from http://udnews.mobi/news/read/1829/yes-dewani-s-only-word-to-court [in scraper.py:27]
    return json.loads(self.text, **kwargs)
2014-04-12 19:27:33,714 - __main__ - WARNING - Missing text from http://ladysmithgazette.co.za/17439/n3tc-arthur-cresswell-memorial-marathon/ [in scraper.py:27]
2014-04-12 21:27:08,702 - __main__ - WARNING - Missing text from http://mpumalanganews.co.za/141709/steval-pumas-still-tops-log/ [in scraper.py:27]
2014-04-12 21:27:11,246 - __main__ - WARNING - Missing text from http://lowvelder.co.za/157569/steval-pumas-still-tops-log/ [in scraper.py:27]
2014-04-12 22:26:52,697 - __main__ - WARNING - Missing text from http://corridorgazette.co.za/125205/steval-pumas-still-tops-log/ [in scraper.py:27]
2014-04-12 22:27:00,731 - __main__ - WARNING - Missing text from http://nelspruitpost.co.za/131021/steval-pumas-still-tops-log/ [in scraper.py:27]
2014-04-12 22:27:07,832 - __main__ - WARNING - Missing text from http://whiteriverpost.co.za/115981/steval-pumas-still-tops-log/ [in scraper.py:27]
2014-04-12 23:28:24,726 - __main__ - WARNING - Missing text from http://www.news24.com/World/News/3-charged-over-Hollande-affair-actress-photo-20140412 [in scraper.py:27]
2014-04-13 01:27:03,594 - __main__ - WARNING - Missing text from http://springsadvertiser.co.za/89337/hulle-speel-juksei-by-sas/ [in scraper.py:27]
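
To check the ".mobi problem" guess against the log, the warnings can be tallied by the last label of each hostname. The log above contains `.co.za` and `.com` URLs as well as `.mobi` ones. This is only a rough sketch: the regular expression is an assumption based on the line format shown above, `count_by_suffix` is a hypothetical helper that is not part of the scrapers, and it uses Python 3's `urllib.parse`.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed pattern, inferred from the WARNING lines in the log above.
# Lines that do not match (e.g. stray traceback fragments) are skipped.
MISSING_TEXT = re.compile(r"WARNING - Missing text from (\S+) \[in ")


def count_by_suffix(log_lines):
    """Count 'Missing text' warnings per final hostname label (mobi, za, com, ...)."""
    counts = Counter()
    for line in log_lines:
        match = MISSING_TEXT.search(line)
        if match is None:
            continue
        host = urlparse(match.group(1)).hostname or ""
        counts[host.rsplit(".", 1)[-1]] += 1
    return counts


# Three lines copied from the log above: one .mobi, one .co.za, one fragment.
sample = [
    "2014-04-12 09:21:47,497 - __main__ - WARNING - Missing text from "
    "http://ballitofever.mobi/news/read/556/woman-shot-in-taxi-rank-violence [in scraper.py:27]",
    "2014-04-12 19:27:33,714 - __main__ - WARNING - Missing text from "
    "http://ladysmithgazette.co.za/17439/n3tc-arthur-cresswell-memorial-marathon/ [in scraper.py:27]",
    "    return json.loads(self.text, **kwargs)",
]
print(count_by_suffix(sample))
```

Running it over the full log file instead of `sample` would show how the warnings split between `.mobi` and the other sites.
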
adieyal commented 10 years ago

I noticed these. It looks like there are some minor variations in the HTML produced for these sites. I've updated the scrapers and will go back and re-download those articles.

Adi

Adi Eyal Director Code for South Africa Promoting informed decision-making

aserlich commented 10 years ago

Is this still a problem? I'm seeing more of these coming through the error logs.

2014-04-16 12:32:15,093 - __main__ - WARNING - Missing text from http://southcoastherald.co.za/35109/hibberdene-crash-injures-seven/ [in scraper.py:27]
2014-04-16 13:26:52,694 - __main__ - WARNING - Missing text from http://benonicitytimes.co.za/173425/meet-the-missing-snake/ [in scraper.py:27]
2014-04-16 13:48:49,115 - __main__ - WARNING - Missing text from http://www.udnews.mobi/news/read/1855/one-day-without-shoes-on-april-29 [in scraper.py:27]
2014-04-16 14:09:09,729 - __main__ - WARNING - Missing text from http://udnews.mobi/news/read/1855/one-day-without-shoes-on-april-29 [in scraper.py:27]
2014-04-16 14:19:12,680 - __main__ - WARNING - Missing text from http://www.peexpress.mobi/news/read/3847/one-day-without-shoes-on-april-29 [in scraper.py:27]
2014-04-16 16:26:47,647 - __main__ - WARNING - Missing text from http://alexnews.co.za/24414/oscar-trial-five-things-need-know-day-24/ [in scraper.py:27]
2014-04-16 16:26:52,330 - __main__ - WARNING - Missing text from http://citybuzz.co.za/13536/oscar-trial-five-things-need-know-day-24/ [in scraper.py:27]
2014-04-16 16:27:07,992 - __main__ - WARNING - Missing text from http://fourwaysreview.co.za/172361/oscar-trial-five-things-need-know-day-24/ [in scraper.py:27]
2014-04-16 16:27:19,389 - __main__ - WARNING - Missing text from http://midrandreporter.co.za/107181/oscar-trial-five-things-need-know-day-24/ [in scraper.py:27]
2014-04-16 16:27:21,046 - __main__ - WARNING - Missing text from http://nelspruitpost.co.za/155908/bergies-en-primary-se-die-stryd-aan/ [in scraper.py:27]
2014-04-16 16:27:41,821 - __main__ - WARNING - Missing text from http://northeasterntribune.co.za/134239/oscar-trial-five-things-need-know-day-24/ [in scraper.py:27]
2014-04-16 16:28:04,711 - __main__ - WARNING - Missing text from http://roodepoortrecord.co.za/2014/04/16/roodies/ [in scraper.py:27]
2014-04-16 16:28:07,016 - __main__ - WARNING - Missing text from http://roodepoortrecord.co.za/2014/04/16/soccer/ [in scraper.py:27]
2014-04-16 16:28:17,238 - __main__ - WARNING - Missing text from http://roodepoortrecord.co.za/2014/04/16/florries-5/ [in scraper.py:27]
2014-04-16 16:28:18,069 - __main__ - WARNING - Missing text from http://roodepoortrecord.co.za/2014/04/16/havies/ [in scraper.py:27]
2014-04-16 16:28:20,881 - __main__ - WARNING - Missing text from http://roodepoortrecord.co.za/2014/04/16/die-ruiter/ [in scraper.py:27]
2014-04-16 16:28:37,657 - __main__ - WARNING - Missing text from http://rosebankkillarneygazette.co.za/134572/oscar-trial-five-things-need-know-day-24/ [in scraper.py:27]
2014-04-16 16:28:40,838 - __main__ - WARNING - Missing text from http://sandtonchronicle.co.za/90139/oscar-trial-five-things-need-know-day-24-2/ [in scraper.py:27]
2014-04-16 17:28:24,968 - __main__ - WARNING - Missing text from http://northcoastcourier.co.za/16820/curro-excels-hockey/ [in scraper.py:27]
2014-04-16 21:28:32,469 - __main__ - WARNING - Missing text from http://www.franco-sa.co.za/news/486 [in scraper.py:27]
2014-04-17 01:54:42,846 - scrapers.naspers.parser - ERROR - Could not download: http://www.kalaharibulletin.mobi/news/read/2994/video-snotkop-wys-gesig [in /var/www/scrapers/scrapers/naspers/parser.py:122]
adieyal commented 10 years ago

I'm working on it. There seem to be two issues. First, stories with only an image caption; that is now fixed. Second, stories that are just bullet points; I'm working on that today.

Adi

On 17 April 2014 08:05, aserlich notifications@github.com wrote:

Is this still a problem? Seeing more of these coming through the error logs



aserlich commented 10 years ago

We seem to have stories with text again that aren't being grabbed properly.

2014-04-22 12:31:01,164 - __main__ - WARNING - Missing text from http://newcastleadvertiser.co.za/21002/easter-sunrise-service-hilldrop/ [in scraper.py:27]
2014-04-22 12:31:47,534 - __main__ - WARNING - Missing text from http://southcoastherald.co.za/35604/hibberdene-couta-classic-2014-2/ [in scraper.py:27]
2014-04-22 12:32:05,724 - __main__ - WARNING - Missing text from http://tembisan.co.za/14894/family-celebrates-tumi/ [in scraper.py:27]
2014-04-22 15:28:07,170 - __main__ - WARNING - Missing text from http://citizen.co.za/164055/allow-people-judge-malema/ [in scraper.py:27]
2014-04-22 16:28:46,316 - __main__ - WARNING - Missing text from http://southcoastherald.co.za/35693/heres-big-easter-winner/ [in scraper.py:27]
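
One way to see which sites these warnings cluster on is to tally them by host. This is a minimal sketch, assuming Python 3 and the log format shown above; `count_missing_by_host` is a hypothetical helper and not part of the scraper.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Matches the "Missing text from <url>" warnings in the format shown above.
MISSING_RE = re.compile(r"WARNING - Missing text from (\S+)")


def count_missing_by_host(log_lines):
    """Count 'Missing text' warnings per host (hypothetical helper)."""
    counts = Counter()
    for line in log_lines:
        match = MISSING_RE.search(line)
        if not match:
            continue  # skip ERROR lines, tracebacks, and anything else
        host = urlparse(match.group(1)).netloc
        if host.startswith("www."):
            host = host[4:]  # treat www.example.mobi and example.mobi as one site
        counts[host] += 1
    return counts


# Three of the warnings from this comment, used as sample input.
sample = [
    "2014-04-22 12:31:47,534 - __main__ - WARNING - Missing text from http://southcoastherald.co.za/35604/hibberdene-couta-classic-2014-2/ [in scraper.py:27]",
    "2014-04-22 15:28:07,170 - __main__ - WARNING - Missing text from http://citizen.co.za/164055/allow-people-judge-malema/ [in scraper.py:27]",
    "2014-04-22 16:28:46,316 - __main__ - WARNING - Missing text from http://southcoastherald.co.za/35693/heres-big-easter-winner/ [in scraper.py:27]",
]

for host, n in count_missing_by_host(sample).most_common():
    print(host, n)
```

Stripping the `www.` prefix is a design choice so that the same site logged under two hostnames is counted once.
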
adieyal commented 10 years ago

Quick report back on the urls above.

The first two don't actually have bodies.
The third and fifth examples don't have bodies but do have image captions. I have now incorporated the captions into the body.
The fourth example was fixed.
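
For illustration, the fallback described in this thread (use bullet points or image captions when a story has no paragraph text) could look roughly like the sketch below. It is not the actual scraper code: the tag names (`<p>`, `<li>`, `<figcaption>`) are assumptions about the markup, and `StoryTextParser` and `extract_body` are hypothetical names. It uses only the Python 3 standard library.

```python
from html.parser import HTMLParser


class StoryTextParser(HTMLParser):
    """Collect paragraph, list-item and caption text separately."""

    # Assumed markup: body text in <p>, bullets in <li>, captions in <figcaption>.
    TRACKED = {"p": "paragraphs", "li": "bullets", "figcaption": "captions"}

    def __init__(self):
        super().__init__()
        self.found = {"paragraphs": [], "bullets": [], "captions": []}
        self._current = None  # tracked tag we are currently inside, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Only the outermost tracked tag counts, so <li><p>..</p></li> is a bullet.
        if tag in self.TRACKED and self._current is None:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            text = " ".join("".join(self._buffer).split())  # collapse whitespace
            if text:
                self.found[self.TRACKED[tag]].append(text)
            self._current = None


def extract_body(html):
    """Return paragraph text, else bullet text, else caption text, else None."""
    parser = StoryTextParser()
    parser.feed(html)
    parser.close()
    for key in ("paragraphs", "bullets", "captions"):
        if parser.found[key]:
            return "\n".join(parser.found[key])
    return None  # nothing usable: this is the "Missing text" case


caption_only = '<article><figure><img src="x.jpg"><figcaption>Winner of the draw.</figcaption></figure></article>'
print(extract_body(caption_only))
```

Returning `None` only when all three sources are empty would keep the warning for pages that really have no body, like the first two URLs above. The sketch assumes paragraphs are closed with `</p>`; real pages that omit closing tags would need a more forgiving parser.
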