aserlich opened 10 years ago
Hi Adi,
I looked at today's error logs and this still seems to be a problem. Any idea what is causing it?
2014-04-04 15:30:57,505 - __main__ - WARNING - Missing text from http://citizen.co.za/154875/vavi-ruling-victory-wasp/ [in scraper.py:26]
2014-04-04 15:30:57,507 - __main__ - WARNING - Missing text from http://citizen.co.za/154874/luthiania-takes-lead-sa/ [in scraper.py:26]
2014-04-04 16:30:26,015 - __main__ - WARNING - Missing text from http://citizen.co.za/154904/anc-sms-application-dismissed/ [in scraper.py:26]
2014-04-04 16:30:26,240 - __main__ - WARNING - Missing text from http://citizen.co.za/154893/swimmers-complete-459km-24-days/ [in scraper.py:26]
2014-04-04 16:30:26,798 - __main__ - WARNING - Missing text from http://citizen.co.za/154904/anc-sms-application-dismissed/ [in scraper.py:26]
2014-04-04 17:29:27,820 - __main__ - WARNING - Missing text from http://citizen.co.za/154927/bccsa-dismisses-complaint-702/ [in scraper.py:26]
2014-04-04 17:29:27,951 - __main__ - WARNING - Missing text from http://citizen.co.za/154922/eff-launches-gauteng-election-campaign/ [in scraper.py:26]
2014-04-04 17:29:29,289 - __main__ - WARNING - Missing text from http://citizen.co.za/154916/nehawu-consult-legal-advisors/ [in scraper.py:26]
2014-04-04 17:29:29,283 - __main__ - WARNING - Missing text from http://citizen.co.za/154918/slain-miner-fleeing-shot-commission/ [in scraper.py:26]
2014-04-04 17:29:29,939 - __main__ - WARNING - Missing text from http://citizen.co.za/154915/kzn-traffic-cop-shot-dead/ [in scraper.py:26]
2014-04-04 18:30:26,419 - __main__ - WARNING - Missing text from http://citizen.co.za/154935/court-rules-krejcir/ [in scraper.py:26]
2014-04-04 18:30:26,421 - __main__ - WARNING - Missing text from http://citizen.co.za/154953/hondas-187-kmh-lawnmower/ [in scraper.py:26]
2014-04-04 18:30:26,768 - __main__ - WARNING - Missing text from http://citizen.co.za/154930/vavi-voice-voiceless-sym/ [in scraper.py:26]
2014-04-04 18:30:27,202 - __main__ - WARNING - Missing text from http://citizen.co.za/154929/nkandla-rpeort-political-hands/ [in scraper.py:26]
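The warnings share a fixed format, so the affected URLs can be pulled out of a log and de-duplicated before re-processing. Note that http://citizen.co.za/154904/anc-sms-application-dismissed/ is logged twice above. The following is a minimal Python 3 sketch. It assumes the log is available as plain text lines, and the function name `urls_to_reprocess` is hypothetical, not part of scraper.py.

```python
import re

# Matches the "Missing text from <url>" warnings in the log excerpt above;
# \S+ stops at the space before the trailing "[in scraper.py:...]".
MISSING_TEXT = re.compile(r"WARNING - Missing text from (\S+)")


def urls_to_reprocess(log_lines):
    """Return each URL that has a missing-text warning, once, in first-seen order."""
    seen = set()
    urls = []
    for line in log_lines:
        match = MISSING_TEXT.search(line)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            urls.append(match.group(1))
    return urls
```

The regex ignores the logger name, so it matches both the `__main__` lines and the `main` form that appears in the emailed copies.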
I've made some changes and will monitor the situation. I have also re-processed the old URLs.
Adi
(Replying to aserlich, 5 April 2014 16:03.)
Adi Eyal Director Code for South Africa Promoting informed decision-making
phone: +27 78 014 2469 skype: adieyalcas linkedin: http://za.linkedin.com/pub/dir/Adi/Eyal web: http://www.code4sa.org twitter: @soapsudtycoon
For more information on how to participate in the open data community in South Africa, go to: http://www.code4sa.org/#community
Ok, thanks for the update! Will also keep my eye out.
Looks like this problem is coming up again, this time for stories that do have text. Any ideas?
2014-04-30 10:29:33,803 - __main__ - WARNING - Missing text from http://boksburgadvertiser.co.za/195724/annual-mrs-south-africa-cansa-gala-dinner-2/ [in scraper.py:30]
2014-04-30 11:29:55,375 - __main__ - WARNING - Missing text from http://mpumalanganews.co.za/172006/ambulances-handed-spead-service-delivery/ [in scraper.py:30]
2014-04-30 14:30:49,553 - __main__ - WARNING - Missing text from http://southcoastsun.co.za/37427/toti-fc-u9-westville-hutchison-park/ [in scraper.py:30]
2014-04-30 15:06:48,708 - __main__ - WARNING - Missing text from http://www.wstandard.mobi/news/read/4372/scubi-forester-revisited [in scraper.py:30]
2014-04-30 15:31:36,209 - __main__ - WARNING - Missing text from http://roodepoortrecord.co.za/2014/04/30/tots-tweens-teens-competition-week-3-9-13-years/ [in scraper.py:30]
2014-04-30 15:31:38,851 - __main__ - WARNING - Missing text from http://roodepoortrecord.co.za/2014/04/30/tots-tweens-teens-competition-week-3-4-8-years/ [in scraper.py:30]
2014-04-30 16:46:05,434 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/2857/20-years-of-freedom-and-democracy-campaign-support [in scraper.py:30]
2014-04-30 16:46:08,178 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/2859/growth-for-spur-school-mountain-bike-league [in scraper.py:30]
2014-04-30 16:48:07,172 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/2645/minister-of-sport-storms-out-of-al-jazeera-studio [in scraper.py:30]
2014-04-30 16:48:10,733 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/2648/a-hippo-love-story [in scraper.py:30]
2014-04-30 16:50:27,715 - __main__ - WARNING - Missing text from http://www.tametimes.mobi/news/read/2754/the-rand-show-2014-it-s-showtime [in scraper.py:30]
2014-05-01 00:40:47,322 - __main__ - WARNING - Missing text from http://www.vrystaat.mobi/news/read/2042/spotprent-1-mei-2014 [in scraper.py:30]
2014-05-01 01:03:52,820 - root - ERROR - Error accessing url: {u'url': u'http://www.maluti.mobi/news/read/1254/winterwenke-vir-die-tuinier', u'entry': {}, u'scraper': u'naspers_local', u'publication': u'Maluti'} [in /var/www/scrapers/
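These warnings span several publications (boksburgadvertiser.co.za, tametimes.mobi, roodepoortrecord.co.za and others), so counting failures per site may show whether a single site's page layout is responsible. The following is a minimal Python 3 sketch, again assuming plain log lines as input. `missing_text_by_site` is a hypothetical helper. The ERROR line for maluti.mobi is a different failure (the URL could not be accessed) and is deliberately not counted.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Same warning format as in the log excerpt above.
MISSING_TEXT = re.compile(r"WARNING - Missing text from (\S+)")


def missing_text_by_site(log_lines):
    """Count missing-text warnings per host name (e.g. 'www.tametimes.mobi')."""
    counts = Counter()
    for line in log_lines:
        match = MISSING_TEXT.search(line)
        if match:
            counts[urlsplit(match.group(1)).hostname] += 1
    return counts
```

Calling `missing_text_by_site(lines).most_common()` lists the sites with the most failures first.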
Some of them don't actually have bodies, but a few do. Looking into it.
(Replying to aserlich, 1 May 2014 13:43.)
Adi Eyal, Data Specialist
Looks like we are still getting zero-length text for some Caxton stories. See the example below. Can we rerun the scraper to capture these and then re-check the database?
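Checking the database after a rerun could look something like the following. This is only a sketch: the thread does not say how articles are stored, so the SQLite table `articles(url TEXT, text TEXT)` and the function `urls_with_empty_text` are hypothetical. The filtering is done in Python, so that whitespace-only bodies count as empty too.

```python
import sqlite3


def urls_with_empty_text(conn):
    """Return URLs whose stored text is NULL, empty, or whitespace only.

    Assumes a hypothetical table: articles(url TEXT, text TEXT).
    """
    rows = conn.execute("SELECT url, text FROM articles ORDER BY url")
    # str.strip() removes spaces, tabs and newlines; NULL arrives as None.
    return [url for url, text in rows if not (text or "").strip()]
```

Scanning every row is fine for a modest table. For a large one, the filter would be better pushed into the SQL `WHERE` clause.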