Closed: aserlich closed this issue 10 years ago.
Very strange: when I test on my side, it comes out perfectly. I'll delete those articles and re-run them.
On 4 April 2014 09:03, aserlich <notifications@github.com> wrote:
I am testing the text scraping and it is not working properly. I tested it by looking at the Isolezwe stories: we are only grabbing one line of text, and that text is not associated with the story. Do we need to change the subtype on Isolezwe? Is it the same as the other Isolezwe papers?
```
> db.mycollection.find({publication:"Isolezwe",downloaded_at: {$gt: new Date(2014, 3, 1) } })[1]
{
  "_id" : ObjectId("533ab090a8a0b802e763c226"),
  "url" : "http://www.iol.co.za/asivumi-ukulunga-isimo-ehostela-lakwamashu-1.1669273",
  "publication" : "Isolezwe",
  "published" : ISODate("2014-04-01T09:08:56Z"),
  "downloaded_at" : ISODate("2014-04-01T14:26:55.399Z"),
  "text" : "I'm a 21 year old woman looking to meet men and women between the ages of 25 and 27.",
  "title" : "Asivumi ukulunga isimo ehostela laKwaMashu",
  "sub_type" : 1,
  "owner" : "IOL",
  "summary" : "
Kunezinsolo zokuthi sekubuye iqembu lezinkabi elibulala abantu ebese lihambile ehostela laKwaMashu.
"
}
> db.mycollection.find({publication:"Isolezwe",downloaded_at: {$gt: new Date(2014, 3, 1) } })[2]
{
  "_id" : ObjectId("533ab090a8a0b802e8c62c2b"),
  "url" : "http://www.iol.co.za/abenfp-baphume-behlehla-emzumbe-1.1669264",
  "publication" : "Isolezwe",
  "published" : ISODate("2014-04-01T09:03:56Z"),
  "downloaded_at" : ISODate("2014-04-01T14:26:55.577Z"),
  "text" : "I'm a 21 year old woman looking to meet men and women between the ages of 25 and 27.",
  "title" : "AbeNFP baphume behlehla eMzumbe",
  "sub_type" : 1,
  "owner" : "IOL",
  "summary" : "Basinde kukubi abaholi beNFP abebekhankasa esizindeni se-ANC eMzumbe.
"
}
>
```

Reply to this email directly or view it on GitHub: https://github.com/Code4SA/various-scrapers/issues/6
Adi Eyal
Director, Code for South Africa
Promoting informed decision-making
phone: +27 78 014 2469 | skype: adieyalcas | linkedin: http://za.linkedin.com/pub/dir/Adi/Eyal | web: http://www.code4sa.org | twitter: @soapsudtycoon
Fixed
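One symptom in the shell output above is that two unrelated Isolezwe stories were stored with the same one-line `text`. A minimal sketch, in plain JavaScript to match the shell session, of how one might flag that kind of failure before deleting and re-running the affected articles. `findSharedText` and `buildCleanupFilter` are hypothetical helpers, not part of the various-scrapers repo, and the sketch assumes the records carry the `url`, `text` and `publication` fields shown above.

```javascript
// Hypothetical helper (not from the repo): given scraped article records
// shaped like the Mongo documents above, return every exact "text" value
// that appears on more than one distinct URL. Identical body text on
// unrelated stories usually means the scraper matched the wrong element.
function findSharedText(articles) {
  const urlsByText = new Map();
  for (const { url, text } of articles) {
    if (typeof text !== "string") continue; // skip records with no text
    if (!urlsByText.has(text)) urlsByText.set(text, new Set());
    urlsByText.get(text).add(url);
  }
  const shared = [];
  for (const [text, urls] of urlsByText) {
    if (urls.size > 1) shared.push({ text, urls: [...urls].sort() });
  }
  return shared;
}

// Hypothetical helper: build a Mongo query object that matches the
// flagged articles of one publication, e.g. for deleting them in the
// shell before re-running the scraper.
function buildCleanupFilter(publication, shared) {
  return { publication, text: { $in: shared.map((s) => s.text) } };
}

// Usage with the two Isolezwe records from the shell session
// (only the fields the check needs).
const records = [
  {
    publication: "Isolezwe",
    url: "http://www.iol.co.za/asivumi-ukulunga-isimo-ehostela-lakwamashu-1.1669273",
    text: "I'm a 21 year old woman looking to meet men and women between the ages of 25 and 27.",
  },
  {
    publication: "Isolezwe",
    url: "http://www.iol.co.za/abenfp-baphume-behlehla-emzumbe-1.1669264",
    text: "I'm a 21 year old woman looking to meet men and women between the ages of 25 and 27.",
  },
];

const shared = findSharedText(records);
console.log(shared.length); // 1
console.log(shared[0].urls.length); // 2
console.log(JSON.stringify(buildCleanupFilter("Isolezwe", shared)));
```

The check uses exact string equality so that the resulting filter matches the stored values byte for byte; a real cleanup would probably also restrict the filter by `downloaded_at`, as in the original query, so that older articles are not touched.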