fandangOrg / fandango

FAke News discovery and propagation from big Data ANalysis and artificial intelliGence Operations

Keywords field missing in preprocessing output #93

Closed. macagari closed this issue 3 years ago.

macagari commented 3 years ago

Hi, I was updating the image with the modifications we defined weeks ago, but the preprocessing is not returning the "keywords" field.

Output from crawlers:

{'identifier': '5c03fbd2cdd66f1c306416f19895aab5f48d0faf7b2447c4dc0290b35da15542e73acb11d6ff95b9eb063c457a2549ee0d1d7d626c539541ccc54a912d468246', 'authors': ['Alex Hern', 'Kenya Evelyn', '', 'Helen Sullivan', 'Joanna Walters', 'Joan E Greve', 'Julia Carrie Wong', 'Helen Sullivan (now) with \nJulia Carrie Wong ,\nJoan E Greve and \nMartin Belam (earlier)', 'Martin Belam', 'Dominic Rushe', 'Ben Doherty'], 'publish_date_estimated': 'no', 'date_created': '2021-01-08T10:49:31Z', 'date_modified': '2021-01-08T10:49:31Z', 'date_published': '2021-01-07T00:00:00Z', 'description': 'This blog is now closed. You can read our main story on the day’s events below:', 'images': ['https://i.guim.co.uk/img/media/be89cd99e30d1261a9462fad7124eb3a6b608a37/0_228_3500_2100/master/3500.jpg?width=1200&height=630&quality=85&auto=format&fit=crop&overlay-align=bottom%2Cleft&overlay-width=100p&overlay-base64=L2ltZy9zdGF0aWMvb3ZlcmxheXMvdGctbGl2ZS5wbmc&enable=upscale&s=2a41992ec9ff2852dcfe16d5e28c5f61', 'https://pbs.twimg.com/media/ErFhkeIXYAATJlP.jpg', 'https://i.guim.co.uk/img/media/be89cd99e30d1261a9462fad7124eb3a6b608a37/0_228_3500_2100/master/3500.jpg?width=300&quality=85&auto=format&fit=max&s=a7066c1d3a947453508d7ae7dd6b1a36', 'https://i.guim.co.uk/img/media/639412046e039948a31db206ba461365c8156f4c/0_0_5223_3134/master/5223.jpg?width=300&quality=85&auto=format&fit=max&s=2c118f100ffdb215fb36595f4da330fe', 'https://i.guim.co.uk/img/media/a4c5f68d29d4ae696ced51a643e98f08915c1e2d/0_85_5235_3141/master/5235.jpg?width=300&quality=85&auto=format&fit=max&s=1fa9b09d8b5ed0e16ad2910e9ac99e55', 'https://i.guim.co.uk/img/media/69ee6e9752c3497e3e546ee31c16fff15d4f4e18/0_233_3500_2100/master/3500.jpg?width=300&quality=85&auto=format&fit=max&s=9c54aa90a65985d06355f75f638049c1', 'https://phar.gu-web.net/count/pvg.gif', 'https://i.guim.co.uk/img/media/568a9c04232f36554dc6f0089058deacb338eb50/0_20_3000_1800/master/3000.jpg?width=300&quality=85&auto=format&fit=max&s=2b937ce903a9125cf1b37ebbc6a28fde', 
'https://sb.scorecardresearch.com/p?c1=2&c2=6035250&cv=2.0&cj=1&cs_ucfr=0&gdpr=0&comscorekw=US+Capitol+breach%2CUS+news%2CWorld+news%2CDonald+Trump%2CUS+elections+2020%2CJoe+Biden%2CUS+politics%2CRepublicans%2CDemocrats%2CUS+Congress%2CUS+Senate%2CHouse+of+Representatives', 'https://pbs.twimg.com/media/ErLka7AWMAAO2QJ.jpg', 'https://i.guim.co.uk/img/media/0298ca6fd42696cd0592ad7332ddebb7ececb9bd/0_163_5267_3162/master/5267.jpg?width=300&quality=85&auto=format&fit=max&s=23d7f5888fc5cf995387c023e1306214', 'https://assets.guim.co.uk/images/badges/463460171768f1a03cb4d6fbc8db8956/us-elections-2020.svg', 'https://phar.gu-web.net/count/pv.gif', 'https://pbs.twimg.com/media/ErEx_zLUcAIQINL.jpg'], 'keywords': ['closed', 'acknowledges', 'main', 'trump', 'days', 'events', 'read', 'happened', 'blog', 'administration'], 'language': 'en', 'source_domain': 'www.theguardian.com', 'summary': 'This blog is now closed.\nYou can read our main story on the day’s events below:', 'text': 'This blog is now closed. You can read our main story on the day’s events below: ', 'texthash': ['6a2cbb7930050aad0eefeeeb2c52565defaedbf7b00ba5c9dae8871221938dcc'], 'title': "Trump acknowledges 'new administration' – as it happened", 'top_image': 'https://i.guim.co.uk/img/media/be89cd99e30d1261a9462fad7124eb3a6b608a37/0_228_3500_2100/master/3500.jpg?width=1200&height=630&quality=85&auto=format&fit=crop&overlay-align=bottom%2Cleft&overlay-width=100p&overlay-base64=L2ltZy9zdGF0aWMvb3ZlcmxheXMvdGctbGl2ZS5wbmc&enable=upscale&s=2a41992ec9ff2852dcfe16d5e28c5f61', 'url': 'https://www.theguardian.com/us-news/live/2021/jan/07/joe-biden-donald-trump-mike-pence-capitol-congress-us-election-coronavirus-live-updates', 'videos': [], 'spider': 'online'}

Output preprocessing: {'message': 'Successful Operation', 'status': 200, 'data': {'identifier': '5c03fbd2cdd66f1c306416f19895aab5f48d0faf7b2447c4dc0290b35da15542e73acb11d6ff95b9eb063c457a2549ee0d1d7d626c539541ccc54a912d468246', 'headline': "Trump acknowledges 'new administration' – as it happened", 'articleBody': 'This blog is now closed. You can read our main story on the day’s events below: ', 'url': 'https://www.theguardian.com/us-news/live/2021/jan/07/joe-biden-donald-trump-mike-pence-capitol-congress-us-election-coronavirus-live-updates', 'language': 'en', 'images': ['https://i.guim.co.uk/img/media/a4c5f68d29d4ae696ced51a643e98f08915c1e2d/0_85_5235_3141/master/5235.jpg?width=300&quality=85&auto=format&fit=max&s=1fa9b09d8b5ed0e16ad2910e9ac99e55', 'https://i.guim.co.uk/img/media/be89cd99e30d1261a9462fad7124eb3a6b608a37/0_228_3500_2100/master/3500.jpg?width=1200&height=630&quality=85&auto=format&fit=crop&overlay-align=bottom%2Cleft&overlay-width=100p&overlay-base64=L2ltZy9zdGF0aWMvb3ZlcmxheXMvdGctbGl2ZS5wbmc&enable=upscale&s=2a41992ec9ff2852dcfe16d5e28c5f61', 'https://i.guim.co.uk/img/media/0298ca6fd42696cd0592ad7332ddebb7ececb9bd/0_163_5267_3162/master/5267.jpg?width=300&quality=85&auto=format&fit=max&s=23d7f5888fc5cf995387c023e1306214', 'https://i.guim.co.uk/img/media/be89cd99e30d1261a9462fad7124eb3a6b608a37/0_228_3500_2100/master/3500.jpg?width=300&quality=85&auto=format&fit=max&s=a7066c1d3a947453508d7ae7dd6b1a36', 'https://pbs.twimg.com/media/ErFhkeIXYAATJlP.jpg', 'https://i.guim.co.uk/img/media/69ee6e9752c3497e3e546ee31c16fff15d4f4e18/0_233_3500_2100/master/3500.jpg?width=300&quality=85&auto=format&fit=max&s=9c54aa90a65985d06355f75f638049c1', 'https://i.guim.co.uk/img/media/568a9c04232f36554dc6f0089058deacb338eb50/0_20_3000_1800/master/3000.jpg?width=300&quality=85&auto=format&fit=max&s=2b937ce903a9125cf1b37ebbc6a28fde', 'https://pbs.twimg.com/media/ErLka7AWMAAO2QJ.jpg', 
'https://i.guim.co.uk/img/media/639412046e039948a31db206ba461365c8156f4c/0_0_5223_3134/master/5223.jpg?width=300&quality=85&auto=format&fit=max&s=2c118f100ffdb215fb36595f4da330fe', 'https://pbs.twimg.com/media/ErEx_zLUcAIQINL.jpg'], 'videos': [], 'dateCreated': '2021-01-08T10:49:31Z', 'dateModified': '2021-01-08T10:49:31Z', 'datePublished': '2021-01-07T00:00:00Z', 'publishDateEstimated': 'no', 'authors': ['Alex Hern', 'Helen Sullivan', 'Joanna Walters', 'Joan E Greve', 'Julia Carrie Wong', 'Martin Belam', 'Ben Doherty'], 'publisher': ['Guardian'], 'sourceDomain': 'www.theguardian.com', 'country': 'UK', 'nationality': 'N/A', 'calculatedRating': -99, 'calculatedRatingDetail': ''}}
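One way to catch a dropped field like this is to compare the crawler document with the `data` object returned by the preprocessing step. The sketch below is hypothetical and not part of the fandango code base: the field mapping is inferred only from the two outputs above (e.g. `title` becomes `headline`, `text` becomes `articleBody`, `source_domain` becomes `sourceDomain`), the name `keywords` in the preprocessing output is an assumption, and crawler fields left out of the mapping (such as `spider` or `summary`) are treated as intentionally dropped. The sample dicts are abbreviated subsets of the outputs shown in this issue.

```python
from typing import Dict, List

# Hypothetical mapping: crawler field name -> preprocessing field name,
# inferred from the outputs in this issue. "keywords" -> "keywords" is an
# assumption about the expected name in the preprocessing output.
CRAWLER_TO_PREPROCESSING: Dict[str, str] = {
    "identifier": "identifier",
    "title": "headline",
    "text": "articleBody",
    "url": "url",
    "language": "language",
    "images": "images",
    "videos": "videos",
    "authors": "authors",
    "date_created": "dateCreated",
    "date_modified": "dateModified",
    "date_published": "datePublished",
    "publish_date_estimated": "publishDateEstimated",
    "source_domain": "sourceDomain",
    "keywords": "keywords",
}


def missing_fields(crawler_doc: dict, response: dict) -> List[str]:
    """Return crawler fields that are present in the crawler document but
    whose mapped name is absent from response["data"]. Crawler fields not
    listed in the mapping are ignored."""
    data = response.get("data", {})
    return sorted(
        src
        for src, dst in CRAWLER_TO_PREPROCESSING.items()
        if src in crawler_doc and dst not in data
    )


# Abbreviated subsets of the crawler output and preprocessing response above.
crawler_doc = {
    "title": "Trump acknowledges 'new administration' – as it happened",
    "language": "en",
    "source_domain": "www.theguardian.com",
    "keywords": ["closed", "acknowledges", "main", "trump", "days",
                 "events", "read", "happened", "blog", "administration"],
    "spider": "online",  # not in the mapping, so it is not reported
}
response = {
    "message": "Successful Operation",
    "status": 200,
    "data": {
        "headline": "Trump acknowledges 'new administration' – as it happened",
        "language": "en",
        "sourceDomain": "www.theguardian.com",
    },
}

print(missing_fields(crawler_doc, response))  # ['keywords']
```

Run against each preprocessing response (for example in a regression test after rebuilding the image), a non-empty result flags fields that the crawler produced but the preprocessing did not forward.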

pstalidis commented 3 years ago

This is tracked in #90