DrSkippy / Gnacs

Gnip normalized social data activities json to csv parser.
DrSkippy.github.io/Gnacs

Gnacs print csv file from json #33

Open cris-cola opened 10 years ago

cris-cola commented 10 years ago

Hi, I may not be using your code correctly. I want to get a simple csv file out of the .json data I collected from Twitter, but apparently I am doing something wrong.

I am using this command to create a csv file:

gnacs.py -c json/tweets.json > output.xls

but instead of the columns from my .json file, what I get is:

GNIPREMOVE-Unidentified,2014-06-06T16:22:46.000Z,Unidentified meta message

... etcetera...

This is strange, considering that I tried the same command with your sample data/*.json files and it worked.

Why is that?

Thank you

DrSkippy commented 10 years ago

Can you provide a short sample of the input from tweets.json? Thanks.

cris-cola commented 10 years ago

yes, this is it:

"tweets.json" ---> http://codeshare.io/N3flN

Inputting tweets.json, I receive:

GNIPREMOVE-Unidentified,2014-06-06T16:22:46.000Z,Unidentified meta message

cris-cola commented 10 years ago

Otherwise, if I input a second .json file, tweets2.json, containing:

"tweets2.json" -> http://codeshare.io/WfJ2i

etc...

I receive as output:

GNIPREMOVE-Unidentified,2014-06-06T16:25:45.000Z,Unidentified meta message
GNIPREMOVE-Unidentified,2014-06-06T16:25:45.000Z,Unidentified meta message
GNIPREMOVE-Unidentified,2014-06-06T16:25:45.000Z,Unidentified meta message
GNIPREMOVE-Unidentified,2014-06-06T16:25:45.000Z,Unidentified meta message
GNIPREMOVE-Unidentified,2014-06-06T16:25:45.000Z,Unidentified meta message
GNIPREMOVE-Unidentified,2014-06-06T16:25:45.000Z,Unidentified meta message

etc.

DrSkippy commented 10 years ago

Gnacs was designed to consume a real time stream, so it is looking for a succession of dictionary objects rather than a list. If you take off the [ and ] in your list (yes, that turns valid json into a sequence of valid json records that is not as a whole a valid json record) I think you will have success.

Thanks,

DrS

On Fri, Jun 6, 2014 at 8:11 AM, kaine987 notifications@github.com wrote:

yes, this is it:

[{"created_at":"Wed Jun 04 14:02:16 +0000 2014","id":474189394271010816,"id_str":"474189394271010816","text":"CNN Poll: Public upset over VA scandal; Obama remains at 43% 
http://t.co/iitvUnYBoA","source":"iOS","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":1113229394,"id_str":"1113229394","name":"Hiram","screen_name":"HiramSipesh","location":"","url":null,"description":"Jusy want the world to be a better place!","protected":false,"followers_count":184,"friends_count":633,"listed_count":0,"created_at":"Wed Jan 23 02:32:03 +0000 2013","favourites_count":247,"utc_offset":null,"time_zone":null,"geo_enabled":false,"verified":false,"statuses_count":2013,"lang":"en","contributors_enabled":false,"is_translator":false,"is_translation_enabled":false,"profile_background_color":"C0DEED","profile_background_image_url":" http://abs.twimg.com/images/themes/theme1/bg.png","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","profile_background_tile":false,"profile_image_url":"http://pbs.twimg.com/profile_images/449360306759008256/nE9IZpIb_normal.jpeg","profile_image_url_https":"https://pbs.twimg.com/profile_images/449360306759008256/nE9IZpIb_normal.jpeg","profile_banner_url":"https://pbs.twimg.com/profile_banners/1113229394/1360892942","profile_link_color":"0084B4","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"symbols":[],"urls":[{"url":"http://t.co/iitvUnYBoA","expanded_url":"http://politicalticker.blogs.cnn.com/2014/06/03/cnn-poll-public-upset-over-va-scandal-obama-remains-at-43/","display_url":"politicalticker.blogs.cnn.com/2014/06/03/cnn…","indices":[61,83]}],"user_mentions":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false
,"filter_level":"medium","lang":"en"},{"created_at":"Wed Jun 04 14:02:16 +0000 2014","id":474189395692908545,"id_str":"474189395692908545","text":"#Like

ACA #Obamacare! RT @JonathanTurley: Southwest Fined For Advertising $59

Seats That Did Not Exist http://t.co/FqwVHy2guE","source":"Hootsuite","truncated":false,"in_reply_to_status_id":474177225462267904,"in_reply_to_status_id_str":"474177225462267904","in_reply_to_user_id":94784682,"in_reply_to_user_id_str":"94784682","in_reply_to_screen_name":"JonathanTurley","user":{"id":24993788,"id_str":"24993788","name":"Dan Kleinman","screen_name":"SafeLibraries","location":"USA","url":" http://safelibraries.blogspot.com/","description":"SafeLibraries - Are Children Safe in Public Libraries? Tweet/RT all sides of ☛library crime ☛free speech ☛censorship ☛authors ☛child safety ☛bullying ☛teaching","protected":false,"followers_count":1111,"friends_count":1995,"listed_count":49,"created_at":"Wed Mar 18 00:47:18 +0000 2009","favourites_count":2016,"utc_offset":-14400,"time_zone":"Eastern Time (US & Canada)","geo_enabled":false,"verified":false,"statuses_count":44372,"lang":"en","contributors_enabled":false,"is_translator":false,"is_translation_enabled":false,"profile_background_color":"EDECE9","profile_background_image_url":" 
http://pbs.twimg.com/profile_background_images/267910806/ilovemylibrarian.jpg","profile_background_image_url_https":"https://pbs.twimg.com/profile_background_images/267910806/ilovemylibrarian.jpg","profile_background_tile":true,"profile_image_url":"http://pbs.twimg.com/profile_images/1447790466/4c198dc6-0d19-42ed-8fea-1c41a34367b8_normal.png","profile_image_url_https":"https://pbs.twimg.com/profile_images/1447790466/4c198dc6-0d19-42ed-8fea-1c41a34367b8_normal.png","profile_banner_url":"https://pbs.twimg.com/profile_banners/24993788/1348608084","profile_link_color":"088253","profile_sidebar_border_color":"D3D2CF","profile_sidebar_fill_color":"E3E2DE","profile_text_color":"634047","profile_use_background_image":true,"default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"Like","indices":[0,5]},{"text":"ACA","indices":[6,10]},{"text":"Obamacare","indices":[11,21]}],"symbols":[],"urls":[{"url":"http://t.co/FqwVHy2guE","expanded_url":"http://wp.me/p6sYP-kMn","display_url":"wp.me/p6sYP-kMn","indices":[104,126]}],"user_mentions":[{"screen_name":"JonathanTurley","name":"Jonathan Turley","id":94784682,"id_str":"94784682","indices":[26,41]}]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"medium","lang":"en","NewObject-25":{}}, ..... ]

— Reply to this email directly or view it on GitHub https://github.com/DrSkippy/Gnacs/issues/33#issuecomment-45340619.

Scott Hendrickson 303.219.0022 @drskippy27 (transparency is the antidote for cynicism)
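
The suggestion above (drop the enclosing [ and ] so the file becomes a succession of records) can be sketched in Python. This is a minimal sketch, not part of Gnacs: it assumes Gnacs accepts one JSON record per line, which the thread does not state explicitly, and the helper name `array_to_records` and the file names in the usage note are hypothetical. Parsing the array and re-serializing each element also avoids leaving stray commas between records, which hand-editing the brackets would not.

```python
import json


def array_to_records(src_path, dst_path):
    """Rewrite a file holding one JSON array as one JSON object per line.

    Assumption: the consumer (here, Gnacs) reads one record per line.
    Returns the number of records written.
    """
    with open(src_path, encoding="utf-8") as src:
        records = json.load(src)  # the whole file is a single JSON list
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    with open(dst_path, "w", encoding="utf-8") as dst:
        for record in records:
            # json.dumps escapes embedded newlines, so each record
            # stays on exactly one line
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)
```

For example, `array_to_records("json/tweets.json", "json/tweets_records.json")` would produce a file that could then be passed to the same `gnacs.py -c` command used earlier in this thread.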

cris-cola commented 10 years ago

Thank you, now it works.

cris-cola commented 10 years ago

Is it possible to print all the fields contained in the .json file to the .csv output? Which option does that? Thank you.

I have used the given examples, and it works with your data samples, but why does it print only the "id", "postedTime" and the status? Why not all the fields contained in the .json input?

Moreover, if I use the same command with my own .json, it doesn't print a .csv file but returns the very same .json file and says 'skipping' at the end...

DrSkippy commented 10 years ago

Not today, but the ability to do that easily is coming in a refactor in a week or so.

DrS
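
Until such an option exists in Gnacs, dumping every field can be approximated outside of it with the Python standard library. The following is a rough sketch under stated assumptions, not a Gnacs feature: the input is taken to be one JSON record per line, nested objects are flattened into dotted column names, lists are kept as JSON text in a single cell, and the names `flatten` and `records_to_csv` are hypothetical.

```python
import csv
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict keyed by dotted paths."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # recurse into nested objects; empty objects yield no column
            flat.update(flatten(value, name))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)  # keep lists as JSON text
        else:
            flat[name] = value
    return flat


def records_to_csv(lines, out):
    """Write one CSV row per JSON record; columns are the union of all keys."""
    rows = [flatten(json.loads(line)) for line in lines if line.strip()]
    fieldnames = sorted({key for row in rows for key in row})
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a record become empty cells
    return fieldnames
```

Here `lines` can be an open file of records and `out` a file opened for writing with `newline=""`, as the `csv` module documentation recommends. Taking the union of keys means records with different shapes still share one header row.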
