dh-miami / narratives_covid19

Digital Narratives of COVID19

Rejection of your request to be added to the Zenodo community (Twitter Datasets) #2

Closed nkratzke closed 4 years ago

nkratzke commented 4 years ago

You sent a request to be added to this Zenodo Twitter community: https://zenodo.org/communities/twitter-datasets

I would love to add your dataset to this list of Twitter-related datasets. However, it seems that your dataset provides a list of identifiers only. These identifiers seem to come without any context. Therefore, for most people these ids are meaningless, I am afraid.

It remains unclear to me how this contextless list of identifiers could be used to run analyses on your dataset. Maybe you could make this clearer, add the missing context data, and resubmit your request on Zenodo?

I will carefully reconsider your request if you give me a bit more context. Thank you very much.

BR Nane

dh-miami commented 4 years ago

Good morning, Thank you for your message. In fact, our dataset is a list of ids that need to be hydrated in order to recover the metadata. The ids are organized by day, and each day contains a list of Twitter ids for all tweets in Spanish related to COVID-19, then separate lists for tweets in Spanish from Argentina, Mexico, Colombia, Peru, Ecuador, and Spain, and two more lists, in Spanish and English, for South Florida. When you say that the identifiers “come without any context”, do you mean that we should add a clearer description of our datasets, or that users of Twitter datasets normally request more metadata (besides the tweet id)? Thank you for your time. Best,

nkratzke commented 4 years ago

Yes, it was a bit unclear to me what these ids point to. Are they the tweet ids? If so, this should be made clearer. Maybe it would be even more helpful to provide the hydrated dataset?

NK

dh-miami commented 4 years ago

Thanks for answering. Yes, they are tweet ids, and they are organized in txt files: by day (each folder is a day), by language (English, Spanish), and by geolocation (Argentina, Mexico, Ecuador, Peru, Colombia, Florida). I did not hydrate the dataset because users have to hydrate the ids either way in order to get the relevant data (especially the text). I could definitely change the description and make it clearer. If this is not enough for the dataset to be included in the Zenodo community of Twitter datasets, I will try again in the future, when I have more time, and create a JSON file with date, geo, and user id, if you think that is more useful. Thanks!!!
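
For readers of this thread, the layout described above (one folder per day, one txt file per language/location list) could be read with a sketch like the following. It assumes one tweet id per line; the function name `load_tweet_ids` and the folder and file names in the usage note are hypothetical, not taken from the repository.

```python
from collections import defaultdict
from pathlib import Path


def load_tweet_ids(root):
    """Return {day folder name: {txt file stem: [tweet ids]}}.

    Assumes <root>/<day>/<list>.txt with one tweet id per line
    (an assumption about the layout, not confirmed by the repository).
    """
    ids_by_day = defaultdict(dict)
    for txt_file in sorted(Path(root).glob("*/*.txt")):
        day = txt_file.parent.name
        with txt_file.open(encoding="utf-8") as handle:
            # Keep ids as strings and skip blank lines.
            ids = [line.strip() for line in handle if line.strip()]
        ids_by_day[day][txt_file.stem] = ids
    return dict(ids_by_day)
```

Calling `load_tweet_ids("data")` on a hypothetical `data/2020-05-01/spanish.txt` would give a dictionary keyed first by `"2020-05-01"` and then by `"spanish"`.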

nkratzke commented 4 years ago

OK, I understand. It should be sufficient to make this clearer in the Zenodo dataset description. A short hint on how to hydrate the dataset using the identifiers would also be helpful there.

I assume extending the GitHub README.md would be helpful as well.
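
As an illustration of what such a hint might cover: hydration tools typically send tweet ids to a lookup endpoint in fixed-size batches. The sketch below only does the batching step and makes no network calls. The batch size of 100 is an assumption about the lookup limit and should be checked against the current API documentation; the function name `batches` is hypothetical.

```python
from itertools import islice


def batches(tweet_ids, size=100):
    """Yield successive lists of at most `size` tweet ids.

    The default of 100 is an assumed per-request lookup limit,
    not a value taken from this dataset's documentation.
    """
    iterator = iter(tweet_ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Each yielded batch could then be passed to whichever hydration tool or API client the user prefers.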

Thank you.

NK

dh-miami commented 4 years ago

Ok, I'll do that! Thanks for your help!

nkratzke commented 4 years ago

Request accepted.