DocNow / twarc-csv

A plugin for twarc2 for converting tweet JSON into DataFrames and exporting to CSV.
MIT License

Float IDs #51

Closed marcelo-mascarenhas closed 2 years ago

marcelo-mascarenhas commented 2 years ago

Hello!

Recently I collected some tweets using twarc2 and, once the retrieval finished, converted the output to a '.csv' file using twarc-csv.

However, some columns that I need to use store IDs as floats, as shown in the figure. When I try to convert them to integers, the result is sometimes a tweet ID that does not match the original post (probably a rounding imprecision). Is there a specific/correct way to convert these IDs to integers, or was the information lost during the conversion?

Thanks! :)

[Image: unnamed]

edsu commented 2 years ago

That is exponential a.k.a. scientific notation, sometimes used for large integers. What are you using to display the table? Can it be configured to display the integer? I suspect if you look at that row in your CSV it is an integer?
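To make the distinction between display and storage concrete, here is a minimal Python sketch. The ID is a made-up 19-digit value, not one from the reporter's data. Scientific notation on its own is only a formatting choice, but once an ID has been stored as a 64-bit float, values above 2**53 may already have been rounded, which would explain a converted ID that no longer matches the post.

```python
# Hypothetical 19-digit ID, used only for illustration.
tweet_id = 1234567890123456789

# Storing the ID as a 64-bit float keeps roughly 15-17 significant digits.
as_float = float(tweet_id)

# Scientific notation is just how the float is printed.
print(f"{as_float:e}")  # 1.234568e+18

# Converting back to int shows whether the stored value was rounded.
round_trip = int(as_float)
print(round_trip == tweet_id)

# Integers up to 2**53 are represented exactly by float64; near 10**18
# adjacent float64 values are 256 apart, so most IDs there get rounded.
print(2**53)
```

If the round trip prints `False`, the original digits cannot be recovered from the float and have to be re-read from the source file as text or integers.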

marcelo-mascarenhas commented 2 years ago


Hello, @edsu! :) Thanks for answering.

I opened it in Python using the pandas library. Although it read some columns that contain large IDs as integers, such as 'id' and 'conversation_id', others were not converted: they are typed as floats and shown in scientific notation, as I described above.

For some reason, when I opened it earlier and converted the column to int, the number was truncated, but this time it worked. Apparently, as you said, it is just a way of displaying it.

Thanks nonetheless! :)
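A likely explanation for the mixed types described above is that pandas reads an integer column as float64 whenever it contains missing values, which is common for optional ID columns. The sketch below assumes that cause and uses a made-up two-row CSV; the column name `in_reply_to_user_id` and the ID values are illustrative, not taken from the reporter's file. Reading the ID columns as strings avoids the float conversion entirely.

```python
import io

import pandas as pd

# Hypothetical CSV: the second column has a missing value in the first row.
csv_text = (
    "id,in_reply_to_user_id\n"
    "1234567890123456789,\n"
    "1234567890123456790,1234567890123456791\n"
)

# Default parsing: the complete column becomes int64, while the column
# with a missing value becomes float64 and large IDs in it get rounded.
default = pd.read_csv(io.StringIO(csv_text))
print(default.dtypes)

# Reading ID columns as text keeps every digit exactly as written.
id_columns = {"id": str, "in_reply_to_user_id": str}
as_text = pd.read_csv(io.StringIO(csv_text), dtype=id_columns)
print(as_text)

# Convert to Python integers only where a value is present.
reply_ids = [
    int(value) for value in as_text["in_reply_to_user_id"].dropna()
]
print(reply_ids)
```

Keeping IDs as strings is usually enough, since they are identifiers rather than quantities to compute with.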