DocNow / hydrator

Turn Tweet IDs into Twitter JSON & CSV from your desktop!
MIT License

A JavaScript error occurred in the main process #48

Closed rahulbordoloi closed 4 years ago

rahulbordoloi commented 4 years ago

OS : Windows 10

Problem: Each time I try hydrating a big dataset, I get this error. It does not show up for smaller datasets. Can you please look into this issue? A screenshot of the error is attached.

[Screenshot: Problem]

edsu commented 4 years ago

Thanks @rahulbordoloi. Does this happen when you are converting to CSV?

edsu commented 4 years ago

Accidentally closed this on my phone, but I reopened it!

rahulbordoloi commented 4 years ago

@edsu Yes, the JSONL output works fine, but when I try converting it to CSV the above error shows up, and the CSV file that gets created is 0 KB (empty).

edsu commented 4 years ago

Ok, I suspect that there is a corrupted line in the JSONL file. I will try this too.

rahulbordoloi commented 4 years ago

The Chrome JSON parser works fine in my case, though. Do you want me to send you the tweet ID CSV from which I hydrated the JSONL and CSV? If yes, you can leave your email address here.

edsu commented 4 years ago

Sure I am ehs@pobox.com

rahulbordoloi commented 4 years ago

Sent. Please check.

edsu commented 4 years ago

Hydrator appears to be writing a blank line to the JSONL file when none of the tweet ids can be hydrated. This is pretty rare unless the tweet ids have been corrupted in some way. Nevertheless, it should not write blank lines, because they can be problematic for downstream users of the JSONL who expect to find a complete JSON object on each line.
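To illustrate the "no blank lines" rule described above, here is a minimal Python sketch. It is not Hydrator's actual code (Hydrator is a JavaScript application); `write_jsonl` and the sample records are hypothetical names used only for illustration. The idea is simply that the writer emits one line per object and therefore writes nothing at all for an empty batch.

```python
import io
import json

def write_jsonl(tweets, out):
    """Write one JSON object per line; an empty batch writes nothing."""
    written = 0
    for tweet in tweets:
        # The newline is attached to each record, so no record means no line.
        out.write(json.dumps(tweet) + "\n")
        written += 1
    return written

buf = io.StringIO()
write_jsonl([], buf)                  # a batch where no ids could be hydrated
write_jsonl([{"id_str": "1"}], buf)   # a batch with one hydrated tweet
```

After both calls, `buf` holds exactly one line, with no blank line left behind by the empty batch.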

I noticed that there were blank lines in the SP.jsonl file you gave me @rahulbordoloi. I'm testing whether this causes problems for the CSV generation.

rahulbordoloi commented 4 years ago

Will the blank lines in between cause problems for me when I work with the JSONL files?

edsu commented 4 years ago

It depends on how you are processing them. Some JSON parsers may not care about being asked to parse a blank line. But the Hydrator did care: it was throwing the error you reported when it attempted to parse a blank line.
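If you process the JSONL yourself, one defensive approach is to skip blank lines before parsing. The sketch below uses Python's standard `json` module, which does raise an error on an empty line; `read_jsonl` and the sample data are hypothetical and only illustrate the idea.

```python
import json

def read_jsonl(lines):
    """Yield one parsed object per non-blank line."""
    for line in lines:
        if not line.strip():   # skip empty or whitespace-only lines
            continue
        yield json.loads(line)

# Sample input with a blank line between two records.
sample = ['{"id_str": "1"}\n', '\n', '{"id_str": "2"}\n']
tweets = list(read_jsonl(sample))
```

The same function works on an open file object, since iterating over a file yields its lines.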

When you get a chance could you give v0.0.12 a try and see if your problem generating the CSV goes away?

Unfortunately I think your tweet id file has been corrupted. Do you see how the ids all end in 4 zeros? That is a good indicator that the tweet ids were processed by something that does not handle large integers without overflowing. I don't know if the file was that way when you downloaded it, or if Excel or some other tool mangled it. But it was useful here, because the corrupted ids helped find a small bug in the Hydrator that wouldn't ordinarily surface.
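As a rough illustration of why this happens: a 64-bit floating point number cannot hold every 19-digit integer exactly, and Excel keeps only 15 significant digits, so the last four digits of a 19-digit id come back as zeros. The Python sketch below shows the precision loss and a simple heuristic check; `looks_truncated` and all the ids are made up for illustration.

```python
def looks_truncated(ids, zeros=4):
    """Heuristic: True if every non-empty id ends in `zeros` zeros."""
    cleaned = [i.strip() for i in ids if i.strip()]
    return bool(cleaned) and all(i.endswith("0" * zeros) for i in cleaned)

original = 1234567890123456789          # made-up 19-digit id
as_float = int(float(original))         # round trip through a 64-bit float
precision_lost = as_float != original   # True: the low digits changed

# Made-up ids that show the trailing-zero pattern described above.
suspicious = looks_truncated(["1234567890123450000", "1250000000000010000"])
```

A single genuine id can end in four zeros by chance, which is why the check only fires when every id in the file does.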

rahulbordoloi commented 4 years ago

Regarding the Tweet IDs, I think the file might have been corrupted while downloading, or I might have corrupted it myself by opening it in Excel. Can you suggest a better way to look through the Tweet IDs beforehand without changing their type and corrupting them?

And yes, I can now generate a CSV file from the generated JSONL. Thank you for the fix. I'm happy to contribute to such a handy and wonderful project. Great work. :)

I've mailed you both the JSONL and the CSV file. Please check whether they match your expected output.

edsu commented 4 years ago

I recommend you use a text editor like VSCode, Emacs or Vim to inspect the ids. You may be able to open them in Excel, but don't save them again, or else they will overflow and become useless. Thanks for reporting this issue. I'm closing for now since it seems like the latest version of Hydrator will not write blank lines to the JSONL in these cases where large numbers of tweet ids cannot be hydrated.
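If you would rather check the ids with a script than by eye, the important part is to keep them as text and never convert them to numbers. Below is a minimal Python sketch under that assumption; `load_ids` and the sample values are hypothetical.

```python
def load_ids(lines):
    """Split lines into digit-only ids and anything else, all kept as strings."""
    valid, rejected = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue               # ignore blank lines
        # Ids stay strings; scientific notation or stray text is set aside.
        (valid if text.isdigit() else rejected).append(text)
    return valid, rejected

valid, rejected = load_ids(["1234567890123456789\n", "1.23457E+18\n", "\n"])
```

Passing an open file (for example `open("ids.txt")`, a placeholder filename) instead of a list works the same way, and anything in `rejected` is a hint that a spreadsheet tool has rewritten the ids.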