Id mismatch? - Githubissues

chmod commented 1 year ago

I am running the following command:

$ snscrape --jsonl --progress --max-results 1 twitter-search "@Eikonikos_HQ filter:replies -from:Eikonikos_HQ"

It currently returns (some fields removed for brevity):

{
  "_type": "snscrape.modules.twitter.Tweet",
  "url": "https://twitter.com/estfino/status/1669700650467246080",
  "id": 1669700650467246000,
  "conversationId": 1669700100971536400,
  "inReplyToTweetId": 1669700100971536400,
   .....
}

Shouldn't id be 1669700650467246080?
Shouldn't inReplyToTweetId be 1669700100971536389?

My understanding is that the last part of a tweet URL is the tweet ID. The IDs currently returned lead to an error page.

Edit: I am using the version from GitHub.

JustAnotherArchivist commented 1 year ago

snscrape returns the correct IDs:

$ snscrape --jsonl --progress --max-results 1 twitter-search "@Eikonikos_HQ filter:replies -from:Eikonikos_HQ"
{"_type": "snscrape.modules.twitter.Tweet", "url": "https://twitter.com/estfino/status/1669700650467246080", "date": "2023-06-16T13:37:11+00:00", "rawContent": "<snip>", "renderedContent": "<snip>", "id": 1669700650467246080, "user": { <snip>

You are likely passing the output through jq or similar software that doesn't parse large integers correctly. (jq fixed that bug some time ago, but the fix hasn't been released yet.) That's what mangles the IDs.
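To illustrate the mangling, here is a minimal Python sketch. It assumes the broken parser stores every JSON number as a 64-bit float and prints it back using the shortest decimal that round-trips. That appears consistent with the values in the report, but it is not a statement about jq's internals. The helper name as_printed is made up for this example.

```python
import json
from decimal import Decimal

# The two IDs from the report, as snscrape emits them.
raw = '{"id": 1669700650467246080, "inReplyToTweetId": 1669700100971536389}'

# Python's json module keeps integers exact (arbitrary precision).
exact = json.loads(raw)

# Assumption: mimic a parser that stores every number as a 64-bit float.
lossy = json.loads(raw, parse_int=float)


def as_printed(value: float) -> int:
    """Expand the shortest round-trip decimal form of a float into an integer."""
    return int(Decimal(repr(value)))


for key in ("id", "inReplyToTweetId"):
    print(key, exact[key], "->", as_printed(lossy[key]))
```

Both IDs are larger than 2**53, the point beyond which a 64-bit float can no longer represent every integer, so the printed digits no longer match the original ID.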

If you can't switch to a JSON parser that isn't broken, you can use --jsonl-for-buggy-int-parser to emit JSONL with additional id.str etc. string fields for each field with an integer exceeding the float precision.
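A consumer of that output could prefer the string companion whenever it is present. The sketch below is only an illustration: the sample line is hypothetical and built from the ID in this thread, the key name inReplyToTweetId.str is assumed from the "id.str" naming described above, and exact_id is a made-up helper, not part of snscrape.

```python
import json

# Hypothetical record; the ".str" key name is assumed from the description above.
line = (
    '{"inReplyToTweetId": 1669700100971536389, '
    '"inReplyToTweetId.str": "1669700100971536389"}'
)


def exact_id(record: dict, field: str) -> int:
    """Return the ID from the '<field>.str' companion if present, else the raw field."""
    companion = record.get(f"{field}.str")
    if companion is not None:
        return int(companion)
    return int(record[field])


# Assumption: parse_int=float stands in for a parser that loses integer precision.
record = json.loads(line, parse_int=float)

print(int(record["inReplyToTweetId"]))        # value after float rounding
print(exact_id(record, "inReplyToTweetId"))   # value recovered from the string field
```

The string field survives any JSON parser unchanged, which is why it can be trusted even when the numeric field has been rounded.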

chmod commented 1 year ago

Thank you for the reply. I'll update jq.

JustAnotherArchivist / snscrape

Id mismatch? #972