akiyamasho / twitter-scraper-chrome-extension

Code is from Google Web Store, modified for own use case
https://chrome.google.com/webstore/detail/twitter-scraper/cedomiiokkcmbeoekchahgmfcppnclal

Tweet ID is missing #1

Open joao opened 1 year ago

joao commented 1 year ago

The Tweet ID is missing. I researched how to fix it but wasn't able to.
Perhaps it is related to the rebrand to X? Any tips on how to fix it?

akiyamasho commented 1 year ago

Hi @joao, I tried running it today and the Tweet ID still shows up correctly.

Screenshot 2023-08-14 0 21 23

May I ask which account you're trying to scrape, and for the error logs if you can record them? I'll try running it locally to check.

joao commented 1 year ago

Thank you @akiyamasho. I managed to fix it a few hours ago by changing this code (original one commented), in assets/js/getArticleSource.js:

//var linkDom = element.querySelector("a[aria-label*=日]");  
var linkDom = element.querySelectorAll("a[role=link]")[3];

It must be an issue with my Twitter account being localized (even though I have the UI in English) or something like that. With the code change I'm selecting the link at index 3 in a tweet (the fourth a[role=link] match, since the index is zero-based), which comes after the avatar and profile links. That is the date link, whose URL contains the Tweet ID.
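The fixed index above depends on how many links precede the date link, which may differ between tweet layouts. A minimal sketch of an alternative, assuming the tweet permalink has the form /user/status/ID and is the first such link inside the tweet element. The helper names extractTweetId and findTweetId are hypothetical and not part of the extension:

```javascript
// Hypothetical helper: pull the numeric ID out of a tweet permalink such as
// "/someuser/status/1690000000000000000". Returns null when there is no match.
function extractTweetId(href) {
  const match = /\/status\/(\d+)/.exec(href || "");
  return match ? match[1] : null;
}

// Hypothetical helper: scan every link inside a tweet element and return the
// first tweet ID found, instead of relying on a localized aria-label or on a
// fixed position in the link list.
// Assumption: the tweet's own permalink appears before any other /status/ link.
function findTweetId(element) {
  const links = element.querySelectorAll("a[role=link]");
  for (const link of links) {
    const id = extractTweetId(link.getAttribute("href"));
    if (id) return id;
  }
  return null;
}
```

In getArticleSource.js this could stand in for the linkDom lookup, for example var tweetId = findTweetId(element); though I have not verified it against every tweet layout (retweets and quoted tweets may need extra handling).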

Another issue came up: a profile I'm trying to get tweets from is that of a Portuguese politician, @andrecventura, but some tweets are missing, going back from the first visible one. For example, I want all the tweets from August, but no tweets are retrieved between the 5th and 10th of August, even though they are visible in the interface. Any idea what might be happening?

Thank you.

joao commented 1 year ago

I had some time and tested other accounts: it skips whole date intervals. I tried the account @partido_pan to retrieve tweets since Jan/2023; it starts well, but then it gets to 07/August and the previous tweet scraped is from 27/July.

After exporting, I checked visually, and there are over a dozen tweets in that time period that appear in the browser but not in localStorage, the analytics UI, or the exported CSV.

Checking the extension errors, there is indeed something being logged related to 'time', so it might be connected (screenshot attached). At first glance, it seems to affect only how information is shown in the extension's UI pane. Any thoughts?

Thank you.

screenshot 2023-08-13 at 20 01 06

joao commented 1 year ago

Found a workaround/fix, but not the origin of the issue in the code.

Twitter's website is built on React and, due to the approach chosen, it only renders what is visible in the viewport plus a bit of padding. The extension only scrapes what is rendered, not all the tweets returned by the search. So the solution is to increase the browser height as much as possible, set the zoom to 25%, and set the delay in retrieving content to 5s. It retrieved all the tweets this way.
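A code-side alternative to the zoom workaround might be to scrape in small, overlapping scroll steps and merge the results by Tweet ID, so a tweet seen at one scroll position is kept even after the page removes it from the DOM. This is only a sketch under assumptions: mergeBatch and collectAll are hypothetical names, scrapeVisible and scrollBy are callbacks the caller would have to supply, and I have not checked how the extension structures its own scraping loop.

```javascript
// Hypothetical accumulator: merge one batch of scraped tweets into a Map keyed
// by tweet ID. Tweets without an ID, or with an ID already stored, are skipped.
// Returns the number of tweets that were newly added.
function mergeBatch(collected, batch) {
  let added = 0;
  for (const tweet of batch) {
    if (tweet.id && !collected.has(tweet.id)) {
      collected.set(tweet.id, tweet);
      added += 1;
    }
  }
  return added;
}

// Hypothetical driver: scrape, scroll a short distance, wait, and repeat until
// several consecutive steps add nothing new.
// Assumptions: scrapeVisible() returns an array of { id, ... } objects for the
// tweets currently rendered, and scrollBy(px) scrolls the timeline.
async function collectAll(scrapeVisible, scrollBy, options = {}) {
  const { stepPx = 400, delayMs = 5000, maxIdleSteps = 3 } = options;
  const collected = new Map();
  let idleSteps = 0;
  while (idleSteps < maxIdleSteps) {
    const added = mergeBatch(collected, scrapeVisible());
    idleSteps = added === 0 ? idleSteps + 1 : 0;
    scrollBy(stepPx);
    // Give the page time to render the next tweets (5s matched my manual test).
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return collected;
}
```

Keeping the scroll step smaller than the viewport makes consecutive batches overlap, and the Map drops the duplicates, so the result should not depend on browser height or zoom level.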