Rishikant181 / Rettiwt-API

A CLI tool and an API for fetching data from Twitter for free!
https://rishikant181.github.io/Rettiwt-API/
MIT License

Delay in Fetching Tweets #557

Closed nfadeluca closed 1 month ago

nfadeluca commented 5 months ago

I developed a program that creates several processes, each utilizing its own API account. These processes run simultaneously, scanning for tweets from a single account every 60 seconds using rettiwt.tweet.search(filter). To prevent rate limiting, each of the six processes scans the same profile, staggered by 10 seconds, effectively simulating scanning every 10 seconds.

However, in testing, the API only seems to recognize that an account has posted about a full minute after the post was made. Is this normal for the tweet.search() function, or should it pick up new tweets almost instantly?

For the purpose of my program, it is important to pick up a newly posted tweet from a specific account almost instantly.
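The staggering arithmetic described above can be sketched as follows. This is only an illustration of the scheduling idea, not code from the original program; the helper names `staggerOffsets` and `worstCaseGapMs` are hypothetical.

```typescript
// Hypothetical helper: start offsets (in ms) for `workers` pollers that
// each scan once per `periodMs`, spread evenly across that period.
function staggerOffsets(workers: number, periodMs: number): number[] {
  if (workers <= 0) {
    throw new RangeError('workers must be positive');
  }
  const step = periodMs / workers;
  const offsets: number[] = [];
  for (let i = 0; i < workers; i++) {
    offsets.push(Math.round(i * step));
  }
  return offsets;
}

// Longest wait between a post and the next scan by any worker,
// ignoring any delay on the server side.
function worstCaseGapMs(workers: number, periodMs: number): number {
  return periodMs / workers;
}

console.log(staggerOffsets(6, 60_000)); // [0, 10000, 20000, 30000, 40000, 50000]
console.log(worstCaseGapMs(6, 60_000)); // 10000
```

With six workers on a 60-second period, the polling schedule alone should account for at most about 10 seconds of delay, so a delay of a full minute would have to come from somewhere else.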

Rishikant181 commented 5 months ago

rettiwt.tweet.search should indeed return results instantaneously; that is, if someone posts a tweet and I run rettiwt.tweet.search soon after, I should get the newly created tweet. Can you share the code you use to simulate scans every 10 seconds?

nfadeluca commented 5 months ago

Here's the abridged version of the code:

main.ts

import cluster from 'node:cluster';
import os from 'node:os';

const API_KEYS: string[] = [
   // ... 6 api rettiwt api keys here ...
];

// pass username to scan as argument
const profile = process.argv[2];
if (!profile) {
   console.error('Please provide a profile name.');
   process.exit(1);
}

if (cluster.isPrimary) {
   const numCPUs = os.cpus().length;
   const numWorkers = Math.min(API_KEYS.length, numCPUs);

   // 6 worker processes
   for (let i = 0; i < numWorkers; i++) {
      setTimeout(() => {
         const worker = cluster.fork();
         worker.send({ profile, apiKey: API_KEYS[i] });
      }, i * 10000); // 10-second offset
   }
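   // when a worker exits, log it and fork a replacement
   // (note: the replacement is never sent a { profile, apiKey } message, so it stays idle)
   cluster.on('exit', (worker, code, signal) => {
      console.log(`Worker ${worker.process.pid} died`);
      cluster.fork();
   });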
} else {
   process.on('message', (msg: unknown) => {
      if (typeof msg === 'object' && msg !== null && 'profile' in msg && 'apiKey' in msg) {
         const { profile, apiKey, } = msg as { profile: string; apiKey: string; };
         import('./worker').then(workerModule => {
            workerModule.startWorker(profile, apiKey);
         });
      }
   });
}

worker.ts

// scan tweets, if tweet matching regex found, execute sendTM(client)
async function scanTweets(username: string, client: tc) {
   try {
      const rettiwt = new Rettiwt({ apiKey });

      const startTime = Date.now();

      const filter = new TweetFilter({
         fromUsers: [username]
      });

      const data: CursoredData<Tweet> = await rettiwt.tweet.search(filter);

      if (data && data.list && data.list.length > 0) {
         for (let tweet of data.list) {
            const tweetText = tweet.fullText;
            const matches = tweetText.match(/\b\w{32,44}\b/);
            if (matches) {
               for (let match of matches) {
                  const matchTime = Date.now();
                  console.log(`Found match: ${match}`);
                  await sendTM(client);
                  // this usually returns ~350ms
                  console.log(`Processed match in ${Date.now() - matchTime}ms`);
                  process.exit(0);
               }
            }
         }
      }

   } catch (error) {
      console.error('Error scanning tweets:', error);
   }
}

while (true) {
  // scan every 60s to avoid rate limiting
  const currentTime = new Date().toLocaleTimeString([], { hour: 'numeric', minute: '2-digit', second: '2-digit' });
  console.log(`[${currentTime}] Scanning tweets for profile ${profile} (Process ID: ${process.pid})`);
  await scanTweets(profile, client);
  await sleep(60000);
}

Just to clarify, the >60s delay does not occur within the search call itself. After I post a test tweet, the next five or six calls to the search function return no matching tweets. About a minute after posting, the search function returns the matching tweet, but I assume the API should pick up the new tweet far sooner than that.
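A side note on the matching step in worker.ts: String.prototype.match with a non-global regex returns only the first match, so the inner loop over `matches` runs at most once per tweet. If every candidate token in a tweet matters, a global regex could be used instead. The sketch below shows that variant; `findTokens` is a hypothetical helper, not part of the original program.

```typescript
// Hypothetical helper: collect every 32 to 44 character word token in a text.
// The `g` flag makes match() return all full matches instead of just the first.
function findTokens(text: string): string[] {
  const found = text.match(/\b\w{32,44}\b/g);
  return found ? Array.from(found) : [];
}

const sample =
  'first 0123456789abcdef0123456789abcdef then 0123456789abcdef0123456789abcdef01234567';
console.log(findTokens(sample).length); // 2
console.log(findTokens('no long tokens here')); // []
```

This has no bearing on the delay itself; it only affects which matches are seen once a tweet is returned.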

Rishikant181 commented 5 months ago

Did you try using the rettiwt.tweet.stream method? That method streams tweets in pseudo-realtime, given a scan delay.

Here is the documentation regarding its usage.
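As a rough illustration of what polling-based "pseudo-realtime" streaming generally involves: each scan fetches the latest results, and only tweets not seen in earlier scans are passed on. The sketch below shows that deduplication step in isolation. It is not the library's implementation; `TweetLike` and `takeUnseen` are hypothetical names, and the only assumption about a tweet is that it carries a unique `id`.

```typescript
// Hypothetical minimal tweet shape; only a unique id is assumed.
interface TweetLike {
  id: string;
  fullText: string;
}

// Return the items of `batch` whose ids have not been seen before,
// and record them in `seen` so later scans skip them.
function takeUnseen<T extends { id: string }>(batch: T[], seen: Set<string>): T[] {
  const fresh: T[] = [];
  for (const item of batch) {
    if (!seen.has(item.id)) {
      seen.add(item.id);
      fresh.push(item);
    }
  }
  return fresh;
}

const seenIds = new Set<string>();
const firstScan: TweetLike[] = [
  { id: '1', fullText: 'older tweet' },
  { id: '2', fullText: 'newer tweet' },
];
const secondScan: TweetLike[] = [
  { id: '2', fullText: 'newer tweet' },
  { id: '3', fullText: 'newest tweet' },
];
console.log(takeUnseen(firstScan, seenIds).length); // 2
console.log(takeUnseen(secondScan, seenIds).map((t) => t.id)); // ['3']
```

Because such a stream is built on repeated searches, it can only surface a tweet once the underlying search returns it.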

nfadeluca commented 5 months ago

Just tried that, ran into the same issue unfortunately.

I also decided to isolate the search function and test it that way:

import { Rettiwt } from 'rettiwt-api';
import { TweetFilter } from 'rettiwt-core';
import { Tweet } from 'rettiwt-api/dist/models/data/Tweet';
import { CursoredData } from 'rettiwt-api/dist/models/data/CursoredData';

async function scanTweets(username: string, apiKey: string) {
   try {
      const rettiwt = new Rettiwt({ apiKey });

      const filter = new TweetFilter({
         fromUsers: [username]
      });

      const data: CursoredData<Tweet> = await rettiwt.tweet.search(filter);

      if (data && data.list && data.list.length > 0) {
         for (let tweet of data.list) {
            const tweetText = tweet.fullText;
            const matches = tweetText.match(/\b\w{32,44}\b/);
            if (matches) {
               console.log("Found");
               return;
            }
         }
      } else {
         console.log("No tweets found.");
      }

   } catch (error) {
      console.error('Error scanning tweets:', error);
   }
}

const apiKey = 'api key here';
const username = 'username here';

scanTweets(username, apiKey);

As expected, when I manually run the program several times after posting the tweet, it only prints "Found" after about 60 seconds (five or six runs of the program above).
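To put a number on the delay rather than estimating it from run counts, the time of each manual run and whether it found the tweet can be logged and compared with the posting time. The sketch below does that calculation; `PollResult` and `detectionDelayMs` are hypothetical names, the timestamps are made-up illustration values rather than measurements, and the polls are assumed to be listed in chronological order.

```typescript
// One manual run (or scan): when it happened and whether the tweet was found.
interface PollResult {
  atMs: number;
  found: boolean;
}

// Delay between posting and the first poll that found the tweet,
// or null if no poll at or after the posting time found it.
// Assumes `polls` is in chronological order.
function detectionDelayMs(postedAtMs: number, polls: PollResult[]): number | null {
  for (const poll of polls) {
    if (poll.found && poll.atMs >= postedAtMs) {
      return poll.atMs - postedAtMs;
    }
  }
  return null;
}

// Illustration values only: tweet posted at t = 5 s, six runs 10 s apart,
// with only the last run finding the tweet.
const runs: PollResult[] = [
  { atMs: 10_000, found: false },
  { atMs: 20_000, found: false },
  { atMs: 30_000, found: false },
  { atMs: 40_000, found: false },
  { atMs: 50_000, found: false },
  { atMs: 60_000, found: true },
];
console.log(detectionDelayMs(5_000, runs)); // 55000
```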

Rishikant181 commented 5 months ago

Strange, it should be returning as soon as a scan is done.

Does it happen with a specific user in fromUsers or does it happen with all target users?

nfadeluca commented 5 months ago

The same behavior occurs regardless of the profile username being passed (I've tested with a few different profiles).

nfadeluca commented 5 months ago

@Rishikant181 Is it possible to verify on your end that streaming detects a tweet immediately after it's posted, or at least less than 60 seconds after?

Rishikant181 commented 5 months ago

Yeah, on my end it's working as expected, i.e., returning tweets as soon as they are created. I have not been able to reproduce your issue. Can you try it with any other account and see if the same happens?

nakamuraos commented 2 months ago

I also have this problem. If using the same account with apiKey it will appear immediately, but searching for other usernames cannot find new tweets.

Rishikant181 commented 2 months ago

"If using the same account with apiKey it will appear immediately, but searching for other usernames cannot find new tweets"

Checking this out

Rishikant181 commented 2 months ago

@nakamuraos I'm unable to reproduce this issue. Can you provide the username of the accounts whose tweets can't be found?

nakamuraos commented 2 months ago

@Rishikant181 (from:Luat_T97) I realized that with product=Latest, Twitter does not return results as complete as with product=Top when searching on Twitter.

Rishikant181 commented 2 months ago

@nakamuraos product=Latest returns all tweets sorted by the time they were posted, from the most recent at the top to the oldest at the bottom. That's why it's the one filter that rettiwt uses to search for tweets.

Can you post the code snippet you use for fetching the tweets?

Rishikant181 commented 1 month ago

@nfadeluca @nakamuraos This might be the reason. If yes, then it's been patched in #638 .

Rishikant181 commented 1 month ago

@nfadeluca @nakamuraos Please update to v4.1.4