apify-projects / apify-extra-library

Advanced and experimental functionality for Apify Actors and Crawlee. Use with caution!
Apache License 2.0

Unlimited RQ / remote dynamic code executor #2

Open pocesar opened 3 years ago

pocesar commented 3 years ago

Brainstorm time. An Express server with two endpoints (a rough sketch follows the list):

  1. How are retries done? Do failed requests come back to the queue, or does the server keep retrying until it gives up?
  2. Registering the code from the "main" process (i.e. another scraper/task) could serialize the function, as in:
    await fetch(/* container url register */, {
      method: 'POST',
      body: myWorkerCode.toString(),
    })
  3. Since it's a fire-and-forget mechanism (and the data will be pushed to the designated dataset), how could de-duplication occur?
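
A rough sketch of what such a server could look like, assuming two POST endpoints (/register and /request, names made up here), new Function to revive the serialized worker, and the Apify SDK's Actor.pushData() for the dataset write. Retries and de-duping are left out on purpose, since those are the open questions above:

    import express from 'express';
    import { Actor } from 'apify';

    await Actor.init();

    const app = express();

    // Serialized worker function, revived from the string sent by the main actor.
    let workerFn = null;

    // Endpoint 1: register the worker code (sent as fn.toString()).
    app.post('/register', express.text({ type: '*/*' }), (req, res) => {
        workerFn = new Function(`return (${req.body})`)();
        res.sendStatus(204);
    });

    // Endpoint 2: fire-and-forget execution of the registered worker for one request.
    app.post('/request', express.json(), async (req, res) => {
        if (!workerFn) return res.sendStatus(409); // nothing registered yet
        res.sendStatus(202); // accept immediately, process in the background

        try {
            const result = await workerFn(req.body);
            await Actor.pushData(result); // lands in the run's default dataset
        } catch (err) {
            console.error('Worker failed for', req.body.url, err);
        }
    });

    app.listen(process.env.APIFY_CONTAINER_PORT ?? 4321);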
metalwarrior665 commented 3 years ago

There are definitely plenty of ways we could do this. I would probably start by trying to mimic our existing queue and most of its functionality. Then it would work the same way in crawlers; instead of calling the Apify API, they would call the server. The server could run in the same actor or be spawned as a separate actor (a minimal client-side sketch follows below).

  1. We can mimic how the Apify queue works
  2. I'm not sure if I understand this
  3. Why push to a dataset? I thought we would keep an object/Map in memory. Probably I'm thinking about something else :)
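
Following the queue-mimicking idea above, the crawler side could get a thin client that exposes an addRequest-like method but POSTs to the server instead of the Apify API. A minimal sketch, assuming the /request path from the server sketch above and Node 18+ global fetch (the class name is illustrative):

    // Hypothetical client mimicking RequestQueue.addRequest(), but talking to the
    // remote executor server instead of the Apify API.
    export class RemoteExecutorClient {
        constructor(baseUrl) {
            this.baseUrl = baseUrl;
        }

        // Fire-and-forget: the server runs the registered worker on this request.
        async addRequest(request) {
            const res = await fetch(`${this.baseUrl}/request`, {
                method: 'POST',
                headers: { 'content-type': 'application/json' },
                body: JSON.stringify(request),
            });
            if (!res.ok) throw new Error(`Remote executor replied ${res.status}`);
        }
    }

De-duplication could then live server-side as an in-memory object/Map keyed by uniqueKey, checked before the worker runs, which matches the keep-it-in-memory idea in point 3.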
pocesar commented 3 years ago

Actually, "Unlimited RQ" is a misnomer. It works as a remote dynamic code executor, but it's faster than the RQ / enqueueing because requests are handled as they arrive. The detail-extraction worker code is passed to the server on register.
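
Putting the register step together with that description, the main scraper would define a self-contained detail-extraction worker, send its source once, and then fire requests at the server as they are discovered. A sketch, where the worker body, the endpoint path, and the EXECUTOR_URL variable are all placeholders:

    // Detail-extraction worker that will run inside the executor server.
    // It must be self-contained: only its source string is sent over the wire.
    const myWorkerCode = async ({ url }) => {
        const html = await (await fetch(url)).text();
        const title = html.match(/<title>(.*?)<\/title>/i)?.[1] ?? null;
        return { url, title };
    };

    // Register it once, before any requests are dispatched. EXECUTOR_URL is a
    // placeholder for however the server's container URL gets passed around.
    await fetch(`${process.env.EXECUTOR_URL}/register`, {
        method: 'POST',
        headers: { 'content-type': 'text/plain' },
        body: myWorkerCode.toString(),
    });

    // From here on, detail requests are fired at /request as they are discovered
    // (see the client sketch above) and the results show up in the dataset.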