ail-project / lacus

Lacus is a capturing system using playwright, as a web service.
BSD 3-Clause "New" or "Revised" License

Timeout isn't working for the url (onion) #41

Open FafnerKeyZee opened 3 days ago

FafnerKeyZee commented 3 days ago

Hello,

Using the following code with pylacus:

uuid = lacus.enqueue(url=host['slug'], general_timeout_in_sec=90)

The timeout is not respected by Lacus (output of tools/monitor.py):

{
  "max_concurrent_captures": 150,
  "max_capture_time": 3600,
  "ongoing_captures": 3,
  "captures_time": {
    "f7b3fc60-9dea-474b-bf8e-fe63e0605377": 228.637663,
    "be37d5c2-494a-4184-9da5-6474df9c29f0": 230.759614,
    "4ab61217-f770-4df0-842c-640c7c439df9": 266.525206
  },
  "enqueued_captures": 0
}

Rafiot commented 1 day ago

Some captures get stuck and the playwright timeouts aren't respected (that's the 90 seconds you pass as a parameter), but the capture often finishes eventually. If it doesn't, max_capture_time is the one that kicks in, and it will kill the capture regardless of its status.
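
To illustrate the two layers, here is a minimal sketch in plain asyncio. It is not the Lacus code: capture and run_with_hard_cap are hypothetical names, and the sleep simply stands in for a browser call that ignores its own inner timeout. The outer limit plays the role of max_capture_time.

```python
import asyncio


async def capture(actual_duration: float) -> str:
    """Hypothetical stand-in for a capture that runs for actual_duration
    seconds, regardless of any inner (best-effort) timeout it was given."""
    await asyncio.sleep(actual_duration)
    return "done"


async def run_with_hard_cap(actual_duration: float, max_capture_time: float) -> str:
    """Run the capture, cancelling it once max_capture_time has elapsed."""
    try:
        # wait_for cancels the inner task when the outer limit expires.
        return await asyncio.wait_for(capture(actual_duration), timeout=max_capture_time)
    except asyncio.TimeoutError:
        return "killed"


if __name__ == "__main__":
    # Slow capture that still finishes within the hard cap.
    print(asyncio.run(run_with_hard_cap(0.01, 0.5)))
    # Stuck capture: the hard cap cancels it.
    print(asyncio.run(run_with_hard_cap(5, 0.05)))
```

In this sketch, a capture that overruns its inner timeout still returns normally as long as it ends before the outer cap, which mirrors the behaviour described above.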

I'll review the script again to make sure I'm not forgetting a timeout setting somewhere, but in your case, I'd recommend reducing max_capture_time to something like 600s, to make sure a stuck capture is killed somewhat faster.

general_timeout_in_sec has evolved into a best-effort setting.