disinfoRG / ZeroScraper

Web scraper made by 0archive.
https://0archive.tw
MIT License

Improve process management with process start time #104

Open pm5 opened 4 years ago

pm5 commented 4 years ago

42 established a Variable table in the db where each of the discover and update processes registers its PID. Recently we found quite a few cases where these processes were killed without a chance to clean up their PID entries in the table. A leftover entry blocks all following processes, because it appears that a process is still running. This has resulted in spider outages of a day or more in the last few weeks.

We should add process start time information to the PID entry, so that when a spider process attempts to register a PID and finds an existing entry older than 2 or 3 hours, it can assume that the older process has died without cleaning up its entry.
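
A rough sketch of what the registration check could look like. This is only an illustration under assumptions: it uses an in-memory style SQLite table so it can run on its own, and the column names (`key`, `pid`, `started_at`), the function names, and the 3 hour threshold are all hypothetical, not the current schema or code. The check-then-write is also not atomic across processes, so a real version would need a lock or a single conditional statement.

```python
import os
import sqlite3
import time

# Assumed threshold, taken from the "2 or 3 hours" suggested above.
STALE_AFTER_SECONDS = 3 * 60 * 60


def init_table(conn):
    # Hypothetical layout: one row per process type ("discover", "update").
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Variable ("
        "key TEXT PRIMARY KEY, pid INTEGER NOT NULL, started_at REAL NOT NULL)"
    )


def register_pid(conn, key, pid=None, now=None):
    """Try to claim the slot for `key`. Return True if this process may run."""
    pid = os.getpid() if pid is None else pid
    now = time.time() if now is None else now
    with conn:  # commits on success, rolls back on an exception
        row = conn.execute(
            "SELECT pid, started_at FROM Variable WHERE key = ?", (key,)
        ).fetchone()
        if row is not None and now - row[1] < STALE_AFTER_SECONDS:
            # A recent entry exists, so assume that process is still running.
            return False
        # No entry, or the entry is stale: assume the old process died
        # without cleaning up, and take over the slot.
        conn.execute(
            "INSERT OR REPLACE INTO Variable (key, pid, started_at) "
            "VALUES (?, ?, ?)",
            (key, pid, now),
        )
    return True


def unregister_pid(conn, key, pid=None):
    """Remove the entry on clean exit, but only if it still belongs to us."""
    pid = os.getpid() if pid is None else pid
    with conn:
        conn.execute(
            "DELETE FROM Variable WHERE key = ? AND pid = ?", (key, pid)
        )
```

Deleting only when the stored PID matches keeps a slow old process from removing the entry of a newer process that already took over a stale slot.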