amazon_rtx_3070 container process forks

hansdg1 commented 3 years ago

Had an issue last night where my docker host stopped allowing new processes. After some digging, it seems like the amazon_rtx_3070 container may be responsible for this. Compared to the others, it's forking a ton of processes. Has anyone else seen this?

This screenshot was taken a few minutes after launch:

$ docker stats
[Screenshot from 2020-12-03 10-52-37]

I'm using the default config/amazon_rtx_3070.yaml from the latest commit

Here's the docker logs output for the container. Not sure if the missing title entries could be related to this or not.

$ docker logs -f amazon_rtx_3070
D2020-12-03 17:25:48,107 starting with args: /src/run.py --alerter email --email <redacted> --relay <redacted>
D2020-12-03 17:25:48,394 using parser: lxml
D2020-12-03 17:25:48,394 registering custom scraper for domain: amazon
D2020-12-03 17:25:48,395 registering custom scraper for domain: bestbuy
D2020-12-03 17:25:48,396 registering custom scraper for domain: bhphotovideo
D2020-12-03 17:25:48,397 registering custom scraper for domain: microcenter
D2020-12-03 17:25:48,399 registering custom scraper for domain: newegg
D2020-12-03 17:25:48,400 registering custom scraper for domain: walmart
I2020-12-03 17:25:48,419 scraper initialized for https://www.amazon.com/dp/B08HBF5L3K
I2020-12-03 17:25:48,420 scraper initialized for https://www.amazon.com/dp/B08HBJB7YD
I2020-12-03 17:25:48,420 scraper initialized for https://www.amazon.com/dp/B08KWLMZV4
I2020-12-03 17:25:48,420 scraper initialized for https://www.amazon.com/dp/B08KWN2LZG
I2020-12-03 17:25:48,420 scraper initialized for https://www.amazon.com/dp/B08KWPDXJZ
I2020-12-03 17:25:48,420 scraper initialized for https://www.amazon.com/dp/B08KXZV626
I2020-12-03 17:25:48,421 scraper initialized for https://www.amazon.com/dp/B08KY266MG
I2020-12-03 17:25:48,421 scraper initialized for https://www.amazon.com/dp/B08KY322TH
I2020-12-03 17:25:48,421 scraper initialized for https://www.amazon.com/dp/B08L8HPKR6
I2020-12-03 17:25:48,421 scraper initialized for https://www.amazon.com/dp/B08L8JNTXQ
I2020-12-03 17:25:48,421 scraper initialized for https://www.amazon.com/dp/B08L8KC1J7
I2020-12-03 17:25:48,421 scraper initialized for https://www.amazon.com/dp/B08L8L71SM
I2020-12-03 17:25:48,421 scraper initialized for https://www.amazon.com/dp/B08L8L9TCZ
I2020-12-03 17:25:48,422 scraper initialized for https://www.amazon.com/dp/B08L8LG4M3
I2020-12-03 17:25:48,422 scraper initialized for https://www.amazon.com/dp/B08LF1CWT2
I2020-12-03 17:25:48,422 scraper initialized for https://www.amazon.com/dp/B08LF32LJ6
I2020-12-03 17:25:48,422 scraper initialized for https://www.amazon.com/dp/B08LW46GH2
W2020-12-03 17:25:50,424 warning: using selenium webdriver for scraping... this feature is under active development
W2020-12-03 17:25:52,334 missing title: https://www.amazon.com/dp/B08HBF5L3K
I2020-12-03 17:25:52,337 B08HBF5L3K: not in stock
W2020-12-03 17:25:54,313 missing title: https://www.amazon.com/dp/B08HBJB7YD
I2020-12-03 17:25:54,315 B08HBJB7YD: not in stock
W2020-12-03 17:25:56,306 missing title: https://www.amazon.com/dp/B08KWLMZV4
I2020-12-03 17:25:56,308 B08KWLMZV4: not in stock
W2020-12-03 17:25:58,305 missing title: https://www.amazon.com/dp/B08KWN2LZG
I2020-12-03 17:25:58,309 B08KWN2LZG: not in stock
W2020-12-03 17:26:00,323 missing title: https://www.amazon.com/dp/B08KWPDXJZ
I2020-12-03 17:26:00,326 B08KWPDXJZ: not in stock
W2020-12-03 17:26:02,326 missing title: https://www.amazon.com/dp/B08KXZV626
I2020-12-03 17:26:02,329 B08KXZV626: not in stock
W2020-12-03 17:26:04,314 missing title: https://www.amazon.com/dp/B08KY266MG
I2020-12-03 17:26:04,316 B08KY266MG: not in stock
W2020-12-03 17:26:06,377 missing title: https://www.amazon.com/dp/B08KY322TH
I2020-12-03 17:26:06,380 B08KY322TH: not in stock
W2020-12-03 17:26:08,337 missing title: https://www.amazon.com/dp/B08L8HPKR6
I2020-12-03 17:26:08,340 B08L8HPKR6: not in stock
W2020-12-03 17:26:10,328 missing title: https://www.amazon.com/dp/B08L8JNTXQ
I2020-12-03 17:26:10,330 B08L8JNTXQ: not in stock
W2020-12-03 17:26:12,314 missing title: https://www.amazon.com/dp/B08L8KC1J7
I2020-12-03 17:26:12,316 B08L8KC1J7: not in stock
W2020-12-03 17:26:14,316 missing title: https://www.amazon.com/dp/B08L8L71SM
I2020-12-03 17:26:14,318 B08L8L71SM: not in stock
W2020-12-03 17:26:16,330 missing title: https://www.amazon.com/dp/B08L8L9TCZ
I2020-12-03 17:26:16,333 B08L8L9TCZ: not in stock
W2020-12-03 17:26:18,308 missing title: https://www.amazon.com/dp/B08L8LG4M3
I2020-12-03 17:26:18,311 B08L8LG4M3: not in stock
W2020-12-03 17:26:20,305 missing title: https://www.amazon.com/dp/B08LF1CWT2
I2020-12-03 17:26:20,308 B08LF1CWT2: not in stock
W2020-12-03 17:26:22,324 missing title: https://www.amazon.com/dp/B08LF32LJ6
I2020-12-03 17:26:22,327 B08LF32LJ6: not in stock
W2020-12-03 17:26:24,348 missing title: https://www.amazon.com/dp/B08LW46GH2
I2020-12-03 17:26:24,350 B08LW46GH2: not in stock
W2020-12-03 17:26:26,337 missing title: https://www.amazon.com/dp/B08HBF5L3K
I2020-12-03 17:26:26,340 B08HBF5L3K: not in stock
W2020-12-03 17:26:28,378 missing title: https://www.amazon.com/dp/B08HBJB7YD
I2020-12-03 17:26:28,380 B08HBJB7YD: not in stock
W2020-12-03 17:26:30,348 missing title: https://www.amazon.com/dp/B08KWLMZV4
I2020-12-03 17:26:30,350 B08KWLMZV4: not in stock
W2020-12-03 17:26:32,331 missing title: https://www.amazon.com/dp/B08KWN2LZG
I2020-12-03 17:26:32,334 B08KWN2LZG: not in stock
W2020-12-03 17:26:34,345 missing title: https://www.amazon.com/dp/B08KWPDXJZ
I2020-12-03 17:26:34,347 B08KWPDXJZ: not in stock
W2020-12-03 17:26:36,374 missing title: https://www.amazon.com/dp/B08KXZV626
I2020-12-03 17:26:36,376 B08KXZV626: not in stock
W2020-12-03 17:26:38,332 missing title: https://www.amazon.com/dp/B08KY266MG
I2020-12-03 17:26:38,335 B08KY266MG: not in stock

hansdg1 commented 3 years ago

Upon further digging, there are a ton of defunct (zombie) chromium processes that aren't getting reaped by the parent python process:

$ ps auxf
...
root       86382  2.1  2.3 124308 95400 ?        Ss   11:25   0:23      \_ python /src/run.py --alerter email --email <redacted> --relay <redacted>
root       86471  0.0  0.0      0     0 ?        Z    11:25   0:00          \_ [chromium] <defunct>
root       86472  0.0  0.0      0     0 ?        Z    11:25   0:00          \_ [chromium] <defunct>
root       86493  0.0  0.0      0     0 ?        Z    11:25   0:00          \_ [chromium] <defunct>
root       86502  0.0  0.0      0     0 ?        Z    11:25   0:00          \_ [chromium] <defunct>
root       86513  0.0  0.0      0     0 ?        ZN   11:25   0:00          \_ [chromium] <defunct>
root       86550  0.0  0.0      0     0 ?        Z    11:25   0:00          \_ [chromium] <defunct>
root       86551  0.0  0.0      0     0 ?        Z    11:25   0:00          \_ [chromium] <defunct>
root       86583  0.0  0.0      0     0 ?        Z    11:25   0:00          \_ [chromium] <defunct>
...(continues)
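To put a number on this instead of eyeballing the listing, the `ps` output can be filtered on the STAT column. Below is a minimal sketch, assuming the standard `ps aux` column layout (STAT is the 8th field, the command starts at the 11th). The function name `count_zombies` is made up for illustration and is not part of inventory-hunter.

```python
def count_zombies(ps_output: str, name: str = "chromium") -> int:
    """Count zombie processes whose command contains `name`.

    Assumes `ps aux`-style columns:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    count = 0
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 11:
            continue  # header fragments, "..." lines, blank lines
        stat = fields[7]
        command = " ".join(fields[10:])
        # Zombie states start with "Z" (for example "Z" or "ZN").
        if stat.startswith("Z") and name in command:
            count += 1
    return count
```

It could be fed from the host with something like `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout`. A count that keeps growing between runs points at children that are never waited on.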

utahbmxer commented 3 years ago

I haven't run the same commands as you to verify, but twice now I have been unable to start any new processes on my CentOS machine (including top), and containers have crashed, so I had to reboot. It never did this until I started running this container yesterday.

utahbmxer commented 3 years ago

Just ran docker stats and my amazon container (checking for rtx 3080) had over 5600 PIDs. Ouch.

hansdg1 commented 3 years ago

I also noticed that a container built from config/ps5.yaml experiences the same issue, presumably because it includes an Amazon listing.

I don't have the time to dig into this at the moment, but I wanted to at least share what I found.

EricJMarti commented 3 years ago

Hi all, I'm aware of the issue where the selenium driver (which is used for Amazon web scrapes) creates an infestation of chromium zombies. I'm working on a fix.

utahbmxer commented 3 years ago

I poked around a bit, but I'm not good at Python. I did come up with a workaround that seems to be helping for me: a cron job that restarts my amazon container every hour.

docker restart <container_id>

Additionally, to help with the occasional human-verification prompts from newegg that crash the container, I updated my newegg containers with the --restart flag. I could probably do this for all of them, and they would then restart on reboot as well.

docker update --restart unless-stopped <container_id>
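A variant of this workaround is to restart only when the PID count is actually high, rather than on a fixed schedule. The sketch below assumes the `docker` CLI is on the PATH and uses `docker stats --no-stream --format "{{.PIDs}}"` to read the count. The threshold of 500 and the function names are hypothetical, chosen for illustration.

```python
import subprocess

PID_LIMIT = 500  # hypothetical threshold; tune for your host


def current_pids(container: str, run=subprocess.run) -> int:
    """Return the PID count that `docker stats` reports for one container."""
    result = run(
        ["docker", "stats", "--no-stream", "--format", "{{.PIDs}}", container],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


def restart_if_bloated(container: str, limit: int = PID_LIMIT,
                       run=subprocess.run) -> bool:
    """Restart the container if its PID count exceeds `limit`.

    Returns True if a restart was issued, False otherwise.
    """
    if current_pids(container, run) <= limit:
        return False
    run(["docker", "restart", container], check=True)
    return True
```

Calling `restart_if_bloated("amazon_rtx_3070")` from a cron job every few minutes would keep the zombie count bounded. Like the hourly restart, this only hides the leak and does not fix it.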

richard11235 commented 3 years ago

I fixed this issue by using the method outlined here. This is implemented in PR #76. As I understand it, the fix basically stops the children explicitly and reaps them.
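For readers unfamiliar with the term: a child process that has exited stays in the process table as a zombie (state Z in ps) until its parent collects its exit status with a wait call. The sketch below shows the general technique on POSIX systems. It is not the code from PR #76, and the name `reap_children` is made up for illustration.

```python
import os
from typing import List


def reap_children() -> List[int]:
    """Collect the exit status of every child that has already exited.

    Does not block: children that are still running are left alone.
    Returns the PIDs that were reaped.
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG returns immediately
            # instead of waiting for a child to exit.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped
```

Calling something like this after each scrape cycle would clear zombies as they appear. With Selenium, calling `quit()` on the driver when done is the usual first step, since that is what asks the browser and driver processes to exit in the first place.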

EricJMarti commented 3 years ago

Please pull the latest image using:

$ docker pull ericjmarti/inventory-hunter:latest

EricJMarti / inventory-hunter, issue #51