ClericPy / ichrome

Chrome controller for Humans, based on Chrome Devtools Protocol(CDP) and python3.7+.
https://pypi.org/project/ichrome/
MIT License
227 stars, 29 forks

Questions related to ichrome_user_data #138

Closed. juanfrilla closed this issue 11 months ago

juanfrilla commented 11 months ago
  1. Why is ichrome generating data in the ichrome_user_data folder?
  2. Can it be deleted, or is there an option to delete it after every execution?
  3. Is there an option to deactivate it? It takes up a lot of disk space.

Here is a screenshot: [image: Captura de pantalla 2023-09-21 a las 9 16 04]

ClericPy commented 11 months ago

When the Chrome process starts, it relies on a user data directory to distinguish different contexts, so a directory is created every time. To keep the processes isolated from each other, ichrome generates a separate directory for each port number.
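To see how much space each per-port directory is taking, a small standard-library sketch along these lines may help. It assumes the data lives under `~/ichrome_user_data` (the folder name from the question; the actual location may differ on your machine), and `dir_sizes` is a hypothetical helper, not part of ichrome.

```python
from pathlib import Path


def dir_sizes(root):
    """Return {subdirectory name: total bytes of the files under it} for root."""
    sizes = {}
    root = Path(root)
    if not root.is_dir():
        # Nothing to report if the folder does not exist.
        return sizes
    for child in root.iterdir():
        if child.is_dir():
            sizes[child.name] = sum(
                f.stat().st_size for f in child.rglob("*") if f.is_file()
            )
    return sizes


# Assumed location; adjust to wherever your ichrome_user_data folder is.
for name, size in sorted(dir_sizes(Path.home() / "ichrome_user_data").items()):
    print(f"{name}: {size / 1e6:.1f} MB")
```

Each top-level subdirectory is reported separately, so one directory per port should show up as one line per port.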

Back to the questions you asked:

  1. Avoiding the user dir entirely: this cannot be done, because Chrome needs a directory, but you can set a custom storage location with user_data_dir="./somedir" to replace the default one in the home directory.
  2. Set clear_after_shutdown=True to automatically clear the user dir after shutting down the chrome daemon.
  3. Use the incognito_tab method for concurrent usage, if the reason you use multiple ports is to run chrome daemons with different proxies.
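Put together, the options above might look roughly like this. This is only a sketch: `user_data_dir` and `clear_after_shutdown` are the parameters named above, while `daemon_options` is a hypothetical helper, the port `9222` and the URL are placeholders, and the assumption that `incognito_tab()` is called on the daemon object (along with `goto` and `current_url` on the tab) follows my reading of the ichrome README, so check it against your installed version.

```python
import asyncio


def daemon_options(port, user_data_dir="./somedir"):
    """Build keyword arguments for AsyncChromeDaemon (hypothetical helper)."""
    return {
        "port": port,
        # Custom storage location instead of the default one in the home dir.
        "user_data_dir": user_data_dir,
        # Remove the user dir when the daemon shuts down.
        "clear_after_shutdown": True,
    }


async def main():
    # Imported here so the helper above can be used without ichrome installed.
    from ichrome import AsyncChromeDaemon

    async with AsyncChromeDaemon(**daemon_options(9222)) as cd:
        # One daemon with isolated incognito contexts, instead of one port per task.
        async with cd.incognito_tab() as tab:
            await tab.goto("https://example.com", timeout=10)
            print(await tab.current_url)


# To run (requires ichrome and a local Chrome install): asyncio.run(main())
```

With a single daemon there is only one user data directory to clean up, which is the main point of option 3.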

PS: why do you use so many different ports here?

juanfrilla commented 11 months ago

Thanks @ClericPy, I'll use clear_after_shutdown=True. I use different ports because I'm running several Scrapy spiders at the same time (https://github.com/ClericPy/ichrome/issues/131). But I think I'm going to switch to a shared browser (https://github.com/ClericPy/ichrome/issues/129), so that only one browser is open and all the spiders connect to it.