JohnPaton / airbase

🌬 An easy downloader for the AirBase air quality data.
https://airbase.readthedocs.io
MIT License

Can't download a year of data, getting IOError: Too many open files #46

Closed: heikoklein closed this issue 3 months ago

heikoklein commented 3 months ago

Hi, I tried to download a year (2022) of data with:

airbase download --path 2022/ --year 2022 -p SO2 -p PM10 -p O3 -p NO2 -p CO -p NO -p PM2.5

That is a total of ~18,000 files, and the script crashed with an IOError: Too many open files. I checked `ulimit -Sn`, which was 1024.
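For reference, the soft limit that `ulimit -Sn` reports can also be read from Python with the standard-library `resource` module (Unix only). This is a minimal sketch, not part of airbase; the helper names `nofile_limits` and `raise_nofile_soft_limit` are hypothetical. Raising the soft limit up to the hard limit only affects the current process and the processes it starts afterwards, so it would have to happen before the downloads begin (the shell equivalent is `ulimit -Sn 4096`, if the hard limit allows it).

```python
import resource  # standard library, Unix only


def nofile_limits() -> tuple[int, int]:
    """Return (soft, hard) limits on open file descriptors for this process.

    The soft value is what `ulimit -Sn` prints in the shell.
    """
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_nofile_soft_limit(target: int) -> int:
    """Raise the soft limit towards `target`, never above the hard limit.

    Affects only this process and the child processes it starts afterwards.
    Returns the soft limit in effect after the call.
    """
    soft, hard = nofile_limits()
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough, nothing to change
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"open-file limit: soft={soft}, hard={hard}")
```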

In the end, I managed to download the data by:

  1. finding a server with an open-file limit of 4096
  2. splitting the request into one request per component (at most ~4,300 files per component)
  3. having ~20 GB of memory available per component
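Step 2 can be scripted. A minimal sketch, assuming the CLI accepts the same `--path`, `--year` and `-p` options as the command above with a single pollutant per call; `per_pollutant_commands` and `run_sequentially` are hypothetical helpers, and with `dry_run=True` the commands are only printed, not executed.

```python
import subprocess

# Pollutants from the original request
POLLUTANTS = ["SO2", "PM10", "O3", "NO2", "CO", "NO", "PM2.5"]


def per_pollutant_commands(path: str, year: int, pollutants: list[str]) -> list[list[str]]:
    """Build one `airbase download` invocation per pollutant (hypothetical helper)."""
    return [
        ["airbase", "download", "--path", path, "--year", str(year), "-p", pollutant]
        for pollutant in pollutants
    ]


def run_sequentially(commands: list[list[str]], dry_run: bool = True) -> None:
    """Run the commands one after another; with dry_run=True only print them."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first pollutant whose download fails
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_sequentially(per_pollutant_commands("2022/", 2022, POLLUTANTS))
```

Running the pollutants one after another keeps only one component's files in flight at a time, which is what kept the workaround under the 4096 limit.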

I couldn't find a way to restrict the number of simultaneously open files. A semaphore, as in this example, might be needed to reduce resource usage: https://github.com/Tinche/aiofiles/issues/83
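A minimal sketch of the semaphore idea, not airbase's actual code: an `asyncio.Semaphore` caps how many download tasks can be active at once, so the number of open descriptors stays bounded no matter how many files are requested. The network request is simulated with `asyncio.sleep` and files are written with plain `open` instead of aiohttp/aiofiles; `fetch_and_save`, `download_all` and `MAX_OPEN_FILES` are hypothetical names, and the cap of 64 is an arbitrary example value. The semaphore wraps both the request and the write, since open sockets also count towards the limit.

```python
import asyncio
import os
import tempfile

MAX_OPEN_FILES = 64  # hypothetical cap, well below a soft limit of 1024


async def fetch_and_save(url: str, dest: str, sem: asyncio.Semaphore, stats: dict) -> None:
    """Simulate downloading `url` and writing it to `dest`, holding `sem` throughout."""
    async with sem:  # waits here while `limit` tasks are already active
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            await asyncio.sleep(0.001)  # stand-in for the HTTP request
            with open(dest, "w") as f:  # stand-in for an aiofiles write
                f.write(f"data from {url}\n")
        finally:
            stats["active"] -= 1


async def download_all(urls: list[str], out_dir: str, limit: int = MAX_OPEN_FILES) -> int:
    """Create one task per URL but let at most `limit` of them run at once.

    Returns the peak number of tasks that were active simultaneously.
    """
    sem = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    await asyncio.gather(*(
        fetch_and_save(url, os.path.join(out_dir, f"file_{i}.csv"), sem, stats)
        for i, url in enumerate(urls)
    ))
    return stats["peak"]


if __name__ == "__main__":
    urls = [f"https://example.invalid/file_{i}.csv" for i in range(2_000)]
    with tempfile.TemporaryDirectory() as out_dir:
        peak = asyncio.run(download_all(urls, out_dir))
        print(f"wrote {len(os.listdir(out_dir))} files, peak concurrency {peak}")
```

Lowering `limit` trades throughput for fewer simultaneously open files; all tasks are still created up front, but only `limit` of them hold a socket or file at any moment.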

Best regards, Heiko