AMI-system / gbif_download_standalone

A standalone repo to download images from the GBIF database according to a species list.
MIT License

Make the download code work without splitting dwca file #18

Closed LevanBokeria closed 1 year ago

LevanBokeria commented 1 year ago

The current codebase takes the extracted DwC-A occurrence file, splits it by species, and saves each species as a separate CSV file. The downstream code then parallelizes the image download over species (using multithreading), taking the paths to the corresponding CSV occurrence files as input. This is memory-efficient, since each worker only needs its own species file, so it would allow upgrading the code to use multiprocessing instead of just multithreading.
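For illustration, the splitting step described above could look roughly like the sketch below. This is not the repository's actual code: the function name `split_by_species`, the `species` column name, and the tab delimiter are assumptions, and the file is streamed row by row so the full table never sits in memory.

```python
import csv
from pathlib import Path


def split_by_species(occurrence_path, out_dir, species_column="species", delimiter="\t"):
    """Stream the occurrence file once and write one CSV per species.

    Hypothetical helper: the column name and delimiter are assumptions.
    Returns a dict mapping each species name to the path of its CSV file.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    with open(occurrence_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src, delimiter=delimiter)
        for row in reader:
            species = row.get(species_column) or "unknown"
            is_new = species not in paths
            if is_new:
                safe_name = species.replace("/", "_").replace(" ", "_")
                paths[species] = out_dir / f"{safe_name}.csv"
            # Reopen per row: slow, but keeps memory use and open handles low.
            mode = "w" if is_new else "a"
            with open(paths[species], mode, newline="", encoding="utf-8") as dst:
                writer = csv.DictWriter(
                    dst, fieldnames=reader.fieldnames, extrasaction="ignore"
                )
                if is_new:
                    writer.writeheader()
                writer.writerow(row)
    return paths
```

The returned paths are what a per-species download worker would take as input, whether that worker runs in a thread or in a separate process.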

An alternative would be to skip the splitting step, load the whole extracted occurrence file into memory, and use multithreading to download the images.
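A minimal sketch of that alternative, assuming the occurrence table fits in memory: read every row once, build a list of download jobs, and hand them to a thread pool. The names `download_all` and `fetch_image`, and the `species`, `identifier` and `gbifID` column names, are hypothetical; in a real archive the image URLs may live in a separate media file rather than in the occurrence file itself.

```python
import csv
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.request import urlopen


def fetch_image(url, dest, timeout=30):
    """Download one URL to dest and return True on success (hypothetical helper)."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            dest.write_bytes(resp.read())
        return True
    except Exception:  # sketch only: any failure counts as a skipped image
        return False


def download_all(occurrence_path, out_dir, fetch=fetch_image, max_workers=8,
                 species_column="species", url_column="identifier",
                 id_column="gbifID", delimiter="\t"):
    """Load the whole occurrence file, then download images with threads.

    Column names and the delimiter are assumptions. Returns the number of
    successful downloads.
    """
    out_dir = Path(out_dir)
    with open(occurrence_path, newline="", encoding="utf-8") as src:
        rows = list(csv.DictReader(src, delimiter=delimiter))  # whole table in memory

    jobs = []
    for i, row in enumerate(rows):
        url = row.get(url_column)
        if not url:
            continue  # no image URL for this record
        species = (row.get(species_column) or "unknown")
        species_dir = out_dir / species.replace("/", "_").replace(" ", "_")
        species_dir.mkdir(parents=True, exist_ok=True)
        # The row index keeps file names unique if an ID appears more than once.
        jobs.append((url, species_dir / f"{row.get(id_column) or 'row'}_{i}.jpg"))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda job: fetch(*job), jobs))
    return sum(results)
```

Because the work is dominated by network I/O, threads are enough here; the cost of this approach is that memory use grows with the size of the occurrence file.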