The current codebase takes the extracted DwC-A (Darwin Core Archive) occurrence file, splits it by species, and saves each species as a separate CSV file. The downstream code then parallelizes the image download over species using multithreading, taking as input the paths to the corresponding per-species CSV occurrence files. This is memory efficient, since each worker only reads its own species' file rather than the whole occurrence table. Because each worker receives only a file path, this design also allows upgrading the code from multithreading to multiprocessing.
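A minimal sketch of this split-then-parallelize pipeline, using only the Python standard library. Everything here is an assumption rather than the codebase's actual code: the helper names `split_by_species`, `download_species`, and `download_all` are hypothetical, as are the `species` and `image_url` column names and the unquoted, tab-delimited input format (in a real DwC-A, image links may live in a separate multimedia extension file instead). The injected `fetch` callable stands in for whatever actually downloads and saves one image.

```python
import csv
import os
import re
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Callable, Dict, List, TextIO


def split_by_species(occurrence_path: str, out_dir: str,
                     species_col: str = "species",
                     delimiter: str = "\t") -> List[str]:
    """Stream the occurrence file once, appending each row to its species' CSV."""
    os.makedirs(out_dir, exist_ok=True)
    handles: Dict[str, TextIO] = {}
    writers: Dict[str, csv.DictWriter] = {}
    try:
        with open(occurrence_path, newline="", encoding="utf-8") as src:
            # Assumes unquoted, tab-delimited text (hypothetical input format).
            reader = csv.DictReader(src, delimiter=delimiter,
                                    quoting=csv.QUOTE_NONE)
            for row in reader:
                species = (row.get(species_col) or "").strip()
                if not species:
                    continue  # skip records with no species-level name
                if species not in writers:
                    # Make the species name safe to use as a file name.
                    safe = re.sub(r"[^\w.-]", "_", species)
                    fh = open(os.path.join(out_dir, f"{safe}.csv"), "w",
                              newline="", encoding="utf-8")
                    handles[species] = fh
                    writers[species] = csv.DictWriter(
                        fh, fieldnames=reader.fieldnames, extrasaction="ignore")
                    writers[species].writeheader()
                writers[species].writerow(row)
    finally:
        for fh in handles.values():
            fh.close()
    return sorted(fh.name for fh in handles.values())


def download_species(csv_path: str, fetch: Callable[[str], None],
                     url_col: str = "image_url") -> int:
    """Download every image listed in one species' CSV; return how many."""
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            url = (row.get(url_col) or "").strip()
            if url:
                fetch(url)  # hypothetical: downloads and stores one image
                count += 1
    return count


def download_all(csv_paths: List[str], fetch: Callable[[str], None],
                 max_workers: int = 8) -> Dict[str, int]:
    """Parallelize over species: one task per per-species CSV path."""
    # partial (not a lambda) keeps the task picklable for a process pool.
    task = partial(download_species, fetch=fetch)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(csv_paths, pool.map(task, csv_paths)))
```

Because each task receives only a path string, swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` should be enough to move to multiprocessing, provided `fetch` is a module-level (picklable) function and the call sits under an `if __name__ == "__main__":` guard. One caveat of this sketch: it keeps one open file handle per species while splitting, which could hit the operating system's open-file limit for very species-rich archives.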
Alternatively, the code could simply load the whole extracted occurrence file into memory and use multithreading to download the images.
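For comparison, a minimal sketch of the in-memory alternative under the same hypothetical assumptions (unquoted tab-delimited input, an `image_url` column, an injected `fetch` callable); `load_occurrences` and `download_in_memory` are illustrative names, not existing code. Here the parallelism is over individual images rather than species, and all threads read the same in-memory rows, whereas worker processes would need the work items serialized and sent to them.

```python
import csv
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def load_occurrences(occurrence_path: str,
                     delimiter: str = "\t") -> List[Dict[str, str]]:
    """Read the entire occurrence file into memory as a list of row dicts."""
    with open(occurrence_path, newline="", encoding="utf-8") as src:
        # Assumes unquoted, tab-delimited text (hypothetical input format).
        return list(csv.DictReader(src, delimiter=delimiter,
                                   quoting=csv.QUOTE_NONE))


def download_in_memory(occurrence_path: str, fetch: Callable[[str], None],
                       url_col: str = "image_url",
                       max_workers: int = 16) -> int:
    """Download all images with a thread pool over the in-memory rows."""
    rows = load_occurrences(occurrence_path)
    urls = [u for u in ((r.get(url_col) or "").strip() for r in rows) if u]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every download and re-raises any error from fetch.
        list(pool.map(fetch, urls))
    return len(urls)
```

This is simpler than the split-by-species version, at the cost of holding the full occurrence table in memory for the whole run.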