adjtomo / pysep

Seismogram Extraction and Processing: Seismic data retrieval and record sections
https://pysep.readthedocs.io
MIT License

Automated parallel computing to speed up data download #128

Open: aakash10gupta opened this issue 11 months ago

aakash10gupta commented 11 months ago

While downloading data with PySEP, some steps of the procedure seem to be very slow, and if the flags that enable these steps are turned on, a data download can take a long time. I was wondering whether it is possible to parallelize the fetching and processing of the seismograms. Most machines have multiple processors available, but only one seems to be used for all of the processing unless the user explicitly codes a workaround to use multiple cores at once. It might be beneficial to have a feature within PySEP that detects the number of processors on a user's machine and speeds up the process accordingly. The slowest steps seem to be response removal and resampling, which would benefit the most from parallelization. I am not sure about the implementation challenges and would be eager to hear thoughts from others.

bch0w commented 11 months ago

Thanks for the suggestion @aakash10gupta, I think that's a good idea. I'm not sure about speeding up data fetching, as a lot of that is handled by ObsPy, but we can definitely parallelize the processing. It would take some reorganization of the code, but it should be manageable. Let me look into it and see what would be required.
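PySEP's internals aren't shown in this thread, so the following is only a minimal sketch of the general pattern, not PySEP code. Because each seismogram is processed independently, the per-trace work can be mapped over a pool of workers sized with `os.cpu_count()`. The names `process_trace`, `process_all` and `default_workers` are hypothetical, and `process_trace` is a pure-Python stand-in (demean, then block-average downsampling) for the real per-trace steps, which in practice would be ObsPy calls such as `Trace.remove_response()` and `Trace.resample()`.

```python
import os
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from functools import partial
from typing import List, Optional, Sequence


def process_trace(samples: Sequence[float], factor: int = 2) -> List[float]:
    """Hypothetical stand-in for expensive per-trace work (e.g. response
    removal + resampling): remove the mean, then downsample by averaging
    non-overlapping blocks of `factor` samples."""
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    demeaned = [s - mean for s in samples]
    n_blocks = len(demeaned) // factor  # a trailing partial block is dropped
    return [
        sum(demeaned[i * factor:(i + 1) * factor]) / factor
        for i in range(n_blocks)
    ]


def process_all(traces: Sequence[Sequence[float]],
                executor: Optional[Executor] = None,
                factor: int = 2) -> List[List[float]]:
    """Process every trace; runs serially unless an executor is supplied.
    Executor.map returns results in the same order as the input traces."""
    work = partial(process_trace, factor=factor)  # picklable for process pools
    if executor is None:
        return [work(t) for t in traces]
    return list(executor.map(work, traces))


def default_workers() -> int:
    """Number of workers to use; os.cpu_count() may return None."""
    return os.cpu_count() or 1


if __name__ == "__main__":
    # Four toy "traces" of eight samples each
    traces = [[float(i + j) for i in range(8)] for j in range(4)]
    # Separate processes avoid the GIL for CPU-bound processing steps
    with ProcessPoolExecutor(max_workers=default_workers()) as pool:
        results = process_all(traces, executor=pool)
    print(results)
```

Passing the executor in keeps the serial path as the default and makes the parallel path opt-in. A `ProcessPoolExecutor` suits CPU-bound steps like these, while a `ThreadPoolExecutor` may be the better fit for I/O-bound work such as waiting on data requests. Whether either one actually speeds things up depends on how expensive each trace is compared with the overhead of sending data to the workers.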