geodesymiami / insarmaps


parallelize hdfeos5_2json_mbtiles.py ? #41

Closed falkamelung closed 1 year ago

falkamelung commented 3 years ago

hdfeos5_2json_mbtiles.py currently uses only one CPU (I think). That is a problem on the TACC HPC systems: when you allocate a node you get 48 CPUs, and using only one core wastes the rest. So it would be great to convert chunks simultaneously. In MintPy we were able to parallelize using dask (I can point you to the code).
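To make the idea concrete, here is a minimal sketch of converting chunks in parallel. It uses Python's standard `concurrent.futures` rather than dask, and it does not reflect the actual internals of hdfeos5_2json_mbtiles.py: `chunk_ranges`, `convert_chunk`, `convert_all`, and the output file names are all hypothetical placeholders.

```python
import concurrent.futures as cf


def chunk_ranges(n_points, chunk_size):
    """Split point indices 0..n_points-1 into half-open (start, stop) ranges."""
    return [(start, min(start + chunk_size, n_points))
            for start in range(0, n_points, chunk_size)]


def convert_chunk(bounds):
    """Hypothetical stand-in for the per-chunk HDF-EOS5 -> JSON conversion.

    A real worker would read points [start, stop) from the input file and
    write one JSON chunk; here it only returns the name it would write.
    """
    start, stop = bounds
    return f"chunk_{start}_{stop}.json"


def convert_all(n_points, chunk_size, max_workers=None,
                executor_cls=cf.ProcessPoolExecutor):
    """Convert all chunks concurrently and return results in chunk order.

    max_workers=None lets the executor choose its own default worker count.
    """
    ranges = chunk_ranges(n_points, chunk_size)
    with executor_cls(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with `ranges`.
        return list(pool.map(convert_chunk, ranges))
```

When calling `convert_all` with the default process pool from a script, put the call under `if __name__ == "__main__":` so worker processes can import the module safely. Passing `executor_cls=cf.ThreadPoolExecutor` is a convenient way to debug the chunking logic without spawning processes.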

I understand that json_mbtiles2insarmaps.py can't ingest into the database in parallel. TACC recommends running all data download jobs on a login node, so I would submit only hdfeos5_2json_mbtiles.py as a job and run json_mbtiles2insarmaps.py on the login node. Is it true that json_mbtiles2insarmaps.py does not do any calculations? This would work only in that case, as we can't do calculations on the login node.

It is low priority, though. If we decide we don't want to do it, I can probably run the ingest scripts on Jetstream after uploading.

falkamelung commented 1 year ago

There is another, newer issue on this.