`hdfeos5_2json_mbtiles.py` currently uses only one CPU (I think). That is a problem on the TACC HPC systems because allocating a node gives you 48 CPUs, and you are not supposed to use only one core, as that wastes resources. So it would be great to convert the chunks simultaneously. In MintPy we were able to parallelize using dask (I can point you to the code).
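A minimal sketch of what per-chunk parallelism could look like, using the stdlib `ProcessPoolExecutor` (MintPy's dask-based approach would be structured similarly). `convert_chunk` is a hypothetical stand-in for whatever per-chunk conversion `hdfeos5_2json_mbtiles.py` actually does:

```python
import os
from concurrent.futures import ProcessPoolExecutor

def convert_chunk(chunk_id):
    # Hypothetical placeholder: the real per-chunk HDF-EOS5 -> JSON
    # conversion from hdfeos5_2json_mbtiles.py would go here.
    return chunk_id * chunk_id  # stand-in result so the sketch runs

def convert_all(chunk_ids, max_workers=None):
    # Default to one worker per core, so a 48-core TACC node is fully used.
    max_workers = max_workers or os.cpu_count()
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order in its results.
        return list(pool.map(convert_chunk, chunk_ids))

if __name__ == "__main__":
    print(convert_all(range(8)))
```

Since the chunks are independent, this is embarrassingly parallel; dask would additionally let you scale the same pattern across multiple nodes.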
I understand that `json_mbtiles2insarmaps.py` can't ingest into the database in parallel. TACC recommends running all data-transfer jobs on a login node, so I would submit only `hdfeos5_2json_mbtiles.py` as a batch job and run `json_mbtiles2insarmaps.py` on the login node. Is it true that `json_mbtiles2insarmaps.py` does no calculations? This only works if so, because we can't do calculations on the login node.
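A sketch of the split workflow, assuming a standard Slurm setup at TACC (job name, partition, time limit, and the input/output paths are all placeholders, not real values from this project):

```shell
#!/bin/bash
#SBATCH -J hdfeos5_convert   # placeholder job name
#SBATCH -N 1                 # one node (48 cores)
#SBATCH -n 48
#SBATCH -p normal            # assumed partition name
#SBATCH -t 02:00:00

# Compute-heavy conversion runs on a compute node as a batch job
hdfeos5_2json_mbtiles.py INPUT.he5 ./JSON_DIR
```

Once the job finishes, the ingest step (database upload only, no computation) would be run directly on the login node, e.g. `json_mbtiles2insarmaps.py` pointed at `./JSON_DIR`.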
This is low priority, though. If we decide against it, I can probably run the ingest scripts on Jetstream after uploading.