GeoscienceAustralia / PyRate

A Python tool for estimating velocity and time-series from Interferometric Synthetic Aperture Radar (InSAR) data.
https://geoscienceaustralia.github.io/PyRate/
Apache License 2.0
201 stars 71 forks source link

Sb/step3 refactor process 2 #284

Closed basaks closed 4 years ago

basaks commented 4 years ago
  1. Extensive testing of refpixel write on disc and reuse
  2. Extensive testing of orbital error write on disc and reuse
  3. Decluttered orbital.py module.
  4. Updates to both refpixel and orbital error files names on disc to include all relevelant params to reduce user oversight requirements
codecov-commenter commented 4 years ago

Codecov Report

Merging #284 into develop will decrease coverage by 5.78%. The diff coverage is 90.90%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #284      +/-   ##
===========================================
- Coverage    90.71%   84.92%   -5.79%     
===========================================
  Files           26       26              
  Lines         3305     3336      +31     
  Branches       514      516       +2     
===========================================
- Hits          2998     2833     -165     
- Misses         215      404     +189     
- Partials        92       99       +7     
Impacted Files Coverage Δ
pyrate/core/shared.py 94.31% <ø> (-0.95%) :arrow_down:
pyrate/default_parameters.py 100.00% <ø> (ø)
pyrate/core/refpixel.py 88.99% <68.75%> (+1.72%) :arrow_up:
pyrate/configuration.py 96.09% <92.10%> (-1.83%) :arrow_down:
pyrate/core/orbital.py 92.59% <94.91%> (+4.27%) :arrow_up:
pyrate/core/config.py 93.47% <100.00%> (-0.34%) :arrow_down:
pyrate/core/ref_phs_est.py 83.20% <100.00%> (-5.60%) :arrow_down:
pyrate/process.py 97.56% <100.00%> (-2.44%) :arrow_down:
pyrate/main.py 0.00% <0.00%> (-98.39%) :arrow_down:
pyrate/merge.py 18.49% <0.00%> (-76.72%) :arrow_down:
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 313483f...5f0f221. Read the comment docs.

basaks commented 4 years ago

Evidence of unchanged output in develop using the pyrate_gamma_test.conf.

Commands used:

 1020  git checkout sb/step3-refactor-process-2 
 1021  git pull
 1022  rm -rf out
 1023  mpirun -n 4 pyrate workflow -f tests/test_data/small_test/conf/pyrate_gamma_test.conf 
 1026  git branch
 1027  mv out/gamma/ out/gamma_new
 1028  git checkout develop
 1029  git pull
 1031  mpirun -n 4 pyrate workflow -f tests/test_data/small_test/conf/pyrate_gamma_test.conf 
 1033  for file in `ls out/gamma_new/out/ts*.tif`; do echo $file; gdalinfo $file -stats | grep STATISTICS; done > branch.txt
 1034  for file in `ls out/gamma/out/ts*.tif`; do echo $file; gdalinfo $file -stats | grep STATISTICS; done > develop.txt

Attached outputs. branch.txt develop.txt

mcgarth commented 4 years ago

Why does this work in the test configs?

because the test configs (that I can see) only have one instance of orbfit: assignment. I think you didn't notice that orbfit: was moved to the top of input_parameters.conf during a recent restructure.

basaks commented 4 years ago

@chandra2ga, Can you double check your first ovservation? When I test using this command:

mpirun -n 4 pyrate workflow -f tests/test_data/small_test/conf/pyrate_gamma_test.conf

This file is created:

out/gamma/out/ref_pixel150.941666654-34.218333314_5_5_5_0.8_file.npy

This test also proves it's created and reused in subsequent runs:

https://github.com/GeoscienceAustralia/PyRate/blob/febc8837b60fb19183ceefaa07c64c34d3cd4562/tests/test_refpixel.py#L433

There are several reasons for the chosen filenames:

  1. The whole reason you want the parameter names in the filename is that you don't overwrite that file when parameters change on the next run, and can reuse that file if it's available on disc. In this situation if you provided (-1, -1) first, then changed to a (lat, lon), and then went back to (-1, -1), the saved refpixel file will not be overwritten and reused when you go back to the old param. You suggestion will overwrite file, and in this example, on the 3rd job, refpixel will have to be recomputed. ref_pixel.npy does not work and additional effort was spent to make this change.
  2. These filenames are not for users to inspect, while still being able to do so. They don't have to be pretty. These are intermediate files to enhance usability.
  3. This is an exact string representation of the config parameters used.

On your second point, again the user does not have to really know which parameters are in the orbfit filename, but if the appropriate file is available on disc, they are reused if the parameters match. If you want to know which parameters go in which sequence, you can inspect this line of code for the orbfit file names:

https://github.com/GeoscienceAustralia/PyRate/blob/febc8837b60fb19183ceefaa07c64c34d3cd4562/pyrate/configuration.py#L108

basaks commented 4 years ago

@mcgarth multillooking for orbfit also requries effort.

Since independent method is trivially parallisable, we don't really need multilooking in this case. We have to multilook the file and then canculate the orbital error in in this case. In memory multilooking is available due to network method, and can be very quickly implemented if we really want that.

It's this line of code that becomes cheaper if we multilook....multilooking will reduce effort here depending on the multilooking factor.

  model = lstsq(clean_dm, data)[0]  # first arg is the model params

Note data above is just the phase_data, and if we use mulitlooking, the effort here is reduced, after spending the effort to multilook the (already multilooked prepifg output).

For very large ifgs it may make sense to do so.

The network method on the other hand has more reason to use multilooking as we need all interferrograms in memory, and that step is not parallel.

Have a rethink re the filenames based on my previous comments.

chandra2ga commented 4 years ago

out/gamma/out/ref_pixel150.941666654-34.218333314_5_5_5_0.8_file.npy

@basaks: Yeah, I noticed this file is created with default pixel values on config file. I might missed that.

  1. The whole reason you want the parameter names in the filename is that you don't overwrite that file when parameters change on the next run, and can reuse that file if it's available on disc. In this situation if you provided (-1, -1) first, then changed to a (lat, lon), and then went back to (-1, -1), the saved refpixel file will not be overwritten and reused when you go back to the old param. You suggestion will overwrite file, and in this example, on the 3rd job, refpixel will have to be recomputed. ref_pixel.npy does not work and additional effort was spent to make this change.

As as per this program when we have default ref pixel location on config file a new refpixel file will create if not already generated on disc and use that file for next processing. In case of automatic pixel lat/lon selection it will produce a new file name ref-1-1.npy and if that file already present on disc, program will reuse that file for next processing step without further overwriting or recreating. But as per our new workflow, reference pixel step will work after performing all the correction steps, so in that case selection of automatic reference pixel might vary depending on additional correction steps included to the workflow. And if the file will not overwrite to the exiting ref-1_-1.npy during automatic pixel selection, it might choose an incorrect pixel location reading from the old existing file.

In any case (-1,-1) on file not much convincing instead could be replaced by lat/lon as created with default one. This will discard any issue of overwriting then.

basaks commented 4 years ago

@chandra2ga , That does not work as when the reference pixel file is not deterministic at the beginning of the model (i,e., config file), you cannot reuse it. The reference pixel location will vary with the steps and sequence of corrections (which are interdependent). So we cannot adopt such a strategy. In this case we have to drop the reuse of reference pixels idea.