GeoscienceAustralia / PyRate

A Python tool for estimating velocity and time-series from Interferometric Synthetic Aperture Radar (InSAR) data.
https://geoscienceaustralia.github.io/PyRate/
Apache License 2.0

Sb/step4 refactor process #288

Closed basaks closed 4 years ago

basaks commented 4 years ago
  1. All process/correct steps except the maxvar and vcmt calculations are saved to disk and reused.
  2. `process` renamed to `correct`; all references and tests updated. `correct` no longer computes time series or stacking.
  3. New `timeseries` and `stack` commands.
  4. maxvar MPI code simplified.
  5. More MPI parallelisation in the spatial and temporal filtering during APS correction. APS correction can be many times faster depending on the number of CPUs thrown at it, potentially more than 10x.
  6. Test suite extended to cover each of these steps (we now have a test_aps.py too).
  7. (MG) Processing of coherence files by conv2tif and prepifg is now triggered by the presence of cohfilelist in the config file. If the list is present, multilooked/cropped coherence files are saved to disk, regardless of whether coherence masking is on or off (the cohmask parameter in the config file).
  8. (MG) Deprecate the redundant tscal config parameter, previously used to control time series execution. It is no longer needed now that there is a timeseries step invokable from the command line.
  9. Add unit test coverage for the refpixel lat/lon to x/y conversion.
  10. A common framework for using tiles in both the MPI and multiprocessing code paths of the stack, timeseries and mst algorithms.
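The save-and-reuse behaviour described in step 1 can be sketched as follows. This is a hypothetical helper, not PyRate's actual API; the step name `"orbfit"` and the cache layout are illustrative only:

```python
import tempfile
from pathlib import Path

import numpy as np

def cached_step(name, compute, outdir):
    """Hypothetical sketch of step 1's save/reuse behaviour: run
    `compute()` once, persist the result to disk, and load it on
    later runs instead of recomputing."""
    path = Path(outdir) / f"{name}.npy"
    if path.exists():
        return np.load(path)           # reuse the on-disk result
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    np.save(path, result)              # persist for the timeseries/stack steps
    return result

calls = []
def expensive_correction():
    calls.append(1)                    # count how often we really compute
    return np.arange(3.0)

outdir = tempfile.mkdtemp()
first = cached_step("orbfit", expensive_correction, outdir)
second = cached_step("orbfit", expensive_correction, outdir)   # loaded from disk
```

The second call skips the computation entirely, which is what lets the new `timeseries` and `stack` commands run without redoing the correction steps.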
codecov-commenter commented 4 years ago

Codecov Report

Merging #288 into develop will decrease coverage by 1.34%. The diff coverage is 65.69%.


@@             Coverage Diff             @@
##           develop     #288      +/-   ##
===========================================
- Coverage    84.50%   83.16%   -1.34%     
===========================================
  Files           26       26              
  Lines         3413     3498      +85     
  Branches       534      547      +13     
===========================================
+ Hits          2884     2909      +25     
- Misses         429      491      +62     
+ Partials       100       98       -2     
| Impacted Files | Coverage Δ |
|---|---|
| pyrate/default_parameters.py | 100.00% <ø> (ø) |
| pyrate/merge.py | 15.25% <9.52%> (-0.14%) :arrow_down: |
| pyrate/main.py | 23.25% <20.00%> (+23.25%) :arrow_up: |
| pyrate/core/logger.py | 36.36% <33.33%> (+1.06%) :arrow_up: |
| pyrate/core/stack.py | 75.00% <40.00%> (-22.81%) :arrow_down: |
| pyrate/prepifg.py | 50.78% <40.00%> (-0.81%) :arrow_down: |
| pyrate/core/timeseries.py | 77.24% <43.18%> (-13.49%) :arrow_down: |
| pyrate/conv2tif.py | 94.44% <66.66%> (ø) |
| pyrate/core/shared.py | 93.88% <76.47%> (-0.43%) :arrow_down: |
| pyrate/configuration.py | 91.87% <78.94%> (-4.25%) :arrow_down: |
| ... and 13 more | |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 4b8ae96...986e594.

basaks commented 4 years ago

At the user level, I am happy with the changes in this PR and it implements the user functionality requested by the team. I have pushed a few commits on top of the branch that make some minor house-keeping changes. The biggest change in these commits is deprecating the tscal variable, which this PR makes redundant.

I can see this is done.

I did some numeric comparison of this PR against develop, comparing summary statistics of the output geotiffs (gdalinfo -stats) after running the full workflow on the unit test data. The results are documented in this word doc. In summary, there are differences in the mean and std-dev statistics. The worst is a difference in the 6th decimal place (of a millimetre), which I think is acceptable given that the measurement precision of InSAR is nowhere near 0.000001 millimetres! However, an investigation into why these differences occur in this PR is recommended, please, @basaks.
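The hand comparison of `gdalinfo -stats` output described above can be mimicked with a small helper. The helper name and the 1e-6 tolerance are illustrative, not part of PyRate's test suite:

```python
import numpy as np

def stats_match(a, b, tol=1e-6):
    """Compare the summary statistics (mean, std-dev) of two rasters,
    the way the gdalinfo -stats outputs were compared by hand: both
    must agree to within `tol` (here, the 6th decimal place)."""
    return (abs(float(a.mean()) - float(b.mean())) <= tol and
            abs(float(a.std()) - float(b.std())) <= tol)

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 100))
drifted = base + 1e-8      # numeric drift far below the 6th decimal place
shifted = base + 1e-3      # a shift large enough to flag
```

With this tolerance, `drifted` passes while `shifted` would be flagged for investigation.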

These minor differences are introduced by splitting the correct, timeseries and stack steps. At the end of the correct step we save the params to disk, and reuse the saved params during timeseries and stack. This round trip introduces small floating-point differences, changing the outputs slightly.
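As an illustration of how a save/reload cycle can perturb values at roughly that scale, consider a quantity passing through a reduced-precision representation. This is only one plausible mechanism; whether PyRate's params serialisation does exactly this is not established here:

```python
import numpy as np

# Illustrative only: a float64 value squeezed through float32 (about
# 7 significant decimal digits) comes back slightly different, at a
# magnitude comparable to the 6th-decimal-place differences observed.
x = np.float64(12.3456789012345)
roundtripped = np.float64(np.float32(x))
diff = abs(x - roundtripped)     # small but nonzero
```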

I also ran a benchmark to test the improvements made by parallelising the spatial and temporal filters. For this benchmark on gadi I used 153 Mexico ifgs, 128 GB RAM and mpirun -n 16 pyrate workflow .... With apsest=0, runtimes were 5m 33s and 5m 24s for develop and sb/step4-refactor-process respectively. With apsest=1, runtimes were 9m 40s and 8m 14s for develop and sb/step4-refactor-process respectively. This represents an improvement in efficiency of ~15%. Well done @basaks!

Allow me to elaborate on this. Using 16 CPUs, the APS correction step alone improves from 9m 40s − 5m 33s = 247 s to 8m 14s − 5m 24s = 170 s, a speedup of 247/170 ≈ 1.45, i.e. ~45%.
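The arithmetic behind these figures, spelled out:

```python
# Re-deriving the APS-step speedup from the benchmark runtimes quoted above.
def seconds(minutes, secs):
    return 60 * minutes + secs

# Total runtime (apsest=1) minus baseline runtime (apsest=0) isolates
# the time spent in APS correction.
dev_aps = seconds(9, 40) - seconds(5, 33)   # develop: 247 s
new_aps = seconds(8, 14) - seconds(5, 24)   # this branch: 170 s
speedup = dev_aps / new_aps                 # ~1.45, i.e. ~45% faster
```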

There is more scope to improve by using numpy arrays instead of dicts to gather outputs of each row.

Minor comments:

  • when running using MPI, every process writes every log message. Is there a way to suppress this, so that only the control process writes log messages?

The MPI logging used to log only from the master process (back in 2017); it was later changed to log messages from all processes. I will add that behaviour back for the on-screen log. Is it desirable to keep logging from every process in the file logger?
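A minimal sketch of the rank-aware console logging discussed here, assuming the MPI rank is already known. The function and logger names are illustrative, not PyRate's actual logger setup:

```python
import logging

def configure_console_logging(rank, logger_name):
    """Only the control process (rank 0) emits INFO messages to the
    screen; all other ranks surface only warnings and errors."""
    logger = logging.getLogger(logger_name)
    handler = logging.StreamHandler()
    handler.setLevel(logging.INFO if rank == 0 else logging.WARNING)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return handler

control = configure_console_logging(0, "sketch.control")  # chatty
worker = configure_console_logging(3, "sketch.worker")    # quiet
```

A separate file handler could still record every rank's messages if that turns out to be desirable.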

  • It is not obvious to the user that VCMT is saved to disk (it is buried in correction.params). Suggest saving the VCMT to a numpy array file vcmt.npy

Done!
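The suggested change amounts to something like the following. The matrix contents and output path are stand-ins, not PyRate's real values:

```python
import os
import tempfile

import numpy as np

# Sketch: write the variance-covariance matrix to its own vcmt.npy
# rather than leaving it buried inside the pickled params.
vcmt = np.eye(4) * 2.5                         # stand-in for the real matrix
out = os.path.join(tempfile.mkdtemp(), "vcmt.npy")
np.save(out, vcmt)

reloaded = np.load(out)   # later steps (or users) can reuse it directly
```

The .npy format stores the array in binary, so the round trip is exact and, as noted below, does not change any outputs.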

All other issues raised have been addressed!

mcgarth commented 4 years ago

The additional changes by @basaks have addressed all the issues I raised! I redid my numeric comparison tests for one run (apsest=1, cohmask=1) and the results were identical. So saving VCMT to disk as vcmt.npy did not change a thing, as @basaks suspected. I am happy with the explanation given by @basaks above for the changes at the 6th-decimal-place level, and accept them. They are only fractions of a millimetre in magnitude and won't trouble us.