GeoscienceAustralia / ga_sar_workflow

InSAR processing workflow used by Geoscience Australia
Apache License 2.0

Running the workflow with test data discussion #162

Closed · truth-quark closed this 3 years ago

truth-quark commented 3 years ago

AD: The caveat here is that there is a lot of code to go through and I may have missed something, so we might have @tfuhrmann glance at the script as well when he gets back.

A thought: what if Thomas inspected the code in conjunction with Mitchell's new test framework, as part of the test/debug cycles?

This could potentially also be done alongside the first attempts at running the coreg code on gadi.

mwheeler commented 3 years ago

I think the unit testing and validation testing are completely separate topics/discussions.

The 'test framework' (the PyGammaTestProxy object I made) is for unit testing only - not validation, and not for running against real data. I don't mind when anyone decides to review the code, be it as part of an existing PR or whenever they like.
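As a rough illustration of how such a proxy gets used in a unit test - a sketch only: the import path, the `pg` attribute, the `coregister_slc` entry point and the proxy's recording behaviour are all hypothetical placeholders here, not the actual PyGammaTestProxy API:

```python
# Hypothetical sketch only -- the names below are placeholders, not the real API.
from tests.py_gamma_test_proxy import PyGammaTestProxy  # hypothetical import path

import insar.coregister_slc as coreg  # hypothetical module under test


def test_coregistration_without_gamma(monkeypatch):
    proxy = PyGammaTestProxy()
    # Swap the proxy in for the real py_gamma wrapper, so the workflow
    # logic can be exercised with no GAMMA install and no real data.
    monkeypatch.setattr(coreg, "pg", proxy)

    coreg.coregister_slc("primary.slc", "secondary.slc")  # hypothetical call

    # Assertions would then inspect what the proxy recorded, e.g. that the
    # expected GAMMA programs were invoked in the expected order.
```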

As for validation... we definitely need a small dataset (not sure of the terminology here: a scene, a frame, or a tile?) where someone has shown how to run the bash workflow against it, and a copy of the outputs expected from it (e.g. from a correctly-run bash) - for others to validate against. From this we can build validation tests.
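For example, a validation test built from such a reference run might look something like this - a sketch under assumptions: the directory names, file pattern and tolerance are illustrative, and it assumes GAMMA's usual flat-binary big-endian float32 rasters:

```python
# Sketch of a validation test: compare rasters from a known-good bash run
# against the same products from the Python workflow. Paths are illustrative.
from pathlib import Path

import numpy as np

REFERENCE_DIR = Path("bash_reference_outputs")  # known-good bash outputs
CANDIDATE_DIR = Path("python_outputs")          # outputs being validated


def compare_outputs(pattern="*.gamma0", rtol=1e-5):
    for ref_file in REFERENCE_DIR.rglob(pattern):
        cand_file = CANDIDATE_DIR / ref_file.relative_to(REFERENCE_DIR)
        # GAMMA flat-binary rasters are conventionally big-endian float32.
        ref = np.fromfile(ref_file, dtype=">f4")
        cand = np.fromfile(cand_file, dtype=">f4")
        assert ref.size == cand.size, f"size mismatch: {ref_file.name}"
        assert np.allclose(ref, cand, rtol=rtol, equal_nan=True), (
            f"pixel values diverge: {ref_file.name}"
        )
```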

adeane-ga commented 3 years ago

Yeah I agree. From my minimal experience with PR reviews, in the amount of time I spend on them I reckon I could be running both a Bash and a Python workflow on a small dataset. This would allow me to give a much more robust review (and pick up obvious things more quickly). I'm happy to start putting together the Bash version of this test data, but I need to make sure it covers the same spatial/temporal extent as what we can run in Python.

I originally thought the data could be made up of just two Sentinel-1 scenes (because that's all we need to form an interferogram), but for testing some of the network generation code and the code for using different masters for co-registration etc., it might be good to have 3 or more.
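To make the scene-count point concrete, a trivial illustration (the dates are the three Lan-Wei proposes further down; with only two scenes the pair list collapses to a single interferogram, so the network logic is never exercised):

```python
# With two dates there is exactly one possible interferogram, so none of the
# network-generation logic gets exercised; three dates already give a choice.
from itertools import combinations

dates = ["20180103", "20180115", "20180127"]  # example dates from this thread

pairs = list(combinations(dates, 2))
print(pairs)
# [('20180103', '20180115'), ('20180103', '20180127'), ('20180115', '20180127')]
```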

As far as selecting which frame to process goes, I'm not sure how limited we are by the framing definition, i.e. do we need to select a dataset that is from the framing definition? If not, then we can reduce the scene to a minimum of 6 bursts to reduce size (to decrease testing time) - just an idea, but a full frame is better than no frame.

mwheeler commented 3 years ago

I originally thought the data could be made up of just two Sentinel-1 scenes (because that's all we need to form an interferogram), but for testing some of the network generation code and the code for using different masters for co-registration etc., it might be good to have 3 or more.

I haven't gotten as far as seeing the network/tree related code yet (has that been ported to python yet? if not, where's the bash?)

I don't think processing time will matter too much, as long as the number of scenes is the minimum needed (e.g. 3+ for the network stuff). Keeping the scenes "as is", so they represent what they would be in production, seems safer to me: there'd be no differences between validation tests and production. These tests wouldn't be run regularly, so reducing testing time would, I think, be secondary to making sure we're reproducing something true to what would happen in production (as identical as possible).

adeane-ga commented 3 years ago

I originally thought the data could be made up of just two Sentinel-1 scenes (because that's all we need to form an interferogram), but for testing some of the network generation code and the code for using different masters for co-registration etc., it might be good to have 3 or more.

I haven't gotten as far as seeing the network/tree related code yet (has that been ported to python yet? if not, where's the bash?)

I don't think processing time will matter too much, as long as the number of scenes is the minimum needed (e.g. 3+ for the network stuff). Keeping the scenes "as is", so they represent what they would be in production, seems safer to me: there'd be no differences between validation tests and production. These tests wouldn't be run regularly, so reducing testing time would, I think, be secondary to making sure we're reproducing something true to what would happen in production (as identical as possible).

I'm not sure if it has been ported yet (I think Thomas may have chatted to Ben about it, but I can't remember). Somewhat ironically, the Bash version of this is written in Python :)

Here: https://github.com/GeoscienceAustralia/gamma_insar/blob/develop/calc_baselines.py and here: https://github.com/GeoscienceAustralia/gamma_insar/blob/develop/calc_baselines_functions.py

And yeah, I see your point regarding the scene size. Keeping them as the full frame ("as is"), as defined by the framing definition, makes sense to me.

lanweiwang commented 3 years ago

Hi, regarding testing: I do have a regular test set (three acquisition dates: 20180103, 20180115 and 20180127), and last Friday we did the first test on the new S1 framing and it's working OK (generated sigma0 and gamma0):

/g/data/up71/projects/InSAR-ARD/reference_example/lanwei_testshp/T147D_F26S.shp (this one showed an initial offset error; we'll use it to test land-centre coordinates for the initial offset)
/g/data/up71/projects/InSAR-ARD/reference_example/lanwei_testshp/T147D_F28S.shp

But we do not have a full bash-script run result for T147D_F28S.shp.

adeane-ga commented 3 years ago

Hi, regarding testing: I do have a regular test set (three acquisition dates: 20180103, 20180115 and 20180127), and last Friday we did the first test on the new S1 framing and it's working OK (generated sigma0 and gamma0):

/g/data/up71/projects/InSAR-ARD/reference_example/lanwei_testshp/T147D_F26S.shp (this one showed an initial offset error; we'll use it to test land-centre coordinates for the initial offset)
/g/data/up71/projects/InSAR-ARD/reference_example/lanwei_testshp/T147D_F28S.shp

But we do not have a full bash-script run result for T147D_F28S.shp.

Great - well those three scenes could be a good starting point.

Does anyone have any objections to me trying to generate the Bash version of this? Or are there reasons why this might not work? (@sixy6e, @mcgarth).

If I am to generate the Bash version of this, then I need to replicate the custom frame definition for Bash from the ESA-defined frames. I can't actually access /g/data/up71/ to have a look; would it be possible to place the shape file somewhere on dg9 (or the relevant information on what that frame was based on)?

Though probably hold off until others have commented, in case it's a bad idea; we could always have a face-to-face if we need to run through the details.

mcgarth commented 3 years ago

Does anyone have any objections to me trying to generate the Bash version of this? Or are there reasons why this might not work? (@sixy6e, @mcgarth).

No, go ahead - sounds reasonable to me. The only thing to think about: is a three-image network adequate to fully test the secondary co-registration...?

I haven't gotten as far as seeing the network/tree related code yet (has that been ported to python yet? if not, where's the bash?)

@truth-quark @mwheeler @sixy6e please prioritise this, if it isn't already done. We would like @tfuhrmann to review it before he leaves us in a couple of weeks.

lanweiwang commented 3 years ago

Sorry for the late reply; we were in a long two-hour meeting! I tried to copy the file to /g/data/dg9/tmp/ without success (permission denied), so I copied it to /g/data/dg9/INSAR_ANALYSIS/temp/test_S1_frame/ instead.

adeane-ga commented 3 years ago

Sorry for the late reply; we were in a long two-hour meeting! I tried to copy the file to /g/data/dg9/tmp/ without success (permission denied), so I copied it to /g/data/dg9/INSAR_ANALYSIS/temp/test_S1_frame/ instead.

No stress - thanks for that, Lan-wei, I see the shape file. I'll see what I can do with this and report back.

tfuhrmann commented 3 years ago

.. We definitely need a small dataset (not sure of the terminology here: a scene, a frame, or a tile?) where someone has shown how to run the bash workflow against it, and a copy of the outputs ...

100% agree. A comparison and validation of the py-gamma output against the bash workflow output for a small test dataset is needed to move forward. I understand that @adeane-ga is working on that at the moment, by processing the test dataset proposed by Lan-Wei with the GAMMA bash workflow (i.e. Sentinel-1 data consisting of 12 bursts at the NSW/QLD border).

mwheeler commented 3 years ago

I've also processed the same data w/ the python workflow here:

This is a preliminary result; there appear to be differences that we'll probably need people better informed than myself to help identify (I'm not sure how to interpret the results myself yet):

  1. With multilook=1, everything works fine but the data appears a bit different (I assume this is expected if the multilook value differs?)
  2. With multilook=2, the python workflow fails in one of the slave coregistrations (mcf fails with Error: Input must have at least three input vertices.)

The python workflow logs every single GAMMA command and its parameters in a JSON file; I wasn't able to find something similar in the bash workflow (there is a commands.log, however it doesn't seem complete and only shows a subset of the commands).

I imagine that with a complete trace of the GAMMA commands run by both workflows, it'd be much easier to narrow down where they start to diverge and why.
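A sketch of that comparison idea - assuming both traces could first be normalised into a JSON list of {"command": ..., "args": [...]} records; the actual log formats on each side would need adapting to that shape:

```python
# Sketch: find where two normalised GAMMA command traces first diverge.
# Assumes each log is a JSON list of {"command": ..., "args": [...]} records.
import json


def first_divergence(python_log, bash_log):
    with open(python_log) as f:
        py_cmds = json.load(f)
    with open(bash_log) as f:
        sh_cmds = json.load(f)

    for i, (py, sh) in enumerate(zip(py_cmds, sh_cmds)):
        if py != sh:
            return i, py, sh          # first differing command
    if len(py_cmds) != len(sh_cmds):  # one trace is a prefix of the other
        return min(len(py_cmds), len(sh_cmds)), None, None
    return None                       # traces are identical
```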

adeane-ga commented 3 years ago

Regarding the question of the Py-GAMMA and Bash-GAMMA test-data workflows using identical configuration, I have copied an updated version of the same Bash processing workflow document for the current test data below.

Basically, the only new addition is the exact path to the GAMMA version we are using, along with the directory of our GAMMA_CONFIG file.

The path to the GAMMA version binaries is here: /g/data/dg9/SOFTWARE/dg9-apps/modules/geodesy/GAMMA-20191203 and it contains these paths:

append-path PATH /g/data/dg9/SOFTWARE/dg9-apps/GAMMA/GAMMA_SOFTWARE-20191203/DIFF/bin
append-path PATH /g/data/dg9/SOFTWARE/dg9-apps/GAMMA/GAMMA_SOFTWARE-20191203/DISP/bin
append-path PATH /g/data/dg9/SOFTWARE/dg9-apps/GAMMA/GAMMA_SOFTWARE-20191203/ISP/bin
append-path PATH /g/data/dg9/SOFTWARE/dg9-apps/GAMMA/GAMMA_SOFTWARE-20191203/LAT/bin
append-path PATH /g/data/dg9/SOFTWARE/dg9-apps/GAMMA/GAMMA_SOFTWARE-20191203/MSP/bin

append-path PATH /g/data/dg9/SOFTWARE/dg9-apps/GAMMA/GAMMA_SOFTWARE-20191203/DIFF/scripts
append-path PATH /g/data/dg9/SOFTWARE/dg9-apps/GAMMA/GAMMA_SOFTWARE-20191203/DISP/scripts
append-path PATH /g/data/dg9/SOFTWARE/dg9-apps/GAMMA/GAMMA_SOFTWARE-20191203/ISP/scripts
append-path PATH /g/data/dg9/SOFTWARE/dg9-apps/GAMMA/GAMMA_SOFTWARE-20191203/LAT/scripts
append-path PATH /g/data/dg9/SOFTWARE/dg9-apps/GAMMA/GAMMA_SOFTWARE-20191203/MSP/scripts
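As a quick, illustrative sanity check that these directories actually ended up on PATH after loading the module (the program names below are just examples of GAMMA executables):

```python
# Quick check (illustrative) that GAMMA executables are visible on PATH
# after loading the module; program names here are examples only.
import shutil

for prog in ["create_offset", "SLC_interp"]:
    location = shutil.which(prog)
    print(f"{prog}: {location or 'NOT FOUND on PATH'}")
```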

workflow-notes-test-data-insar-ard.docx

mwheeler commented 3 years ago

Closing this ticket, as we've been running the workflow on gadi for many months now, on a wide range of test data.