Concatenating chunks and tracks suggestions

falkamelung commented 3 years ago

The chunks are generated by minsar using the script generate_chunk_template_files.py $TE/MakranBigSenDT166.template --latStep 1.0 --latMargin 0.1. Currently we run loop_tracks.py --project MakranChunk --tracks SenDT166 --chunk. We should aim to run a concatenation script at the end of the chunk processing that concatenates all existing chunks or a selected subset of them.

concatenate_chunks.py $TE/MakranBigSenDT166.template  --outdir mintpy
concatenate_chunks.py --chunks MakranChunk2*SenDT166 --project MakranBigSenDT166 --outdir mimtpy
concatenate_chunks.py --chunks MakranChunk25SenDT166  MakranChunk26SenDT166 --project MakranBigSenDT166 --outdir mimtpy

After the processing we should be able to generate the plots using

smallbaselineApp.py $TE/MakranBigSenDT166.template --plot

For concatenating tracks we could name it

concatenating_tracks.py MakranChunk*SenDT

the --outdir default would be MakranSenDT

(As I write this I am not sure how project or project_name is used if no *template file is given. I think it looks for a /mintpy/smallbaselineApp.cfg, but I forgot.) (I am also not sure about the positional arguments. Alternatively, all the files that we want to concatenate could be positional arguments: concatenate_chunks.py MakranChunk*SenDT166 or concatenate_chunks.py MakranChunk2{5,6}SenDT166)

ranneylxr commented 3 years ago

Hi Professor, I uploaded these two scripts to the repo. Both work well using KokoxiliChunks as an example. For now the two scripts use the S1.he5 data and generate the concatenated velocity results. The scripts that concatenate all S1.he5 data will be updated in several days. The template parameters for concatenate_chunks.py are:

## processing template default setting
################################## concatenate chunks Parameters ####################################
mimtpy.chunk                             = yes # yes or no
mimtpy.chunk.chunks                      = KokoxiliChunk*SenAT12 # MakranChunk*SenDT166 or MakranChunk2{5,6}SenDT166
mimtpy.chunk.project                     = KokoxiliBigSenAT12

Xiaoran.

falkamelung commented 3 years ago

Great, thank you! Will try later. Something I see is that we may want to have a MimtPy/docs/requirements.txt for the conda installation as we have for the other packages. For now it seems to need rasterio and geopandas.

falkamelung commented 2 years ago

Here is what I think concatenate_chunks.py should do. The goal is that I can add it to the workflow and you don't have to deal with the chunks at all.

We may want a naming convention clearer than 'Big'. I could use KokoxiliToBeForChunksSenAT12.template. We could give this as mimtpy.chunk.identifiers = ForChunks,Chunk.

This means that the auto settings should be:

mimtpy.chunk             = auto       #  [yes / no], auto for yes, concatenate chunks
mimtpy.chunk.identifiers = auto       #  [string,string], auto for ForChunks,Chunk,   strings used  for naming convention of control files
mimtpy.chunk.chunks      = auto       # KokoxiliChunk*SenAT12  for mimtpy.chunk.identifiers=auto
mimtpy.chunk.project     = auto       # KokoxiliToBeChunkedSenAT12  for mimtpy.chunk.identifiers=auto

(Do we need chunk.chunks and chunk.project if we have chunk.identifiers? It is not bad to be a bit redundant as this is self-explanatory.)
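As a rough illustration of how the auto settings could be resolved, the sketch below derives the chunk pattern and the project name from a control-file name and the two identifiers. The function name derive_chunk_names is hypothetical and not part of MimtPy. It assumes that the first identifier marks the parent control file and the second marks the chunk directories.

```python
import os


def derive_chunk_names(template_file, identifiers=("ForChunks", "Chunk")):
    """Return (chunk_glob, project) derived from a control-file name.

    Hypothetical helper: assumes identifiers[0] appears in the parent
    control-file name and identifiers[1] names the chunk directories.
    """
    parent_id, chunk_id = identifiers
    # Strip directory and extension: /path/NameForChunksSenAT12.template -> NameForChunksSenAT12
    project = os.path.splitext(os.path.basename(template_file))[0]
    if parent_id not in project:
        raise ValueError(f"identifier '{parent_id}' not found in '{project}'")
    prefix, suffix = project.split(parent_id, 1)
    # The wildcard stands for the chunk number, e.g. KokoxiliChunk*SenAT12
    return f"{prefix}{chunk_id}*{suffix}", project


if __name__ == "__main__":
    print(derive_chunk_names("KokoxiliForChunksSenAT12.template"))
```

With such a helper, chunk.chunks and chunk.project would only be needed to override the derived values, which is one way to keep the redundancy optional.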

We may want to have the option to use both positional arguments (S1* files) and the --chunks option:

concatenate_chunks.py GujaratChunk21SenDT107/mintpy/S1_IW123_107_0520_0525_20160926_XXXXXXXX.he5  GujaratChunk22SenDT107/mintpy/S1_IW123_107_0517_0522_20160926_XXXXXXXX.he5 --project GujaratBigSenDT107 --datatype velocity --outdir mimtpy
concatenate_chunks.py --chunks GujaratChunk2{1,2}SenDT107 --project GujaratBigSenDT107 --datatype velocity --outdir mimtpy
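A minimal argparse sketch of the interface discussed above, accepting either S1* files as positional arguments or chunk directories via --chunks. This is not the existing concatenate_chunks.py parser. The option names follow the examples in this thread, while the defaults (velocity, mimtpy) and the either/or rule are assumptions.

```python
import argparse


def create_parser():
    parser = argparse.ArgumentParser(
        description="Concatenate chunks (illustrative sketch only).")
    parser.add_argument("files", nargs="*",
                        help="S1*.he5 files to concatenate")
    parser.add_argument("--chunks", nargs="+", default=[],
                        help="chunk directories, e.g. GujaratChunk2{1,2}SenDT107")
    parser.add_argument("--project", help="name of the concatenated project")
    parser.add_argument("--datatype", default="velocity",
                        choices=["velocity", "timeseries"])
    parser.add_argument("--outdir", default="mimtpy")
    return parser


def cmd_line_parse(argv=None):
    parser = create_parser()
    inps = parser.parse_args(argv)
    # Assumed rule: exactly one of the two input styles must be used.
    if bool(inps.files) == bool(inps.chunks):
        parser.error("give either S1 files as positional arguments "
                     "or --chunks (exactly one of the two)")
    return inps


if __name__ == "__main__":
    print(cmd_line_parse())
```

The shell expands GujaratChunk2{1,2}SenDT107 before the script sees it, so --chunks receives a plain list of directory names.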

Actually, do we really need to say '--dataset velocity' and '--dataset timeseries'? We could also have '--dataset all' and make it the default. We may want one wrapper function that uses info from 'mimtpyApp.cfg' (concatenate_chunks.py) and calls a barebone function for the concatenation of individual datasets (e.g. 'concatenate_chunks_barebone.py' with S1 files as positional arguments). With proper defaults we might be able to get concatenate_chunks.py $TE/KokoxiliBigSenAT12.template (without any other option) to produce an S1 file.
Can we make 'smallbaselineApp.py --plot' work on concatenated files to create all plots? Maybe we should do it in stages. We would first have a bash script (similar to https://github.com/insarlab/MintPy/blob/v1.3.0/mintpy/sh/plot_smallbaselineApp.sh).
Does concatenate_chunks.py also work for radar coordinates? What about velocity standard deviation?
To be consistent with MintPy and MinoPy, shall we use '--dir' instead of '--outdir'? Also, if '--dir' is not given the default should be 'mimtpy' (as it is mintpy for MintPy). We could also use velocity as the default for '--datatype'.
Shall we add options '--startDate', '--endDate' to calculate velocities for different time periods?
At the end, display which dates have been removed. (Maybe consider a '--dryrun' option that would just report this without creating any files.)
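One possible shape for the date report, as a sketch. It assumes that a date is removed when it is not present in every chunk, which this thread does not confirm. The function name summarize_dates and the input dictionary are hypothetical.

```python
def summarize_dates(dates_per_chunk):
    """Return (common_dates, removed_per_chunk).

    dates_per_chunk maps a chunk name to its list of 'YYYYMMDD' strings.
    Assumption: only dates common to all chunks are kept.
    """
    if not dates_per_chunk:
        return [], {}
    common = set.intersection(*(set(d) for d in dates_per_chunk.values()))
    removed = {name: sorted(set(dates) - common)
               for name, dates in dates_per_chunk.items()}
    return sorted(common), removed


if __name__ == "__main__":
    # Made-up dates, for illustration only
    example = {
        "ChunkA": ["20160926", "20161008", "20161020"],
        "ChunkB": ["20160926", "20161020", "20161101"],
    }
    common, removed = summarize_dates(example)
    print("dates kept:", common)
    for name, dates in removed.items():
        print(f"dates removed from {name}: {dates}")
```

A '--dryrun' option could call only this step and exit before any file is written.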

3.x I see that concatenate_chunks.py calls 'track_offset.py'. That sounds like it calculates offsets between tracks, which it does not do. Any ideas for a better name? Maybe 'chunk_offset.py' is more appropriate?

3a. --help should also say something about the files produced. Is velocity_track.h5 the final product?
3b. Are there any differences to the MintPy names?
3c. Files are written into mintpy/velocity folder. Did you consider to write into mimtpy folder? Would that make sense?
3d. The S1* file does not seem to have velocity. So it calculates it using MimtPy functions? (say that in --help)
3e. Say in --help what features (e.g. velocity standard deviation)
3f. In --help talk about reference point
3g. Add to --help examples

concatenate_chunks.py --chunks $SCRATCHDIR/ChamanChunk*SenDT78 --project $SCRATCHDIR/ChamanBigSenDT78 --datatype timeseries --outdir mimtpy

The defaults should be called following MintPy mimtpyApp.cfg and mimtpyApp_auto.cfg
At the end it could plot the appropriate view.py call for display
mimtpy repo should be called MimtPy to be consistent with MintPy and MinoPy
I don't know how much time you have the following week. We need the zenodo file for the paper. Which files are you putting there? The concatenated S1* files? Then let's make sure that concatenate_chunks.py is final enough to produce them. Another option would be to have a final concatenate_chunks.py, make a MimtPy release, and add instructions on how to get from the chunks to the concatenated file to the zenodo file. When somebody asks, we refer them to this file.

Hmmm.... You probably don't have time for this right now.... (but we certainly should do this for the Tibet paper). Maybe we can add, dependent or independent from the paper, an example to MimtPy as Sara did for MinoPy (and we have in MintPy)? It would be something like

wget https://zenodo.org/record/6039250/files/PichinchaSenDT142.zip
unzip PichinchaSenDT142.zip
cd PichinchaSenDT142
minopyApp.py PichinchaSenDT142.txt --dir ./minopy

Minors:

Use something else than difference between master and slave overlay

Don't display too many blank lines. At the end it says:


Go to project dir: /scratch/05861/tg851601/ChamanChunk23SenAT144/mintpy

Delete velocity folder from /scratch/05861/tg851601/ChamanChunk23SenAT144/mintpy

Go to project dir: /scratch/05861/tg851601/ChamanChunk24SenAT144/mintpy

Delete velocity folder from /scratch/05861/tg851601/ChamanChunk24SenAT144/mintpy

Go to project dir: /scratch/05861/tg851601/ChamanChunk25SenAT144/mintpy

Delete velocity folder from /scratch/05861/tg851601/ChamanChunk25SenAT144/mintpy

Go to project dir: /scratch/05861/tg851601/ChamanChunk26SenAT144/mintpy

Delete velocity folder from /scratch/05861/tg851601/ChamanChunk26SenAT144/mintpy

Go to project dir: /scratch/05861/tg851601/ChamanChunk27SenAT144/mintpy

Delete velocity folder from /scratch/05861/tg851601/ChamanChunk27SenAT144/mintpy

Go to project dir: /scratch/05861/tg851601/ChamanChunk28SenAT144/mintpy

Delete velocity folder from /scratch/05861/tg851601/ChamanChunk28SenAT144/mintpy

Go to project dir: /scratch/05861/tg851601/ChamanChunk29SenAT144/mintpy

Delete velocity folder from /scratch/05861/tg851601/ChamanChunk29SenAT144/mintpy


There are also white lines earlier. Could we have white lines just when we go from one chunk to the next?

- Preferably it says at the end which files got created. But not sure this is possible in this case because of the deleting.
- do we need to say 'with w mode'  and 'with compression=None' ?
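The requested log layout, one blank line only between chunks, could look like the sketch below. The message wording is copied from the output quoted above. The function name format_cleanup_log is hypothetical and is not the current MimtPy logging code.

```python
def format_cleanup_log(chunk_dirs):
    """Build the cleanup messages with a single blank line between chunks."""
    blocks = [
        f"Go to project dir: {path}\nDelete velocity folder from {path}"
        for path in chunk_dirs
    ]
    # Joining with an empty line keeps each chunk's two messages together
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(format_cleanup_log([
        "/scratch/05861/tg851601/ChamanChunk23SenAT144/mintpy",
        "/scratch/05861/tg851601/ChamanChunk24SenAT144/mintpy",
    ]))
```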

falkamelung commented 2 years ago

Other items.

'mimtpy.velcumu': can we use a more self-explanatory name? I have no idea what it might stand for....
in mimtpyApp.py I see 'DataSet', 'Dataset', 'dataSet', 'dataset'..... These are all different? I think our convention would be 'dataSet'.... and possibly 'dataset'
mu.seperate_str_byspace. --> mu.separate_string_by_space. (note the spelling error)
mu.seprate_filename_extension. (also spelling error)
mu.velo_disp use longer self-explanatory name
delete_tmpgeo. --> something more self-explanatory, longer

Do you have MintPy functions copied? That is a bit dangerous but you know this of course.

falkamelung commented 2 years ago

Items on 'concatenate_tracks.py'

search_tracks: why don't you use glob? Use a variable name other than 'L'.
'concatenation_tracks' within 'concatenate_tracks.py' is not good.... We have a convention for this. Something with 'run'

falkamelung commented 2 years ago

(from another issue which is closed now)

I see that you use $SCRATCHDIR in the script. Let's avoid this. We should first have a version where all files are given with full path. Then, if --project is given, it calls a module that sets the paths and does globs to find the S1 file.

Currently you would call it while being in $SCRATCHDIR. That is different from all our scripts. So the 'normal way' would be

cd $SCRATCHDIR/GujaratBigSenDT10
concatenate_chunks.py --chunks $SCRATCHDIR/GujaratChunk2{1,2,3}SenDT107  --datatype timeseries                                         --outdir mimtpy
concatenate_chunks.py --chunks $SCRATCHDIR/GujaratChunk2{1,2,3}SenDT107  --datatype timeseries --project GujaratBigSenDT10             --outdir mimtpy
concatenate_chunks.py --chunks $SCRATCHDIR/GujaratChunk2{1,2,3}SenDT107  --datatype timeseries --project $SCRATCHDIR/GujaratBigSenDT10 --outdir mimtpy
concatenate_chunks.py --chunks ../GujaratChunk2{1,2,3}SenDT107           --datatype timeseries                                         --outdir mimtpy

For convenience you could also call it:

cd $SCRATCHDIR
concatenate_chunks.py --chunks GujaratChunk2{1,2,3}SenDT107              --datatype timeseries --project GujaratBigSenDT10 --outdir mimtpy
concatenate_chunks.py --chunks GujaratChunk2{1,2,3}SenDT107              --datatype timeseries --project $SCRATCHDIR/GujaratBigSenDT10 --outdir mimtpy
concatenate_chunks.py --chunks $SCRATCHDIR/GujaratChunk2{1,2,3}SenDT107  --datatype timeseries --project $SCRATCHDIR/GujaratBigSenDT10  --outdir mimtpy
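The call variants above could all be normalized to full paths by a small helper before any globbing is done. This is only a sketch: resolve_dir is a hypothetical name, and the lookup order (absolute path, then an existing path relative to the current directory, then a name relative to the scratch directory) is an assumption.

```python
import os


def resolve_dir(name, scratch_dir=None):
    """Return a full path for a chunk or project directory.

    Hypothetical helper. scratch_dir defaults to $SCRATCHDIR and falls
    back to the current directory if the variable is not set.
    """
    scratch_dir = scratch_dir or os.environ.get("SCRATCHDIR", os.getcwd())
    name = os.path.expandvars(name)
    if os.path.isabs(name):
        # Already a full path: never prepend the scratch directory again
        return os.path.normpath(name)
    candidate = os.path.abspath(name)
    if os.path.isdir(candidate):
        # Relative path valid from the current directory, e.g. ../GujaratChunk21SenDT107
        return candidate
    # Bare name: interpret it relative to the scratch directory
    return os.path.normpath(os.path.join(scratch_dir, name))


if __name__ == "__main__":
    for arg in ["GujaratChunk21SenDT107", "/scratch/GujaratChunk21SenDT107"]:
        print(resolve_dir(arg, scratch_dir="/scratch"))
```

Because absolute paths are returned unchanged, a $SCRATCHDIR/$SCRATCHDIR prefix cannot be produced by this step.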

falkamelung commented 2 years ago

Hi @ranneylxr : Here are two examples of problems that can occur with the current globbing in concatenate_tracks.py. Essentially, if there are some leftover files from previous tries it uses them as well. In the last line it is using $SCRATCHDIR/$SCRATCHDIR which is not right. (FA: note later: I solved this issue, so it is not urgent)

concatenate_tracks.py ChamanBig*SenDT --datatype velocity
/work2/05861/tg851601/stampede2/codet/rsmas_insar/sources/MimtPy

the output dir for concatenation is /scratch1/05861/tg851601/ChamanBigSen*SenDT/mimtpy/velocity.

[5] > /work2/05861/tg851601/stampede2/codet/rsmas_insar/sources/MimtPy/mimtpy/concatenate_tracks.py(51)search_tracks()
-> for dir in dirs:
(Pdb++) l
 46         return inps
 47     
 48     def search_tracks(file_dir, track, datatype):
 49         L=[]
 50         for root, dirs, files in os.walk(file_dir):
 51  ->         for dir in dirs:
 52                 if dir.find(track.replace('*','')) != -1:
 53                     if len(re.findall(r'[\d.]+', dir)) != 0:
 54 B                       L.append(os.path.join(root, dir, 'mimtpy', datatype))
 55 B       return L
 56     
(Pdb++) n
[5] > /work2/05861/tg851601/stampede2/codet/rsmas_insar/sources/MimtPy/mimtpy/concatenate_tracks.py(52)search_tracks()
-> if dir.find(track.replace('*','')) != -1:
(Pdb++) c
[5] > /work2/05861/tg851601/stampede2/codet/rsmas_insar/sources/MimtPy/mimtpy/concatenate_tracks.py(54)search_tracks()
-> L.append(os.path.join(root, dir, 'mimtpy', datatype))
(Pdb++) n
[5] > /work2/05861/tg851601/stampede2/codet/rsmas_insar/sources/MimtPy/mimtpy/concatenate_tracks.py(51)search_tracks()
-> for dir in dirs:
(Pdb++) L
['/scratch1/05861/tg851601/ChamanBigSenDT78/mimtpy/velocity', '/scratch1/05861/tg851601/ChamanBigSenDT151/mimtpy/velocity', '/scratch1/05861/tg851601/scratch1/05861/tg851601/ChamanBigSenDT78/mimtpy/velocity']
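A glob-based variant of search_tracks that avoids the nested match in the list above might look like the following sketch. It is not the code in the repository. It assumes the track directories sit directly under the scratch directory and that the user pattern is followed by the track number, which is why "[0-9]*" is appended.

```python
import glob
import os


def search_tracks(scratch_dir, track_pattern, datatype):
    """Find <scratch_dir>/<track><number>/mimtpy/<datatype> directories.

    Only direct children of scratch_dir are considered, so leftover
    copies nested deeper in the tree are not picked up.
    """
    pattern = os.path.join(
        glob.escape(scratch_dir),    # treat the base path literally
        track_pattern + "[0-9]*",    # require a track number, e.g. ...SenDT78
        "mimtpy",
        datatype,
    )
    return sorted(path for path in glob.glob(pattern) if os.path.isdir(path))


if __name__ == "__main__":
    scratch = os.environ.get("SCRATCHDIR", os.getcwd())
    for track_dir in search_tracks(scratch, "ChamanBig*SenDT", "velocity"):
        print(track_dir)
```

Unlike os.walk, glob does not descend into subdirectories unless asked to, and requiring a digit keeps an output directory such as ChamanBigSenDT out of the input list.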

Other concatenate_tracks.py suggestions:

The directory created in the case above should be ChamanBigSenDT and not ChamanSenDT. After all, we concatenate tracks located in a *Big* directory.

ranneylxr commented 2 years ago

Hi @falkamelung, all comments are addressed except the following:

We may want to have a naming convention more clear than 'Big';
We may want to have one wrapper function that uses info from 'mimtpyApp.cfg', and the defaults should be called following MintPy mimtpyApp.cfg and mimtpyApp_auto.cfg. Answer: I need some time to figure out what it is and how it works...
Can we make 'smallbaselineApp.py --plot' work on concatenated files to create all plots. Answer: the script now uses view.py to plot. Does that address this comment?

Answers for the discussion items:

Does concatenate_chunks.py also work for radar coordinates? What about velocity standard deviation? Answer: concatenate_chunks.py only works in geographic coordinates and does not calculate the velocity standard deviation, because people can use geocode.py and timeseries2velocity.py of MintPy to obtain them from the concatenated results.
Shall we add options '--startDate', '--endDate' to calculate velocities for different time periods? Answer: I don't think so. Users can run timeseries2velocity.py of MintPy on the concatenated timeseries data to calculate the velocity for a given time period.
3. in mimtpyApp.py I see 'DataSet', 'Dataset', 'dataSet', 'dataset'..... These are all different? I think our convention would be 'dataSet'.... and possibly 'dataset' Answer: Replacing DataSet with datasets

geodesymiami / mimtpy

Concatenating chunks and tracks suggestions #11
