geodesymiami / insarmaps

3 stars 0 forks source link

Ingest of high-resolution data (old issue) #58

Closed falkamelung closed 2 years ago

falkamelung commented 2 years ago

All on jetstream:

/data/HDF5EOS/Miami2SenAT48/minopy_Surfside_2016-2022/single_reference_network/S1_IW3_048_0081_0082_20160412_XXXXXXXX_PS.he5
``

The best one (but possibly to large)

/data/HDF5EOS/Miami2SenAT48/minopy_SM[1020] ll -rw-rw-r--. 1 centos centos 364742393 Apr 19 18:27 S1_IW3_048_0081_0082_20150921_20211112_PS.he5 -rw-rw-r--. 1 centos centos 364802200 Apr 19 18:20 S1_IW3_048_0081_0082_20150921_20211112.he5

falkamelung commented 2 years ago

For jetstream

ssh -YC -o ServerAliveInterval=60  centos@129.114.104.223

Does not work for you? Then you have to run s.bop to set the environment.

I thought we tried last summer and it worked? I added you key at the time. Or did you change computer? Else send me your key and I will add it.

falkamelung commented 2 years ago

Here is a kmz file: vel_ds_hotel.kmz.zip For me it looks like:

image

The S1-file is on jetstream at: /data/HDF5EOS/Miami2SenAT48/minopy_SM/S1_IW3_048_0081_0082_20150921_20211112_PS.he5

It covers a larger area and the kmz file above has been extracted. It currently displays as: https://insarmaps.miami.edu/start/25.8776/-80.1206/17.5000?flyToDatasetCenter=false&startDataset=S1_IW3_048_0081_0082_20150921_20211112_PS&pointLat=25.87703&pointLon=-80.12165&minScale=0.7&maxScale=-0.7&startDate=20150921&endDate=20211112

image
stackTom commented 2 years ago

There is definitely a discrepancy. I am now utilizing the get_lat_long from utils in Mintpy to get the lat and long of the datasets. I re ingested the above dataset, but still have a large discrepancy. Is there any way you can show me how to create a kmz file. I need to debug to see where MintPy and insarmaps are diverging.

stackTom commented 2 years ago

It appears the get_lat_lon function needs a geometry file in order to read high resolution/non-grid data (see below): image

If we don't have this geometry file, then it resorts to grid mode (which explains why the dataset looks the same now that I am using the get_lat_lon function as before when I was calculating the grid by hand). How can we obtain such a geometry file?

stackTom commented 2 years ago

You mentioned in another issue that the locations are in the timeseries. Is it safe to say that only high res timeseries will have the lat and longitude in the timeseries? In that case, I can just tell the ingest script to use these as the coordinates if an h5 file has them, and use regular grid mode for files which don't have this info in the h5 file.

falkamelung commented 2 years ago

I am trying to a good example together so that we can see the difference between the kmz and the ingested one. Hopefully later today.

The lat/long is available in the S1* file (same as in the geometry file). I displayed the image below with view.py S1_IW1_128_0596_0597_20160711_XXXXXXXX.he5 latitude

image

The lat/long is never a uniform grid (as it depends on the topography) but in most cases it is almost uniform. Only for the urban area of Miami with high buildings you really feel the non-uniformity.

So it looks to me that we always should use the exact lat/long ??? It is not associated with any performance loss, right?

Why didn't we use it from the beginning? Possibly that it was not in the S1 file at the time. Furthermore I assumed hat a uniform grid is much simpler to deal with and there were other pressing isseued

stackTom commented 2 years ago

I am trying to a good example together so that we can see the difference between the kmz and the ingested one. Hopefully later today.

The lat/long is available in the S1* file (same as in the geometry file). I displayed the image below with view.py S1_IW1_128_0596_0597_20160711_XXXXXXXX.he5 latitude

image

The lat/long is never a uniform grid (as it depends on the topography) but in most cases it is almost uniform. Only for the urban area of Miami with high buildings you really feel the non-uniformity.

So it looks to me that we always should use the exact lat/long ??? It is not associated with any performance loss, right?

Why didn't we use it from the beginning? Possibly that it was not in the S1 file at the time. Furthermore I assumed hat a uniform grid is much simpler to deal with and there were other pressing isseued

No performance issue. Just the way we programmed it. Not sure if we had the lat and long in the h5 files before.

Were you able to create an example I can play around with? I am having trouble understanding how the lat and long is stored. I see a lot of NANs... if you have a document describing the layout, would be great.

stackTom commented 2 years ago

I figured out how to read the lat and long from /timeseries/geometry/latitude. However, we still don't get high density... In order to get high density, I run the script with the mask set to off... see here: http://129.213.120.104/start/25.8776/-80.1206/17.5000?flyToDatasetCenter=false&startDataset=S1_IW3_048_0081_0082_20150921_20211112_PS&pointLat=25.87690&pointLon=-80.12128&startDate=20150921&endDate=20211112

This is why I'd like to have an example kmz created from start to finish, so I can see where the processes differ. I think something is going on when the kmz is created (maybe different mask used etc).

falkamelung commented 2 years ago

Hi @mirzaees Could you put your geo and inputs folder in /data/HDF5EOS/Miami2SenAT48/minopy_SM on jetstream? We are trying to produce a dataset that allows us to create a geo_timeseries_demErr.kmz as well as an S1* file for ingest to reproduce the difference between the kmz and insarmaps display.

I put data into /data/HDF5EOS/Miami2SenAT48/minopy/single_reference_network. Tese commands

save_hdfeos5.py geo/geo_timeseries_demErr.h5 --tc geo/geo_temporalCoherence.h5 --asc  geo/geo_avgSpatialCoh.h5 -m  geo/geo_maskPS.h5 -g  geo/geo_geometryRadar.h5  -t smallbaselineApp.cfg --suffix PS --update
save_kmz_timeseries.py geo/geo_timeseries_demErr.h5 --vel geo/geo_velocity.h5 --tcoh geo/geo_temporalCoherence.h5 --mask geo/geo_maskTempCoh.h5 

create a geo_timeseries_demErr.kmz. However, the kmz does not display at high zoom level in GoogleEarth. I have not figured out why. There could be an option to save_kmz_timeseries.py regarding level-of-detail but I did not understand yet. So I am not sure whether we can use this dataset to address the problem of insarmaps.

mirzaees commented 2 years ago

Hi @falkamelung I put the geo and inputs on jetstream, the geolocation corrected geometry is in corrected_geometry folder

for the kmz files, I used following command in radar coordinate:

save_kmz.py velocity.h5 -g corrected_geometry/geometryRadar.h5 --step 1

creating kmz and everything from geo folder is not same as from radar coordinate avoid geocoding if possible and you will get the same kmz files as I had however, it will be a high density of scatterers and you need to use subsets

falkamelung commented 2 years ago

I see. So we can't geocode and for insarmaps ingest we should not read the S1 file but the timeseries file:

hdfeos5_2json_mbtiles.py  timeseries_demErr.h5 --mask  maskPS.h5 -g inputs/geometryRadar.h5

@mirzaees , do you think that would work.?

@stackTom , the files are all in /data/HDF5EOS/Miami2SenAT48/minopy/single_reference_network[ if you want to give it a shot.

stackTom commented 2 years ago

The current script relies on the name of the S1 file to get a lot of information such as track, satellite etc. Where would I get this information from if we don't use the S1 file anymore?

stackTom commented 2 years ago

Also I'm not entirely sure what geocoding is nor why these files are different from the S1 file. I thought the S1 file is the culmination of processing all these files and had all the data from these other files in it. Can't we just modify the data in the S1 file and continue to use the S1 file?

falkamelung commented 2 years ago

There is a possibility that creating an S1 file with data in radar coordinates would work. I never tried. We would have to give timeseries instead of geo_timseries, etc

I am on travel since nice this morning (North Carolina) until Thursday. so I can’t try, although I may get a chance one of those days. Maybe @mirzaees can try but I know that she is very busy.

If you can’t get it to work I would suggest to just use dummy names for now or give them on the command line and I fix things later (the names are done in save_hdf5eos.py). Much of the names (frame numbers) don’t make much sense anyway at high resolution and we may want to use names (Brickell, surfside, etc)

radar coordinates is the satellite geometry, whereas Geocoded is on a uniform lat/long grids

stackTom commented 2 years ago

I think the easiest solution is to create s1 files with the radar coordinates, and then I just read the coordinates for each s1 file. That way, s1 files which are radar coded will automatically upload as radar coded, and those that are geocoded will still upload as geocoded, without me having to implement different radar coded/geocoded modes.

I can, of course, simply modify the ingest script to add separate support for geocoded (current support) and radar coded (will need to add) modes, but it seems to me that simply adding the coordinates to the s1 file under the "latitude" and "longitude" fields in the h5 file is the cleanest and simplest solution.

falkamelung commented 2 years ago

I uploaded the entire directory and

save_hdfeos5.py timeseries_demErr.h5 --tc temporalCoherence.h5 --asc avgSpatialCoh.h5 -m  ../maskPS.h5 -g inputs/geometryRadar.h5 -t smallbaselineApp.cfg

worked fine. Can you try to ingest S1_IW3_048_0081_0082_20200110_20220429.he5 ? The current scripts still wants X_STEP and I got the error:

hdfeos5_2json_mbtiles.py $h5file ./JSON
reading displacement data from file: S1_IW3_048_0081_0082_20200110_20220429.he5 ...

reading mask data from file: S1_IW3_048_0081_0082_20200110_20220429.he5 ...
Masking displacement
Traceback (most recent call last):
  File "/home/centos/operations/rsmas_insar/sources/insarmaps_scripts/hdfeos5_2json_mbtiles.py", line 318, in <module>
    main()
  File "/home/centos/operations/rsmas_insar/sources/insarmaps_scripts/hdfeos5_2json_mbtiles.py", line 304, in main
    convert_data(attributes, decimal_dates, timeseries_datasets, dates, output_folder, folder_name)
  File "/home/centos/operations/rsmas_insar/sources/insarmaps_scripts/hdfeos5_2json_mbtiles.py", line 74, in convert_data
    x_step = float(attributes["X_STEP"])
KeyError: 'X_STEP'
falkamelung commented 2 years ago

From Alfredo/StackTom:

The sample radar coordinates file at /data/HDF5EOS/Miami2SenAT48/minopy/single_reference_network/S1_IW3_048_0081_0082_20200110_XXXXXXXX_PS.he5 doesn't seem to have a correct mask.

I see for old geocoded files, the displacement array and mask array match in size. For example, these are the displacement and mask arrays in this file /data/HDF5EOS/IsraelBig31SenDT21/mintpy/S1_IW123_021_0487_0491_20160118_20201110.he5 : displacement: (256, 834, 1497) mask (834, 1497). Notice the 834, 1497 match in both files. However, in the radar coded file, these are the sizes of the displacement and mask array: displacement (67, 235, 238) mask (85, 66). Notice the last two numbers in displacement don't match the numbers in mask. These means they are not the same size.

The mintpy mask_matrix function is crashing because the mask size doesn't correspond to the displacement size.

falkamelung commented 2 years ago

Sorry, reading your comment again, I don't understand the problem. Everything has '304, 1070' size?

info.py Miami2SenAT48/minopy/single_reference_network/S1_IW3_048_0081_0082_20200110_20220429.he5
******************** Basic File Info ************************
file name: /data/HDF5EOS/Miami2SenAT48/minopy/single_reference_network/S1_IW3_048_0081_0082_20200110_20220429.he5
file type: HDFEOS
coordinates : RADAR

******************** HDF5 File Structure ********************

HDF5 dataset "/HDFEOS/GRIDS/timeseries/geometry/height                ": shape (304, 1070)         , dtype <float32>
  MissingValue    0.0
  Title           height
  Units           meters
  _FillValue      0.0

HDF5 dataset "/HDFEOS/GRIDS/timeseries/geometry/incidenceAngle        ": shape (304, 1070)         , dtype <float32>
  MissingValue    0.0
  Title           incidenceAngle
  Units           degrees
  _FillValue      0.0

HDF5 dataset "/HDFEOS/GRIDS/timeseries/geometry/latitude              ": shape (304, 1070)         , dtype <float32>
  MissingValue    0.0
  Title           latitude
  Units           degrees
  _FillValue      0.0

HDF5 dataset "/HDFEOS/GRIDS/timeseries/geometry/longitude             ": shape (304, 1070)         , dtype <float32>
  MissingValue    0.0
  Title           longitude
  Units           degrees
  _FillValue      0.0

HDF5 dataset "/HDFEOS/GRIDS/timeseries/quality/mask                   ": shape (304, 1070)         , dtype <int32>
  MissingValue    False
  Title           mask
  Units           1
  _FillValue      False
stackTom commented 2 years ago

Sorry, reading your comment again, I don't understand the problem. Everything has '304, 1070' size?

info.py Miami2SenAT48/minopy/single_reference_network/S1_IW3_048_0081_0082_20200110_20220429.he5
******************** Basic File Info ************************
file name: /data/HDF5EOS/Miami2SenAT48/minopy/single_reference_network/S1_IW3_048_0081_0082_20200110_20220429.he5
file type: HDFEOS
coordinates : RADAR

******************** HDF5 File Structure ********************

HDF5 dataset "/HDFEOS/GRIDS/timeseries/geometry/height                ": shape (304, 1070)         , dtype <float32>
  MissingValue    0.0
  Title           height
  Units           meters
  _FillValue      0.0

HDF5 dataset "/HDFEOS/GRIDS/timeseries/geometry/incidenceAngle        ": shape (304, 1070)         , dtype <float32>
  MissingValue    0.0
  Title           incidenceAngle
  Units           degrees
  _FillValue      0.0

HDF5 dataset "/HDFEOS/GRIDS/timeseries/geometry/latitude              ": shape (304, 1070)         , dtype <float32>
  MissingValue    0.0
  Title           latitude
  Units           degrees
  _FillValue      0.0

HDF5 dataset "/HDFEOS/GRIDS/timeseries/geometry/longitude             ": shape (304, 1070)         , dtype <float32>
  MissingValue    0.0
  Title           longitude
  Units           degrees
  _FillValue      0.0

HDF5 dataset "/HDFEOS/GRIDS/timeseries/quality/mask                   ": shape (304, 1070)         , dtype <int32>
  MissingValue    False
  Title           mask
  Units           1
  _FillValue      False

Can you make me a kmz file which corresponds to Miami2SenAT48/minopy/single_reference_network/S1_IW3_048_0081_0082_20200110_20220429.he5?

the current kmz file above corresponds to /data/HDF5EOS/Miami2SenAT48/minopy_SM/S1_IW3_048_0081_0082_20150921_20211112_PS.he5, so I can't see where the ingest script and kmz file are differing.

falkamelung commented 2 years ago

cd /data/HDF5EOS/Miami2SenAT48/minopy/single_reference_network

Persistent Scatterer (PS). To create 'S1_IW3_048_0081_0082_20200110_20220429.he5`

save_hdfeos5.py timeseries_demErr.h5 --tc temporalCoherence.h5 --asc avgSpatialCoh.h5 -m  ../maskPS.h5 -g inputs/geometryRadar.h5 -t smallbaselineApp.cfg
save_kmz.py velocity.h5 --mask ../maskPS.h5 -g inputs/geometryRadar.h5 --step 1

All scatterer:

save_kmz.py velocity.h5 -g inputs/geometryRadar.h5 --step 1

https://js-104-223.jetstream-cloud.org/data/HDF5EOS/Miami2SenAT48/minopy/single_reference_network/velocity.kmz

falkamelung commented 2 years ago

works now