falkamelung opened this issue 1 month ago
1) So you will add these elevation values as attributes to be ingested? 2) What do you mean by "we should get them out of the database"? I don't know what "them" refers to.
They are not attributes. They exist for all pixels.
With 'out of the database' I mean we should have an option to export a dataset as a CSV or HDF5 file, so we get it back in the same format as it was ingested (or in other easy-to-use formats).
I see.
Can't we just upload the S1 file to insarmaps? Memory and processing are always a trade-off with each other. If we must recreate an h5 file from the database, it will take a very long time to read the whole table and build an h5 file out of it. We would save space, but it would be a very time- and resource-intensive process given how big some of these tables are. Depending on how often people click to create an h5 file, it might also slow down the database for everything else.
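For illustration, here is a minimal sketch of what such an on-demand export could look like, streaming rows through a server-side cursor so the whole table never sits in memory. The table and column names (insar_points, latitude, longitude, velocity) and the connection details are hypothetical, not the actual insarmaps schema:

```python
# Sketch of recreating an h5 file from the database on demand. The table and
# column names (insar_points, latitude, longitude, velocity) are hypothetical,
# not the actual insarmaps schema.
import h5py
import numpy as np
import psycopg2

def export_dataset_to_h5(conn, dataset_name, out_path, batch_size=100_000):
    # A named (server-side) cursor streams rows in batches instead of pulling
    # the whole table into memory, but it still has to read every row once.
    with conn.cursor(name="h5_export") as cur, h5py.File(out_path, "w") as f:
        cur.execute(
            "SELECT latitude, longitude, velocity FROM insar_points "
            "WHERE dataset = %s",
            (dataset_name,),
        )
        # Resizable dataset so we can append one batch at a time.
        dset = f.create_dataset(
            "points", shape=(0, 3), maxshape=(None, 3), dtype="f4", chunks=True
        )
        while True:
            rows = cur.fetchmany(batch_size)
            if not rows:
                break
            block = np.asarray(rows, dtype="f4")
            dset.resize(dset.shape[0] + len(block), axis=0)
            dset[-len(block):] = block

# conn = psycopg2.connect(dbname="insarmaps")  # connection details illustrative
# export_dataset_to_h5(conn, "miaplpy_MDCBeach_201901_202310", "out.h5")
```

Even with streaming, this is a full table scan per export, which is exactly the cost described above.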
Also, h5 files are made to store large amounts of data. A CSV file would be fine for a small dataset, but it is very inefficient for big ones: each character of a number costs a byte, whereas h5 files store actual binary floats. A float has a precision of about 7 decimal digits and uses 4 bytes; representing the same number with 1-byte characters takes around 8 bytes, so the files would grow much bigger than the h5 and the ingest would be much slower.
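A quick way to see this size gap is to write the same float32 array as both HDF5 and text (the array contents and the text format below are arbitrary, just for illustration):

```python
# Rough size comparison of the same data stored as HDF5 (binary float32)
# versus CSV (text), illustrating the ~2x-or-more growth described above.
import os
import tempfile

import h5py
import numpy as np

values = np.random.default_rng(0).normal(size=1_000_000).astype("f4")
d = tempfile.mkdtemp()
h5_path = os.path.join(d, "velocity.h5")
csv_path = os.path.join(d, "velocity.csv")

with h5py.File(h5_path, "w") as f:
    f.create_dataset("velocity", data=values)  # 4 bytes per value
np.savetxt(csv_path, values, fmt="%.7g")       # ~8-14 characters per value

print(os.path.getsize(h5_path) / 2**20, "MiB (h5)")
print(os.path.getsize(csv_path) / 2**20, "MiB (csv)")
```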
OK, that is fine; keeping the files elsewhere is better then. I just know that people were able to get big datasets out of the European site, but I have not looked in detail at how this works.
We have an attribute for the URL of the ingested files. We should consider reviving this. For mintpy it was easy, as the files were in a folder called mintpy. Now we have folders with a date range attached, e.g. miaplpy_MDCBeach_201901_202310. Maybe we should have a file /data/HDF5EOS/insarmaps_files.txt on jetstream that lists the locations of the files, and insarmaps checks there for the precise URL, as sketched below.
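A minimal sketch of how insarmaps could resolve a dataset name through that index file. The two-column "name location" format is an assumption, since the file does not exist yet:

```python
# Sketch of resolving a dataset name to its file location via the proposed
# /data/HDF5EOS/insarmaps_files.txt. The two-column "name location" line
# format is an assumption; the file does not exist yet.
def lookup_dataset_url(dataset_name, index_path="/data/HDF5EOS/insarmaps_files.txt"):
    with open(index_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2 or parts[0].startswith("#"):
                continue  # skip blanks, comments, malformed lines
            name, location = parts[0], parts[1]
            if name == dataset_name:
                return location
    return None  # dataset not listed
```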
That would be fine. It could also be ingested as an extra attribute for each dataset.
As for the European site, I really doubt they recreate their downloads on the fly from their database. They probably just store premade h5/csv files, and that is what gets fetched during download.
When clicking on a point, the time series window should also show the estimated elevation and the temporal coherence (after velocity and v_std). If the estimated elevation does not exist, it should use height instead, and the display should say whether it is showing height or estimated elevation.
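A small sketch of that fallback rule. The field names (estimated_elevation, height, temporal_coherence, velocity, v_std) and the units are assumptions about the per-point record, not the actual insarmaps data model:

```python
# Sketch of the fallback rule for the time series popup. Field names and
# units are assumptions about the per-point record.
def elevation_field(point):
    # Prefer estimated elevation; fall back to height, and return a label
    # so the display can say explicitly which one is shown.
    if point.get("estimated_elevation") is not None:
        return "estimated elevation", point["estimated_elevation"]
    return "height", point["height"]

def popup_lines(point):
    label, value = elevation_field(point)
    return [
        f"velocity: {point['velocity']:.2f}",
        f"v_std: {point['v_std']:.2f}",
        f"{label}: {value:.1f} m",
        f"temporal coherence: {point['temporal_coherence']:.2f}",
    ]
```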
Our S1 files currently don't have estimated elevation, but it is really important for the high-resolution urban data (Miami). I will provide a CSV file that has it.
I suspect these fields are not in the database and need to be imported.
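A minimal sketch of what that import could look like. The CSV column names (lat, lon, estimated_elevation) and the table name are assumptions about the yet-to-be-provided file and the schema; in practice a bulk COPY into a staging table would be faster than per-row updates:

```python
# Sketch: import per-point estimated elevation from a provided CSV into the
# database. Column names (lat, lon, estimated_elevation) and the table name
# are assumptions about the yet-to-be-provided file and schema.
import csv
import psycopg2

def import_estimated_elevation(conn, csv_path, table="insar_points"):
    with open(csv_path, newline="") as f, conn.cursor() as cur:
        for row in csv.DictReader(f):
            cur.execute(
                f"UPDATE {table} SET estimated_elevation = %s "
                "WHERE lat = %s AND lon = %s",
                (row["estimated_elevation"], row["lat"], row["lon"]),
            )
    conn.commit()
```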
The European website has all of this, including additional fields such as incidence angle and azimuth. So it would actually be good to ingest all fields that are in the S1 file and display them as well.
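One way to avoid hard-coding the ingested fields would be to enumerate every dataset in the S1 file and let the ingest pick them all up. A sketch with h5py (the file name is a placeholder; the group layout of real S1 files may differ):

```python
# Sketch: enumerate every dataset stored in an S1 file so the ingest can pick
# up all per-pixel fields (incidence angle, azimuth, ...) instead of a
# hard-coded subset.
import h5py

def list_fields(path):
    fields = {}
    with h5py.File(path, "r") as f:
        def visit(name, obj):
            if isinstance(obj, h5py.Dataset):
                fields[name] = (obj.shape, obj.dtype)
        f.visititems(visit)
    return fields

# for name, (shape, dtype) in list_fields("S1_example.he5").items():
#     print(name, shape, dtype)
```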
Currently we use the PostgreSQL database only for insarmaps and keep the S1 files elsewhere. This is stupid. We should get them out of the database. I don't want to deal with this right now, also because of disk-space limitations on insarmaps, but it is something to keep in mind.