geodesymiami / insarmaps


HDF-EOS5 file download #11

Closed. yunjunz closed this 3 years ago

yunjunz commented 5 years ago

It would be very useful if the insarmaps website supported downloading the HDF-EOS5 file, in addition to the current "Download TXT" option. Visitors could then use the entire InSAR product for their own research and work, instead of checking the result on insarmaps and then processing the data again themselves.

This feature should be optional, so that one can decide whether or not the entire HDF-EOS5 file can be downloaded. One might not want to enable it before the related paper is published.

falkamelung commented 5 years ago

I agree. Currently, if you upload data to my Google Drive account, it is automatically available for download. But the upload has to be done by drag-and-drop in the browser; unfortunately, we never figured out how to upload to Google Drive with a script. So we are looking for another place where we can easily store a large amount of data: either an AWS S3 bucket (from ASF) or the NSF Jetstream system.

David wanted to look at Jetstream but did not get to it. Maybe @Ovec8hkin has a suggestion from his experience with the hazard products? @Ovec8hkin, have you decided on a final home for the hazards website, e.g. a VM on Jetstream that hosts, say, 5 TB of data and is open to the world?

We may need one of their science gateways https://www.xsede.org/ecosystem/science-gateways/gateways-listing

stackTom commented 4 years ago

Any updates on this? We currently still use Google Drive. Should I try to figure out how to access Google Drive from a script?

falkamelung commented 4 years ago

First item

Yes, we should do this. We now have the data at the location below. The S1.he5 file (or CSK.he5, or Alos.he5; glob for *.he5) is in the mintpy folder. Let's check for an attribute insarmaps_download_flag: if it is True, the file should be downloadable. I will upload examples with True and False so that we can test it.

https://js-104-223.jetstream-cloud.org/data/HDF5EOS
https://js-104-223.jetstream-cloud.org/data/HDF5EOS/GalapagosSenAT106/mintpy/*.he5

I ingested two test data sets. unittestGalapagosSenDT128 with insarmaps_download_flag=True. Do you see this flag?

info.py S1_IW1_128_0596_0597_20160605_XXXXXXXX.he5 | grep insarmaps
  insarmaps_download_flag                     True

And another one, qunittestGalapagosSenDT128, with insarmaps_download_flag=False:

info.py S1_IW1_128_0595_0597_20160605_20160828.he5 | grep insarmaps
  insarmaps_download_flag                     False
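A minimal sketch of the check described above, assuming the root attributes of each .he5 file have already been read into a plain dict (for example via h5py's `File(...).attrs`), and that the flag may arrive as a bool, as bytes, or as the string "True"/"False". The function names `find_he5_files` and `is_downloadable` are hypothetical, not existing insarmaps or MintPy code. A missing flag is treated as not downloadable, so products stay private by default.

```python
from pathlib import Path


def find_he5_files(mintpy_dir):
    """Return the HDF-EOS5 products (*.he5) in a mintpy folder, sorted by name."""
    return sorted(Path(mintpy_dir).glob("*.he5"))


def is_downloadable(attrs):
    """Interpret the insarmaps_download_flag attribute of one product.

    Assumption: attrs is a dict-like view of the file's root attributes.
    A missing flag counts as False (not downloadable).
    """
    flag = attrs.get("insarmaps_download_flag", False)
    if isinstance(flag, bytes):
        flag = flag.decode()
    if isinstance(flag, str):
        return flag.strip().lower() == "true"
    return bool(flag)


if __name__ == "__main__":
    print(is_downloadable({"insarmaps_download_flag": "True"}))   # True
    print(is_downloadable({"insarmaps_download_flag": "False"}))  # False
    print(is_downloadable({}))                                    # False
```

Defaulting to False means a dataset ingested before the flag existed is never offered for download by accident.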

Second item

Please remove "References" and replace "Figures" with "Processing Report". The latter should point to:

https://js-104-223.jetstream-cloud.org/data/HDF5EOS/GalapagosSenAT106/mintpy/pic

This would be for all datasets, regardless of insarmaps.download.

It would actually be good to have a /pic/index.html that displays thumbnail images; clicking a thumbnail would display the full image, e.g. /pic/coherence_1.png. This should actually be part of MintPy, not insarmaps. If you have an easy way to create this, please send it. I will commit it to MintPy and add some text explanations.
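A rough sketch of how such a /pic/index.html could be generated, assuming the figures are PNG files directly inside pic/ and that the browser simply scales the full-size images down for display (no separate thumbnail files are created). The function name `write_gallery` and the `thumb_width` parameter are hypothetical and not part of MintPy.

```python
from html import escape
from pathlib import Path
from urllib.parse import quote


def write_gallery(pic_dir, thumb_width=200):
    """Write pic_dir/index.html showing every PNG as a clickable thumbnail.

    Clicking a thumbnail opens the full-size image, e.g. coherence_1.png.
    Returns the path of the written index file.
    """
    pic_dir = Path(pic_dir)
    items = []
    for png in sorted(pic_dir.glob("*.png")):
        href = quote(png.name)    # URL-encode the file name for the link
        label = escape(png.name)  # HTML-escape it for display
        items.append(
            f'<a href="{href}"><figure style="display:inline-block;margin:4px">'
            f'<img src="{href}" width="{thumb_width}" alt="{label}">'
            f"<figcaption>{label}</figcaption></figure></a>"
        )
    page = (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        "<title>Processing Report</title></head><body>\n"
        + "\n".join(items)
        + "\n</body></html>\n"
    )
    index = pic_dir / "index.html"
    index.write_text(page, encoding="utf-8")
    return index
```

Usage would be something like `write_gallery("GalapagosSenAT106/mintpy/pic")`, run after the figures have been produced. Real thumbnails (smaller image files) would load faster for large reports, but need an imaging library.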

stackTom commented 4 years ago

You need to re-ingest the two test datasets with insarmaps_download_flag, as the Python scripts currently do not ingest this attribute. Alternatively, use one of the scripts I made previously to add an attribute to a dataset. However, I suggest using the updated MintPy insarmaps scripts I am about to commit, to make sure the change works for new datasets going forward.

@falkamelung I will post here when I have made the commits so you can do a git pull and re-run with the latest scripts.

@yunjunz Can you give me commit access to mintpy so I can commit my changes? I can also just fork mintpy and do pull requests if you prefer. Up to you, please let me know.

stackTom commented 4 years ago

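I am unable to see anything at the Processing Report link (https://js-104-223.jetstream-cloud.org/data/HDF5EOS/GalapagosSenAT106/mintpy/pic). Chrome says "the site can't be reached".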

falkamelung commented 4 years ago

Hey @stackTom, great to see you on GitHub! For the ingest scripts, I would suggest removing them from MintPy and putting them in their own repo (assuming that making them part of the insarmaps repo does not make sense), e.g. insarmaps-ingest. See this issue:

https://github.com/geodesymiami/insarmaps/issues/15

yunjunz commented 4 years ago

Hi @stackTom, I just sent you an invitation to insarlab/MintPy. The current default branch is protected, so a pull request is the only way to commit. You can do it either by forking to your personal account or by creating a branch on insarlab/MintPy, whichever you prefer.

I agree that having an independent repo for these Python scripts is better. When you have finished that, please issue a PR to remove them from MintPy.

stackTom commented 4 years ago

@yunjunz @falkamelung Will do. Should I make a new repo altogether for the insarmaps scripts? There is currently a folder inside the insarmaps repo called python_scritps with all the latest relevant insarmaps scripts. I am not sure it is worth having a separate repo, but I can do it if you want.

Edit: Disregard, I am making a new repo.

stackTom commented 4 years ago

Have any datasets been updated with insarmaps.download = True so I can implement this feature?

stackTom commented 4 years ago

The processing report link is working. I just need a test dataset with insarmaps.download = True to make sure this feature works as well.

stackTom commented 4 years ago

Both the processing report and the downloading of the .he5 file now work. Question: how important is it that only files with insarmaps_download_flag=True can be downloaded? Right now, the logic for this is in the browser, so anyone could, in theory, read the code and download the file even if the flag is set to "No". I could move this logic to the server if some datasets need to be kept from being downloaded, but that is more work, so I am not sure it is needed.
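If the check did move to the server, a minimal sketch could look like the following: the server hands out the .he5 URL only when the flag stored with the dataset's metadata is true, and otherwise sends no link at all. The metadata layout (a dict holding `insarmaps_download_flag` and a hypothetical `mintpy_url` key) and the function name `download_url_for` are assumptions for illustration, not the current insarmaps code.

```python
def download_url_for(dataset_meta, he5_name):
    """Return the .he5 download URL, or None when downloading is not allowed.

    Assumption: dataset_meta holds the ingested attributes plus the URL of
    the dataset's mintpy folder under the hypothetical key "mintpy_url".
    """
    flag = str(dataset_meta.get("insarmaps_download_flag", "False")).strip().lower()
    if flag != "true":
        return None  # the link is never sent to the browser
    return dataset_meta["mintpy_url"].rstrip("/") + "/" + he5_name
```

Note that this only hides the link; it does not stop anyone from requesting the file directly if they can guess its URL.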

yunjunz commented 4 years ago

This is great @stackTom!

From my perspective, keeping this logic on the server is important, as we don't want to share the products of projects we are still working on. @falkamelung what do you think?

stackTom commented 4 years ago

It won't be too hard to do this. The only caveat is that even if the check is on the server, it doesn't take much effort to reach the he5 file, even when the link does not appear directly in the browser's source code (where it would be wrapped in an if statement based on insarmaps_download_flag, of course). This is especially true because the processing report link appears for every dataset, and it is trivial to get from the processing report link to the he5 file by going up one level in the URL's directory structure.

So now that I think about it, it might not be worth getting the link to the client from the server, as the links are very predictable from the processing report. But I can do it if you guys think it is best.

TLDR: with the current JavaScript check in the browser, finding the he5 file for a dataset with insarmaps_download_flag = No takes about the same effort as it would if the link were kept on the server and only served to the client when allowed. The only real way around this is to make the links less predictable.
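To illustrate why the links are predictable: the .he5 file sits in the mintpy folder, one level above the Processing Report's pic folder, so resolving "../" against the report URL is enough. A small sketch using the standard library's `urllib.parse.urljoin`; the function name is hypothetical.

```python
from urllib.parse import urljoin


def mintpy_folder_from_report(report_url):
    """Return the parent folder of a .../mintpy/pic report URL."""
    # Add a trailing slash so "pic" is treated as a directory, then go up one level.
    return urljoin(report_url.rstrip("/") + "/", "../")


print(mintpy_folder_from_report(
    "https://js-104-223.jetstream-cloud.org/data/HDF5EOS/GalapagosSenAT106/mintpy/pic"
))
# https://js-104-223.jetstream-cloud.org/data/HDF5EOS/GalapagosSenAT106/mintpy/
```

Anyone with the report link can do the same by hand, which is why hiding only the .he5 link adds little protection.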

stackTom commented 4 years ago

Also, we still have the permissions feature on the website that allows certain datasets to be visible only to users with certain privileges. I am not sure Professor Falk ever uploaded a dataset with custom permissions.

yunjunz commented 4 years ago

I agree: with such predictable links, it is not worth moving it.

falkamelung commented 4 years ago

Regarding the download permissions, I am not worried. If we don't want to share a file, we just don't upload it to Jetstream. Another possibility is to upload it with permissions that don't allow download for everybody. The upload is done here. We probably just have to remove Fo=r ??
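One possible reading of "upload with permissions that don't allow download for everybody": clear the world-read bit on the uploaded .he5 file, so that a web server running as a different user cannot serve it. This is an assumption about the intended fix (the "Fo=r" remark is unclear), and whether it actually blocks downloads depends on how the Jetstream web server is configured. The sketch uses only the standard `os` and `stat` modules; `make_private` is a hypothetical name.

```python
import os
import stat


def make_private(path):
    """Clear the 'others' read bit (like `chmod o-r`) and return the new mode."""
    mode = stat.S_IMODE(os.stat(path).st_mode)  # current permission bits only
    new_mode = mode & ~stat.S_IROTH             # drop read access for others
    os.chmod(path, new_mode)
    return new_mode
```

For a file uploaded with mode 0o644 this leaves 0o640: owner and group can still read it, everyone else cannot.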

Did you implement the download already? This example does not have it:

http://insarmaps.miami.edu/start/-0.008/-78.0/9?startDataset=S1_IW1_128_0596_0597_20160605_XXXXXXXX

stackTom commented 3 years ago

Regarding the example dataset above: the code is there, but it is not working for that dataset, because the dataset does not seem to have the insarmaps.download = True or insarmaps_download_flag attribute. Did you re-ingest it, or use the add_attributes_insarmaps script to add the new attribute? These attributes aren't in the database.

falkamelung commented 3 years ago

Falk has to test using 'insarmaps_download_flag'.

This is low priority, so closing for now.

stackTom commented 3 years ago

We used to have a Google Drive from which we could download he5 files. Is it still available? It is not working and keeps throwing an error saying we have exceeded the daily usage limit. I am not sure whether generating a new API key would fix it.

I am commenting out this code, since downloads will go through Jetstream. I will remove it completely once Jetstream is up and the insarmaps_download_flag method is fully working.