smk0033 closed this pull request 1 month ago
@nemo794 if you have a chance to look it over, here's the NISAR notebook with an updated visualization!
nemo794 commented on 2024-08-22T21:10:53Z ----------------------------------------------------------------
Should be, "NISAR is planned to launch no earlier than February 2025."
FYI, the public source for this is: https://nisar.jpl.nasa.gov/news/119/nasas-work-on-nisars-antenna-reflector-nears-completion/
nemo794 commented on 2024-08-22T21:10:54Z ----------------------------------------------------------------
Let's update this to better reflect the process used. Maybe something like:
After launch, NISAR data will be hosted at and catalogued by ASF DAAC. Guidance for accessing ASF data via MAAP can be found here: https://docs.maap-project.org/en/latest/technical_tutorials/access/accessing_external_data.html#Accessing-Sentinel-1-Granule-Data-from-the-Alaska-Satellite-Facility-(ASF) In the meantime, the NISAR project has made sample NISAR simulated datasets available via https://nisar.jpl.nasa.gov/data/sample-data/.
However, these NISAR sample products are not searchable via CMR. So, for the convenience of MAAP users, these simulated data products were copied into MAAP's data archive and can be accessed through the MAAP STAC Catalog. This will allow users to become familiar with accessing the NISAR datasets via the MAAP API and in cloud-friendly ways.
To access the Simulated NISAR data, we'll open the MAAP STAC catalog, and then use the collection ID "nisar-sim" to get our desired collection.
wildintellect commented on 2024-08-23T17:43:48Z ----------------------------------------------------------------
Mostly true -
nemo794 commented on 2024-08-22T21:10:55Z ----------------------------------------------------------------
Line #4. collection = client.get_collection('nisar-sim')
Is the nisar-sim collection version controlled? That name is very generic, and unlike most DAAC datasets, the products are still evolving. When the next set of sample data is released, the old set and the new set will no longer be compatible with each other.
(This question is out-of-scope for this PR. But something to consider.)
STAC does have a versioning extension which we could enable. I agree not critical to this demo currently, let's address that when a new version comes out.
nemo794 commented on 2024-08-22T21:10:55Z ----------------------------------------------------------------
Line #1. nisar_item = list(client.search(collections="nisar-sim").items())[7]
Consider (first) storing the search results in a variable, and then (second) pulling out the granule desired.
In practice, these search results will be looped through for DPS jobs, further filtering, etc. So, storing the results in a unique variable could probably be considered a best-practice.
It is a little confusing how the S3 link is unrelated to the STAC item; this is one way you could show how to get it from the search:
for item in client.search(collections="nisar-sim").items():
if asset := item.assets.get("GCOV"):
s3_link = asset.extra_fields["alternate"]["href"]
break
It is tricky because each STAC item has a different asset!
smk0033 commented on 2024-08-27T17:49:54Z ----------------------------------------------------------------
Thank you so much, Henry! I was trying to automate it instead of hardcoding, but the way I was trying it I kept getting errors and couldn't quite figure out how to get to that alternate href.
nemo794 commented on 2024-08-22T21:10:56Z ----------------------------------------------------------------
Line #2. nisar_item
In the written text block, it says that we want to pull out the GCOV product. However, in the stdout it says RIFG. Please re-run so that the stdout says GCOV?
Interesting, I think the problem here is that we are picking the item by numeric index, when we should actually filter the results based on the product name to be reliable. +1, re-run the code to update the print too.
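A minimal sketch of that filtering idea: select the item by product name rather than by position in the result list. It assumes (as in this collection) that each item's id is the product file name, so the product type appears in the id; the example ids below are made up for illustration.

```python
# Sketch: pick a STAC item by product name instead of a numeric index,
# which is fragile if the ordering of search results changes.
def pick_item_by_product(items, product="GCOV"):
    """Return the first STAC item whose id contains the product name, else None."""
    for item in items:
        if product in item.id:
            return item
    return None
```

With pystac-client this would be applied to the results of `client.search(collections="nisar-sim").items()`.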
nemo794 commented on 2024-08-22T21:10:57Z ----------------------------------------------------------------
Instead of hardcoding this, please provide the code to extract the s3 link from nisar_item and store it in the variable s3_link.
Then, to demo to the user that it is indeed an s3 link, feel free to include a print(s3_link) at the end of this cell.
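One possible sketch of that extraction, following the asset/"alternate"/"href" layout shown in the snippet shared earlier in this thread (the "GCOV" asset key and that layout are assumptions from that snippet, not a verified API):

```python
# Sketch: extract the s3 link from a STAC item rather than hardcoding it.
def get_s3_link(item, asset_key="GCOV"):
    """Return the s3 href stored in the asset's 'alternate' field."""
    asset = item.assets.get(asset_key)
    if asset is None:
        raise KeyError(f"item {item.id} has no {asset_key!r} asset")
    return asset.extra_fields["alternate"]["href"]

# s3_link = get_s3_link(nisar_item)
# print(s3_link)  # demo to the user that it is indeed an s3:// link
```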
nemo794 commented on 2024-08-22T21:10:58Z ----------------------------------------------------------------
Wait... huh? I thought that for accessing ASF data, users need to go through this process: https://docs.maap-project.org/en/latest/technical_tutorials/access/accessing_external_data.html#Accessing-Sentinel-1-Granule-Data-from-the-Alaska-Satellite-Facility-(ASF)
It's really confusing to have three different "best practices" for accessing ASF data in the same set of MAAP documentation. Is there a way to consolidate these instructions into a single location for users to follow? Perhaps: could all of the ASF-specific instructions exist in the MAAP's ASF instructions, and then this tutorial simply follows and references those instructions?
smk0033 commented on 2024-08-27T18:23:41Z ----------------------------------------------------------------
Good point! It looks like the method in the link provided focuses on accessing through CMR and then downloading the data. I agree we should probably work on that documentation to make it more comprehensive - taking note of that now!
wildintellect commented on 2024-09-20T23:33:33Z ----------------------------------------------------------------
I think I know how to clarify this even more. Our instructions, whether MAAP or earthaccess, are for direct access on AWS, be it MAAP/VEDA or a user's own cloud instances.
nemo794 commented on 2024-08-22T21:10:58Z ----------------------------------------------------------------
Again, consider removing this section from the tutorial, or moving it to the primary "How to Access Data from ASF" instructions page.
wildintellect commented on 2024-08-23T17:59:14Z ----------------------------------------------------------------
I disagree: for NISAR data access, the code differs from standard use of earthaccess because the ASF NISAR endpoint isn't standard.
Also, this method does technically work on MAAP; however, it's really targeted at our users on VEDA, where maap-py likely won't work.
nemo794 commented on 2024-08-22T21:10:59Z ----------------------------------------------------------------
Weird. The actual sample product specifies CF-1.7, but in the stdout it is showing CF-1.8. Why is that?
wildintellect commented on 2024-08-23T18:09:53Z ----------------------------------------------------------------
Can you link to the docs where CF-1.7 is mentioned?
wildintellect commented on 2024-10-09T22:04:30Z ----------------------------------------------------------------
TBD if this matters or not.
nemo794 commented on 2024-08-22T21:11:00Z ----------------------------------------------------------------
Line #1. dt = open_datatree(nisar_file_obj, engine='h5netcdf', phony_dims='sort')
Where is open_datatree imported? Perhaps I'm not seeing it, but it's not coming up when I search this page.
from datatree import open_datatree
You probably missed it because it wasn't changed in this diff; it's hidden behind an expand button.
nemo794 commented on 2024-08-22T21:11:00Z ----------------------------------------------------------------
A few comments:
* Phrasing: Why do we "need" to re-project? Is it actually that we want to plot in lat/lon instead of UTM? Or is there a technical reason to require a reprojection step? Let's clarify the phrasing, or else users will get confused.
* Instead of simply stating the CRS, let's teach the users how to extract that. The projection of the product can be dynamically parsed from the product itself:
/science/LSAR/GCOV/grids/frequencyA/projection
Per the publicly-available GCOV Product Spec document, Appendix B details the various projections that the products will be produced in based on geographic region. Since there are several, let's be careful about phrasing.
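A sketch of dynamically parsing the projection from the product itself with h5py. The dataset path is the one cited above; reading the EPSG code from an `epsg_code` attribute is an assumption (elsewhere in this thread it is accessed as `ds["projection"].epsg_code` via xarray).

```python
# Sketch: read the CRS from the GCOV product instead of hardcoding it.
import h5py

def read_epsg(h5_path, freq="frequencyA"):
    """Read the EPSG code stored with the GCOV grid's projection dataset."""
    with h5py.File(h5_path, "r") as f:
        proj = f[f"science/LSAR/GCOV/grids/{freq}/projection"]
        # assumption: the code is stored as an attribute named "epsg_code"
        return int(proj.attrs["epsg_code"])
```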
nemo794 commented on 2024-08-22T21:11:01Z ----------------------------------------------------------------
A gamma correction isn't necessarily needed. Let's soften the phrasing?
For example, "Next, let's apply a gamma correction by taking the square root of our original values. While plotting, we will also remove the outliers."
wildintellect commented on 2024-08-23T18:18:04Z ----------------------------------------------------------------
Do we want to say why we are applying a gamma correction?
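The suggested softened phrasing could be backed by a short sketch: a square-root stretch of backscatter power (the "gamma correction" here), with outliers clipped to percentiles before plotting. The percentile bounds are illustrative, not from the notebook.

```python
# Sketch: square-root stretch plus percentile clipping for display.
import numpy as np

def stretch_for_display(power, lo_pct=2, hi_pct=98):
    amplitude = np.sqrt(power)                         # square-root stretch
    lo, hi = np.nanpercentile(amplitude, [lo_pct, hi_pct])
    return np.clip(amplitude, lo, hi)                  # remove outliers
```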
nemo794 commented on 2024-08-22T21:11:02Z ----------------------------------------------------------------
Line #1. x_coords = ds_reproj.HHHH.coords['x'].data
Is this a feature of the xarray API to give the coordinates in the reprojected coordinate system?
The products are in UTM, and there are several coordinate datasets with the UTM coordinates already computed and stored in the product. For example, for the example image layer chosen, they are located:
science/LSAR/GCOV/grids/frequencyA/xCoordinates
science/LSAR/GCOV/grids/frequencyA/yCoordinates
Not sure which direction you wanted to go for this tutorial, but wanted to confirm that lat/lon is an intentional decision and not simply a default.
It's not a default at all; I would have actually kept it all in UTM and just noted to the user how to see the data projection and change it if necessary.
nemo794 commented on 2024-08-22T21:11:03Z ----------------------------------------------------------------
Line #5. sqrt_array = (ds_reproj.HHHH.data)**0.5
Alex, please weigh in if this is out of scope for this draft of the notebook, but...
After being decompressed, for a standard GCOV dataset, each e.g. HHHH imagery layer will be ~850MB for many of the standard products over e.g. North America (depending on the NISAR mode, the layers will be larger in some cases, and smaller in other cases). If a user wants to do any scientific computations with a layer, memory quickly becomes an issue if the entire layer is read into memory like is demo'ed here.
In ISCE3, this is handled by processing e.g. GCOV by tiles. Should tile-based processing be something that is demo'ed here?
Future version, probably with Dask that would handle the tiling automatically.
So this sample file is not the same size as an expected real granule?
wildintellect commented on 2024-09-20T23:47:21Z ----------------------------------------------------------------
That indicates in the next version we should show how to spatially subset with Xarray. I'm a little worried that the reading ship sailed with the use of Datatree, which seemed to load a lot into memory. We'll need to research this more for the next version.
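A minimal sketch of the tile-based idea, in the spirit of ISCE3's tiled processing, to bound memory when a full GCOV layer is too large to read at once. The tile size is illustrative; a future version might let Dask chunking handle this automatically.

```python
# Sketch: iterate over a 2-D array in tiles so only one tile is in memory
# at a time, instead of materializing the whole layer.
def iter_tiles(shape, tile=(512, 512)):
    """Yield (row_slice, col_slice) windows that cover a 2-D array."""
    rows, cols = shape
    for r0 in range(0, rows, tile[0]):
        for c0 in range(0, cols, tile[1]):
            yield (slice(r0, min(r0 + tile[0], rows)),
                   slice(c0, min(c0 + tile[1], cols)))

# e.g. apply np.sqrt tile-by-tile:
# for rs, cs in iter_tiles(hhhh.shape):
#     out[rs, cs] = np.sqrt(hhhh[rs, cs])
```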
nemo794 commented on 2024-08-22T21:11:03Z ----------------------------------------------------------------
Line #8. HHHH_update = xr.DataArray(sqrt_array, dims=("yCoordinates","xCoordinates"), coords={"yCoordinates": y_coords, "xCoordinates": x_coords})
As mentioned above, xCoordinates and yCoordinates are actual datasets in the product that contain the UTM coordinates, so the current labels are misleading.
Let's improve the labels by including:
* Accurate axes labels to specify what is being plotted
* The units
* A title for the plot. (which layer is being plotted, the image correction applied, etc.)
hrodmn commented on 2024-08-23T19:57:27Z ----------------------------------------------------------------
Line #2. ds = xr.open_dataset(nisar_file_obj, group='science/LSAR/GCOV/grids/frequencyA', engine='h5netcdf', phony_dims='sort', decode_coords="all")
You could get away without re-opening the file by manually setting the CRS for rioxarray using the embedded projection info:
ds = dt["science/LSAR/GCOV/grids/frequencyA"].ds
ds.rio.write_crs(f'epsg:{ds["projection"].epsg_code}', inplace=True)
I think this is something we should fix on the STAC side. It's a little odd because the collection contains 1 item per asset type, and the id is the file name. I wonder if we should have done 1 item with 7 assets instead, or if we just need the product type to be clearer in the id or another metadata property so that you could sort them. cc: @jjfrench
Note: In the real final product these might all be different collections. Thoughts @nemo794 ?
wildintellect commented on 2024-08-23T23:28:10Z ----------------------------------------------------------------
This code block is confusing: is get_s3_client() used, or can we remove it now?
wildintellect commented on 2024-08-23T23:28:11Z ----------------------------------------------------------------
Line #8. HHHH_update = xr.DataArray(sqrt_array, dims=("yCoordinates","xCoordinates"), coords={"yCoordinates": y_coords, "xCoordinates": x_coords})
There must be an easier way to compute a new array and carry the existing coordinates.
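One simpler route (a sketch with toy dims and coords, not the product's): NumPy ufuncs applied to a DataArray preserve its dims and coordinates, so the new array doesn't need to be rebuilt by hand.

```python
# Sketch: ufuncs on a DataArray carry dims/coords through automatically.
import numpy as np
import xarray as xr

da = xr.DataArray(np.array([[1.0, 4.0], [9.0, 16.0]]),
                  dims=("y", "x"),
                  coords={"y": [0.0, 10.0], "x": [0.0, 10.0]})
HHHH_update = np.sqrt(da)  # same dims and coords, square-rooted values
```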
Thank you all for the great feedback! I'll work on incorporating it into the notebook and re-request reviews once ready
Went ahead and requested reviews, I think I've addressed most of the comments about adjusting the contents of the notebook (I'll have to explore options pertaining to Alex's comment in the last cell still)
wildintellect commented on 2024-09-20T23:47:56Z ----------------------------------------------------------------
Line #2.
Move the pip install to the cell above so it's not run every time.
TODO: file a ticket to make sure earthaccess is installed by default in all MAAP images.
wildintellect commented on 2024-09-20T23:47:57Z ----------------------------------------------------------------
Link "MAAP STAC" to the docs about the MAAP STAC.
wildintellect commented on 2024-09-20T23:47:57Z ----------------------------------------------------------------
I agree with Sam that the need for reprojection is fairly unclear. It feels like the real reason is to demonstrate how one might reproject if a workflow required it; generally it would be avoided and UTM would be used so all the units stay in meters. I don't consider this a blocker, but more something to return to at some point.
smk0033 commented on 2024-10-01T16:16:24Z ----------------------------------------------------------------
Gotcha! I included it because I think changing it to lat/lon was brought up in our original meeting a little while back, but I don't remember the reasoning behind it. I could also be remembering wrong
@wildintellect addressed the additional comments you left!
wildintellect commented on 2024-10-09T22:06:40Z ----------------------------------------------------------------
earthaccess is also optional depending on the data access method (i.e., required outside of MAAP).
Applying notebook updates per feedback from Sam
Github board ticket: https://github.com/NASA-IMPACT/active-maap-sprint/issues/979