Earth-Information-System / veda-data-processing

Scripts for data downloading, transformation, and related processing for the VEDA project.
Apache License 2.0

FWI Visualization #11

Open · paridhi-parajuli opened this issue 1 year ago

paridhi-parajuli commented 1 year ago
  1. Use Panel parameterized objects (above) to optimize existing FWI dashboards.
  2. Use existing fire detections and plot the corresponding FWI time series and chiclet plots.
    • Read this using geopandas: s3://veda-data-store-staging/EIS/other/feds-output-conus/latest/perim-large.fgb – done
    • Pick a fire from here (just select one row).
    • Draw a 5 km buffer around this fire (using GeoPandas).
    • Use that buffer as input to the FWI time series and chiclet plot (a minimal sketch of this workflow follows this list).
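
A minimal sketch of this workflow, assuming the FlatGeobuf can be read directly from S3 (fsspec/s3fs or GDAL's S3 support configured) and that a projected CRS is used for a true 5 km buffer; the choice of EPSG:5070 is an assumption:

```python
import geopandas as gpd

# Read the latest large-fire perimeters (FlatGeobuf) directly from S3; if the
# S3 read is not configured, download the file locally and read that path.
perims = gpd.read_file(
    "s3://veda-data-store-staging/EIS/other/feds-output-conus/latest/perim-large.fgb"
)

# Pick a single fire -- here just the first row, for illustration.
fire = perims.iloc[[0]]

# A true 5 km buffer needs a projected CRS; buffering in EPSG:4326 would be in
# degrees. EPSG:5070 (CONUS Albers, meters) is an assumed choice of projection.
buffer_5km = fire.to_crs(epsg=5070).buffer(5000).to_crs(epsg=4326)

# Exterior coordinates of the buffer (when it is a single Polygon), e.g. as the
# area-of-interest input to the FWI time-series and chiclet-plot code.
coords = list(buffer_5km.iloc[0].exterior.coords)
```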
paridhi-parajuli commented 1 year ago

Feb 17, 2023

  1. Panel Parameterized objects created.
  2. Having issues with the file upload functionality. Punt for now. Create a separate static notebook where the user enters the file name as a variable at the top.
  3. Working to do this without dataframe creation. Will it make it faster? Probably! Keep working on this.
  4. Created a buffer polygon around the centroid. Good. Let's try a buffer around the geometry itself (not the centroid).
  5. Having issues with the distance to lat-lon conversion. This is just a warning: the buffer is in the units of the projection, and for lat-lon data the unit is degrees, which is not a true distance unit. But a buffer of, e.g., 0.5 degrees is fine.
  6. No variable in the lis-tws-trend data. This is a DataArray already, so it can be accessed with data.values; e.g., data[0,0,0:3,0:3].values retrieves the first few pixels. …but there are issues with the S3 read, not your code. Investigate with Slesa/Iksha/etc. More generally, stackstac produces xarray DataArrays; they are accessible via data.values, subset via data[...], etc.
  7. Need to work for multipolygons. Simple workaround: .geometry.convex_hull (recall – need to do .geometry.convex_hull.exterior.coords). Ideally, we need a better solution – maybe some kind of union. (A sketch of items 4–7 follows this list.)
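
A minimal sketch of items 4–7, assuming `fires` is the perimeter GeoDataFrame read earlier and `data` is the xarray DataArray returned by the stackstac / lis-tws-trend read:

```python
# Items 4-5: buffer the geometry itself rather than its centroid. In EPSG:4326
# the buffer distance is in degrees (not a true distance), hence 0.5 here.
fire = fires.iloc[[0]]
buffered = fire.geometry.buffer(0.5)

# Item 7: MultiPolygon workaround -- take the convex hull so .exterior exists.
hull = buffered.convex_hull
coords = list(hull.iloc[0].exterior.coords)

# A possible "better solution" hinted at above: merge the parts with a union
# instead of a convex hull (an assumption, not something decided in the notes).
merged = buffered.unary_union

# Item 6: stackstac (and the lis-tws-trend read) return xarray DataArrays, not
# Datasets -- access the values directly, e.g. the first few pixels.
first_pixels = data[0, 0, 0:3, 0:3].values
```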
paridhi-parajuli commented 1 year ago

Feb 20, 2023

  1. FWI analysis
  2. STAC
paridhi-parajuli commented 1 year ago

2023-02-24

  1. FWI analysis
  2. STAC

For next week

  1. FWI analysis
  2. STAC
paridhi-parajuli commented 1 year ago

For next week

  1. Email Katrina Sharonin (GSFC-DK000, intern on EIS-Fire) to set up a meeting, discuss ongoing tasks, and identify tasks where you can help. Report back to me.

  2. Analysis of ESDIS metrics. Alexey is in the process of downloading the data from the (closed) ESDIS metrics service. Look for the data provided in the doc. Several datasets:

    • archive-size-by-product-totals – Total volume of each data product
    • data-products — Additional information on each data product, including science discipline, etc. Useful for merging with other datasets.
    • total-distribution-by-product-2022 — Total data downloads by product and distribution mechanism in 2022. Note that some products have multiple distribution mechanisms.
    • total-distribution-by-user-2022 — Total data downloads by user and product in 2022. Since this contains some mildly sensitive user information (emails), I password-protected it – see my Slack message for the password.
    • Some questions to address:
    • Total archive data volume by science discipline
    • Distribution, and cumulative distribution, of data downloads by volume. E.g., How many datasets account for the top 95% of data downloads?
    • What were the top 100 data products distributed in 2022 by volume? By number of unique users? What are the similarities and differences between these top 100 lists – e.g., which products appear in these lists regardless of how you count? Are there any products that are especially popular in terms of number of users but not in terms of data volume? Vice versa?
    • Which providers (DAACs) distributed these datasets? What data formats are these datasets distributed in? What data services are available for these datasets (this may require some separate browsing of the dataset websites)?
    • What is the distribution of data users? E.g., How many users account for the top 95% of data downloads?
    • Download volume by user discipline?
    • What were the most popular data download mechanisms by volume in 2022?
    • Archive size and distribution volume by product level (Level 1, Level 2, Level 3, etc.).
    • Report all of these results in a Jupyter notebook shared via GitHub (a minimal sketch of the first two questions follows this list).
  3. STAC – once Slesa figures this out, come back to this task.
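
A minimal sketch of the first two questions, assuming the metrics exports are CSV files; every file and column name below is an assumption about the real schema:

```python
import pandas as pd

# Hypothetical file and column names -- the real exports come from the shared doc.
archive = pd.read_csv("archive-size-by-product-totals.csv")   # product, archive_volume
products = pd.read_csv("data-products.csv")                   # product, discipline, ...
dist = pd.read_csv("total-distribution-by-product-2022.csv")  # product, mechanism, volume

# Total archive volume by science discipline.
by_discipline = (
    archive.merge(products, on="product")
    .groupby("discipline")["archive_volume"]
    .sum()
    .sort_values(ascending=False)
)

# Cumulative distribution of downloads by volume: how many products account
# for the top 95% of 2022 download volume?
per_product = dist.groupby("product")["volume"].sum().sort_values(ascending=False)
cum_frac = per_product.cumsum() / per_product.sum()
n_top95 = int((cum_frac < 0.95).sum()) + 1

# The top-100-by-volume vs. top-100-by-unique-users comparison follows the same
# pattern using total-distribution-by-user-2022 (user column name assumed there).
top100_by_volume = set(per_product.head(100).index)
```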

paridhi-parajuli commented 1 year ago

For next week

  1. Metrics data analysis
  2. New project: Working with HDF-EOS data
paridhi-parajuli commented 1 year ago

Newly added to notebook:

paridhi-parajuli commented 1 year ago

For next week: Modify the code to create parquet files using geopandas directly (see the sketch after this list).

  • Create a pandas data frame.
  • Add the observation timestep (date + time) as a time column to the dataset. Ideally, we want the exact timestep of each pixel…but only if we can find it.
  • Convert to a geopandas GeoDataFrame, with the lat/lon columns converted to a geometry column (and set the CRS to EPSG:4326).
  • Try working with the new geoparquet files in geopandas: read with gpd.read_parquet. Subset by an arbitrary polygon – (1) identify an arbitrary polygon that's inside the MODIS image; (2) create it as a geopandas / shapely object; (3) crop the MODIS GeoDataFrame to the object from (2).
  • Create parquet files for 3-5 adjacent MODIS tiles. Try reading and subsetting multiple parquet files at once using geopandas. NetCDF analog – xr.open_mfdataset("dat*.nc"). Try to do something similar with Parquet. GOAL: work with 3-5 adjacent MODIS tiles as one continuous dataset.
  • Try doing some basic subsetting of files using Arrow (Reading and Writing the Apache Parquet Format — Apache Arrow v11.0.0). E.g., try grabbing all pixels with reflectance above a certain value. Look for ways to do spatial subsetting with Arrow.
  • Chat with Denis Tuesday 1pm CT / 2pm ET about the new activity.
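
A minimal sketch of the parquet workflow, assuming `df` is a pandas DataFrame of MODIS pixels; the "lon", "lat", and "reflectance" column names, the tile file names, and the reflectance threshold are all assumptions:

```python
import glob

import geopandas as gpd
import pandas as pd
import pyarrow.parquet as pq
from shapely.geometry import box

# Convert the pandas DataFrame to a GeoDataFrame, turning the lat/lon columns
# into a point geometry column and setting the CRS to EPSG:4326.
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["lon"], df["lat"]),
    crs="EPSG:4326",
)
gdf.to_parquet("modis_tile_h10v05.parquet")   # hypothetical tile name

# Read the geoparquet back and subset by an arbitrary polygon inside the tile.
tile = gpd.read_parquet("modis_tile_h10v05.parquet")
aoi = box(-95.0, 35.0, -94.0, 36.0)           # arbitrary lon/lat box
subset = tile[tile.intersects(aoi)]           # or gpd.clip(tile, aoi)

# Several adjacent tiles as one dataset (Parquet analog of xr.open_mfdataset).
tiles = pd.concat(
    (gpd.read_parquet(f) for f in glob.glob("modis_tile_*.parquet")),
    ignore_index=True,
)

# Basic attribute subsetting with Arrow: all pixels above a reflectance threshold.
table = pq.read_table(
    "modis_tile_h10v05.parquet",
    filters=[("reflectance", ">", 0.4)],
)
```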