TUW-GEO / ascat

Read and visualize data from the Advanced Scatterometer (ASCAT) on-board the series of Metop satellites
https://ascat.readthedocs.io/
MIT License

Question: Best approach to rebuild spatial gridded data time-series per WARP5 grid cell #52

Closed · serbinsh closed this issue 8 months ago

serbinsh commented 1 year ago

First, I apologize if this question isn't relevant here or for this repo. Second, I may be missing something basic or doing something silly, but I have been exploring different satellite soil moisture datasets to use in an analysis with other satellite observations, and I have been trying to get a feel for the ASCAT CDR datasets, as well as the packages developed to manipulate and use these data. I have had no issues re-creating the time-series examples for individual GPIs, which has been really helpful! However, I am also interested in re-creating a gridded time series for a specific portion of the Southeast U.S. using the CDR SSM data. It would seem this should be possible, even if the data are stored in the NetCDF Climate and Forecast (CF) ragged format, but so far I have not had luck taking a WARP5 cell and reshaping it back into a gridded dataset where each pixel is: time, lat, lon, ssm.

I am wondering if you would be able to point me in the right direction, or possibly share a simple example of how to extract H109/110 SSM data back into a gridded format? I think I am missing how I should define the coordinates/dimensions of the SSM in order to, say, subset by time and space and then create a new map from that data. The goal is to build a gridded time series of layers I can then match with other gridded time-series data at different spatial locations.
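For what it's worth, the CF "contiguous ragged" layout can be unpacked with plain NumPy once you know the per-location row-size variable. This is a minimal sketch with made-up arrays standing in for the cell-file contents; the actual variable names and layout in the H109 files may differ:

```python
import numpy as np

# Hypothetical contents of a ragged-array cell file: one flat observation
# array plus a per-location row_size giving how many observations belong
# to each grid point (CF "contiguous ragged array" representation).
row_size = np.array([3, 1, 2])                    # obs count per location
ssm = np.array([10., 12., 11., 55., 30., 31.])    # flat soil moisture values
obs_time = np.arange(len(ssm))                    # flat timestamps, same order

# Unpack: observations offsets[i]:offsets[i+1] belong to location i.
offsets = np.concatenate([[0], np.cumsum(row_size)])
series = [ssm[offsets[i]:offsets[i + 1]] for i in range(len(row_size))]
```

Each entry of `series` is then the full time series for one grid point, which can be placed on a map once the point's lat/lon is known.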

I suppose I could instead use the tools in this package by collecting the lat/lons for the comparison, feeding them to the standard time-series extraction, and building a dataset of SSM by lat/lon that we can then match with the other satellite data in space and time. But I figure there must also be a way to make maps of the data for an area.

Anyway, I hope this isn't an annoying question; any feedback would be appreciated! Thanks

sebhahn commented 1 year ago

Your question makes sense here, and other users have been struggling with the same issue.

First and foremost, it depends on how you want to regrid the ASCAT data. If a simple nearest-neighbour approach is enough, you can start by reading a specific time period from all relevant cell files (e.g. all cell files covering your study area, reading 1 year of data at a time depending on your RAM size) and then loop over the data, e.g. daily, before reading the next year, and so on. A pre-computed/static look-up table between the current grid and the new regular lat/lon grid will help you fill the data onto the grid, since this simply oversamples (or undersamples, depending on the spacing of the regular lat/lon grid) the data. If it is a more sophisticated regridding method (e.g. some kind of bilinear averaging or inverse-distance weighting), you would need to recompute the data on the new regular lat/lon grid. The relevant neighbour information and distances can be pre-computed and also used as a look-up table here (see e.g. the pyresample kd-tree). Thus, you either copy/fill the data (nearest-neighbour approach) or recompute the data based on some kind of weighting/distance information.
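A minimal sketch of such a pre-computed nearest-neighbour look-up table, here using SciPy's cKDTree on made-up coordinates (a real implementation should use great-circle distances on the sphere, as pyresample's kd-tree does; plain lon/lat Euclidean distance is only for illustration):

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical irregular source locations (e.g. WARP5 grid points).
src_lon = np.array([10.1, 10.4, 10.9, 11.2])
src_lat = np.array([45.0, 45.3, 45.1, 45.4])

# Target regular lat/lon grid.
grid_lon, grid_lat = np.meshgrid(np.arange(10.0, 11.5, 0.25),
                                 np.arange(45.0, 45.5, 0.25))

# Pre-compute the static look-up table once: for each regular grid cell,
# the index of the nearest source point.
tree = cKDTree(np.column_stack([src_lon, src_lat]))
_, lut = tree.query(np.column_stack([grid_lon.ravel(), grid_lat.ravel()]))

# Reuse the table for every time step: just index and reshape.
ssm = np.array([0.1, 0.2, 0.3, 0.4])   # one time step of SSM values
gridded = ssm[lut].reshape(grid_lon.shape)
```

The expensive neighbour search happens once; each time step is then a cheap fancy-indexing operation, which is exactly the copy/fill case described above.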

Now the data/files could be stored e.g. daily, but then you will have multiple observations per day, and you always need to store the original timestamp for each observation as well (with a sophisticated regridding method you could use the closest observation as the timestamp reference). Hence, it doesn't really matter whether you store the data daily, weekly, etc.: there will be an overlap unless you store the data on a time interval that cuts the swath into pieces small enough that they don't overlap each other. If this is what you are looking for, and to make things easier, I would recommend looping over every 60 min (with windows longer than ~90 min you would get an overlap), e.g. calling a file "metop-a_ascat_data_20110101000000_20110101010000.nc" to indicate that it contains all data from 2011-01-01 00:00:00 to 01:00:00. Obviously you would need to split the data by satellite, otherwise you might get an overlap between satellites within the same time period.
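The 60-min windowing and naming convention above can be generated with a small helper; this is just a sketch of the suggested scheme using the standard library (the function name is illustrative, not part of any package):

```python
from datetime import datetime, timedelta

def hourly_file_names(sat, start, end):
    """Yield one file name per 60-min window, e.g.
    metop-a_ascat_data_20110101000000_20110101010000.nc"""
    step = timedelta(hours=1)
    t = start
    while t < end:
        yield (f"{sat}_ascat_data_"
               f"{t:%Y%m%d%H%M%S}_{t + step:%Y%m%d%H%M%S}.nc")
        t += step

# Three hourly files for 2011-01-01 00:00 to 03:00, Metop-A only.
names = list(hourly_file_names("metop-a",
                               datetime(2011, 1, 1, 0),
                               datetime(2011, 1, 1, 3)))
```

Running the loop once per satellite keeps the per-satellite split, so no two files of the same platform overlap in time.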

I hope this answer helps you. We could also have a chat on Zoom if you would be interested in more details.

serbinsh commented 1 year ago

@sebhahn Apologies, things got really busy with some pre-proposal submission deadlines. Thank you for the detailed feedback, this is very encouraging.

First, yes I think to start I would want to create a file for each timestamp over my study domain such that I have the bounding box I need and a single timestamp for each file, as you suggest.

I think I understand what you mean regarding the simple NN regrid, but I do think a Zoom call would help me get started. Not sure if you are still willing to do that, but if so I could contact you via email to set it up. Would that be OK?

Thanks!

serbinsh commented 1 year ago

@sebhahn Thanks for chatting today, I really appreciate it.

Just a quick update: I had a little time today to explore subsetting the H109_***.nc files, and I now understand how to chunk the data by time, observations, and XY locations. I do see that my area of interest (grey shaded) is larger than a single 5x5 cell (red dots, the original WARP5 lat/lon points in the file), but I think I can just subset and remap the cells together into a new file that covers the full domain. I'll need to identify the exact tile numbers needed to fill those gaps; or maybe I first use the native 5x5 tiles, create new merged files by date/time, and then for each date/time spatially subset the result to the domain before exporting to a new netCDF file.

Some more work to do, but getting there.

[Screenshots, 2023-03-03: area of interest (grey shaded) overlaid on the WARP5 grid points (red dots)]