Open zachsa opened 1 year ago
https://www.ncei.noaa.gov/products/optimum-interpolation-sst - do i download data from here?
Hey Zach,
Sorry it's a bit unclear, but MHW_ocims_clean.py is essentially the same as MHW_tracker_clean.py, except that MHW_ocims_clean.py writes the output to a JSON for the old OCIMS site (not recommended) while MHW_tracker_clean.py writes out to a NetCDF (clean).
1) So basically you are correct in your thinking: MHW_threshold_netcdf_clean.py uses a whole lot of data (1981-present) to find the threshold for MHWs. The thinking was to run it once (not operationally) to create a "threshold" NetCDF file. This file would be permanent, updated yearly.
The path it was pointing to is data I have already downloaded. I'm not sure if you want to wait until I am back so I can give you the data or create the NetCDF. Maybe the best thing is to run it on a small dataset: download a year or two. The MHWs won't be right, but you can test proof of concept, i.e. create a threshold NetCDF and run the tracker script pointing to the threshold NetCDF and a few of the most recent days' downloads.
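The threshold step described above can be sketched in miniature. This is an illustrative stand-in, not the actual MHW_threshold_netcdf_clean.py logic: it assumes the common definition of an MHW threshold as a high percentile (typically the 90th) of historical SST per calendar day, and it works on plain (day-of-year, value) pairs rather than gridded NetCDF data.

```python
from collections import defaultdict

def build_threshold(records, pct=0.9):
    """Build a per-day-of-year threshold from historical SST records.

    `records` is an iterable of (day_of_year, sst) pairs covering the
    historical period (e.g. 1981-present). Returns {day_of_year: threshold},
    where the threshold is the `pct` percentile of all SSTs observed on that
    calendar day. Illustrative only -- the real script works on gridded data.
    """
    by_doy = defaultdict(list)
    for doy, sst in records:
        by_doy[doy].append(sst)
    thresholds = {}
    for doy, vals in by_doy.items():
        vals.sort()
        # simple nearest-rank percentile, clamped to the last element
        idx = min(int(pct * len(vals)), len(vals) - 1)
        thresholds[doy] = vals[idx]
    return thresholds
```

Run once over the full historical record, the resulting mapping plays the role of the permanent "threshold" NetCDF file.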
Yes, that is the correct link.
2) Once you have the data, you essentially subtract the days you are interested in from the threshold file. So in MHW_tracker_clean.py the path to data would be the operationally downloaded SST for x number of days.
I'm sure there is a better operational way to do this; I was just trying to save as much space as possible, as the OCIMS test site was on a very small server and was prone to running out of space.
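The "subtract the days of interest from the threshold file" step might look like this in miniature. It's an assumption-laden sketch, not the tracker script itself: `min_duration=5` follows the commonly used five-consecutive-day rule for marine heatwaves, which may or may not match what MHW_tracker_clean.py actually does.

```python
def mhw_runs(threshold, recent, min_duration=5):
    """Flag MHW events by comparing recent SSTs against the threshold.

    `threshold` maps day-of-year -> threshold SST; `recent` is a list of
    (day_of_year, sst) pairs for the operationally downloaded days. A day
    is "hot" when sst exceeds the threshold, and a run of at least
    `min_duration` consecutive hot days counts as one event. Returns a
    list of (start_index, end_index) pairs into `recent`.
    """
    hot = [sst > threshold[doy] for doy, sst in recent]
    runs, start = [], None
    for i, h in enumerate(hot + [False]):  # trailing sentinel closes an open run
        if h and start is None:
            start = i
        elif not h and start is not None:
            if i - start >= min_duration:
                runs.append((start, i - 1))
            start = None
    return runs
```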
Hope this helps, Matt
Hi Matt,
Thanks - that is very clear. So:
For (1), the download should be baked into the script. Something along the lines of 'if the input data doesn't exist, or if it's older than 1 year, download new data'. Do you know offhand the size of this input file?
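That "download if missing or stale" check could look something like this, as a minimal sketch. The `url` and `path` arguments are placeholders, not the real OISST endpoint or server layout:

```python
import time
import urllib.request
from pathlib import Path

ONE_YEAR = 365 * 24 * 3600  # seconds

def ensure_input(path, url, max_age=ONE_YEAR):
    """Download `url` to `path` if the file is missing or older than `max_age`.

    Freshness is judged by the file's modification time, so re-runs within
    the window are free. Both arguments are placeholders for the real
    download location and local input path.
    """
    p = Path(path)
    if p.exists() and time.time() - p.stat().st_mtime < max_age:
        return p  # fresh enough, reuse the existing file
    p.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, p)
    return p
```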
Hi Zach,
Yes that is the correct working chain.
For (1), yes that makes a lot of sense. The only holdup will be space on the server if you want to hold all that data (we couldn't previously). I'm not sure of the size (don't have the hard drive with me), but I estimate between 20-60 GB if I had to make a very wide estimate. It depends a lot on the spatial extent of the download.
Okay. I should have enough space to download it, create the thresholds, and then delete it. I assume the thresholds are much smaller?
Yes, the thresholds are much smaller, probably less than a gig. I'm just thinking maybe it's best to keep the dataset somewhere so you don't need to download the whole thing each time we update it?
@mattcarr03. Thanks - the files don't actually look that big for historical SST - maybe 30GB for everything available from 1981. Can you confirm this is the correct site to download from? (https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/)
Also, this is the daily SST output I get from running the download script for MHW SST - https://mnemosyne.somisana.ac.za/egagasini/for-sharing. Can you confirm that is the correct download for the daily SST for MHWs?
For the historical data, the download should actually be fairly quick since every file is 1.6MB. I can probably do 100 concurrent downloads at a time depending on what their servers will allow, so the download time should reflect maybe 400MB instead of 30GB. And then I can check for new files on every run and regenerate thresholds whenever a new file is found and downloaded.
Is this a good approach? (I haven't looked at the thresholds script yet).
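The concurrent-download idea might be sketched like this. The `fetch` callable is injected (e.g. a thin wrapper around urllib) so the pooling logic stands on its own and is testable; 100 workers matches the estimate above, but may need lowering if the NOAA servers throttle connections:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def download_all(urls, fetch, max_workers=100):
    """Fetch many small files concurrently.

    `fetch` is any callable taking a URL and returning its content. With
    ~1.6 MB files and a wide pool, total wall-clock time is dominated by
    the slowest batch rather than the sum of all transfers. Returns a
    dict mapping each URL to its fetched content.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, u): u for u in urls}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()  # re-raises download errors
    return results
```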
For the thresholds script, if I recall it was combining multiple NetCDF files prior to doing the thresholds calculations (not sure if this makes sense). Can the directory structure reflect the structure on the remote server or do I have to flatten it?
Hope Australia is going well!
Thanks Zach, leaving tonight and back on Sunday!
Okay, that's not too bad for the download size, and cool to do it concurrently! Yes, that is the correct link.
The https://mnemosyne.somisana.ac.za/egagasini/for-sharing link is giving me a bad gateway error; not sure if I was too late to use it?
"For the thresholds script, if i recall it was combining multiple NetCDF files prior to doing the thresholds calculations (not sure if this makes sense). Can the directory structure reflect the structure on the remote server or do i have to flatten it?"
The NetCDF files can be loaded in from many different formats, so yes, it can definitely reflect the remote server structure. I just combined the NetCDF files as I downloaded them all to one directory. So it's just a matter of changing the input into the Python script.
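Since the input is just a matter of changing the path, a recursive file walk is enough to support a nested local tree that mirrors the remote server. A small sketch; the hand-off to xarray.open_mfdataset mentioned in the docstring is an assumption about what the thresholds script uses for the combine step:

```python
from pathlib import Path

def collect_netcdf(root):
    """Recursively collect all .nc files under `root`, in sorted order.

    Because files are discovered recursively, the local tree can mirror
    the remote server's year/month directory layout -- no flattening
    needed. The sorted list can then be handed to a combiner such as
    xarray.open_mfdataset (assumed, not confirmed, to be what the
    thresholds script does).
    """
    return sorted(Path(root).rglob("*.nc"))
```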
This is a similar tool for the globe: http://whalemap.ocean.dal.ca/MHW/. It would be cool to extend the domain to include Mozambique and Madagascar as that is a region strongly impacted by MHWs.
Looking forward to working with you on this when I get back
Sorry - the link I gave you no longer exists, as one of my servers' file systems became corrupt and had to be restored to a version prior to that file existing.
Let me know when you are ready for a video call next week - will be fun working on this together again if you have time! I've restructured the 'toolkit' CLI so that the Python scripts for MHW/operational models/LACCE/etc. can all be bundled together.
For example, the modified CLI looks like this:
somisana ops ... (operational model options)
somisana mhw ... (marine heat waves options)
somisana lacce ... (LACCE options)
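A minimal sketch of how those subcommands could be bundled under one entry point with argparse subparsers. The subcommand names come from the example above; the `--thresholds` flag is hypothetical, just to show per-subcommand options:

```python
import argparse

def build_parser():
    """One `somisana` entry point with a subcommand per tool."""
    parser = argparse.ArgumentParser(prog="somisana")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("ops", help="operational model options")
    mhw = sub.add_parser("mhw", help="marine heat wave options")
    # hypothetical flag, for illustration only
    mhw.add_argument("--thresholds", help="path to threshold NetCDF")
    sub.add_parser("lacce", help="LACCE options")
    return parser

# e.g. build_parser().parse_args(["mhw", "--thresholds", "thresh.nc"])
```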
That example - http://whalemap.ocean.dal.ca/MHW/ - looks nice.
The operational models are aimed at supporting curvilinear grids without interpolating any data - i.e. working with vector data. The example shows working with raster data, which is much more scalable.
@mattcarr03. The previous link (https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/) doesn't allow for programmatically browsing the directory for files. The same files are available on a THREDDS server - https://www.ncei.noaa.gov/thredds/blended-global/oisst-catalog.html (which I assume has an API for searching directory listings).
I assume I want the Optimally Interpolated V2.1 SST AVHRR / OISST-V2.1-AVHRR-Daily Files (https://www.ncei.noaa.gov/thredds/catalog/OisstBase/NetCDF/V2.1/AVHRR/catalog.html).
Is this correct? (This is the data used as input to calculate thresholds.)
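THREDDS catalogs are plain XML, so directory listings can be read programmatically without a dedicated API client. A sketch, using an illustrative catalog snippet rather than a real NOAA listing (the real catalog.xml has the same `<dataset urlPath=...>` structure but more metadata):

```python
import xml.etree.ElementTree as ET

# Illustrative catalog snippet -- not an actual NOAA directory listing.
SAMPLE = """<?xml version="1.0"?>
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <dataset name="AVHRR daily">
    <dataset name="oisst-avhrr-v02r01.20240101.nc"
             urlPath="OisstBase/NetCDF/V2.1/AVHRR/202401/oisst-avhrr-v02r01.20240101.nc"/>
    <dataset name="oisst-avhrr-v02r01.20240102.nc"
             urlPath="OisstBase/NetCDF/V2.1/AVHRR/202401/oisst-avhrr-v02r01.20240102.nc"/>
  </dataset>
</catalog>"""

def list_url_paths(catalog_xml):
    """Return the urlPath of every <dataset> entry in a THREDDS catalog.

    Tags are matched by suffix so the XML namespace doesn't need to be
    spelled out; container datasets without a urlPath are skipped.
    """
    root = ET.fromstring(catalog_xml)
    return [el.attrib["urlPath"]
            for el in root.iter()
            if el.tag.endswith("dataset") and "urlPath" in el.attrib]
```

The returned urlPaths can then be joined to the server's file-access endpoint to build the actual download URLs.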
@zachsa yes that is correct, it's the same product.
Hey there @mattcarr03
I'm re-packaging this work for automated deployments (and for including in our somisana web site), starting with the Marine Heat Wave tracker. When you have a chance, please can you point me in the right direction? I want to know the following:
1. The MHW_threshold_netcdf_clean.py and MHW_tracker_clean.py scripts should be run in succession, but nothing mentions the MHW_ocims_clean.py script. What is this file?
2. Looking at the comments in 1_MHW_threshold_netcdf_clean.py, for the input data I see this: path_to_data = '/media/Elements/SST_data/Low_res_SST/NCEI_OISST_AVHRR_ND/'. What is this data source?
3. 2_MHW_tracker_clean.py & MHW_ocims_clean.py: same question regarding the SST input NetCDF. I can see that these scripts run on the output of the thresholds script.