OCIMS-tools / Marine-heat-waves

This repository contains the scripts and functions needed to produce the real-time marine heat wave tracking displayed on the OCIMS test platform.

Questions on porting the code #1

Open zachsa opened 1 year ago

zachsa commented 1 year ago

Hey there @mattcarr03

I'm re-packaging this work for automated deployments (and for inclusion in our somisana website), starting with the Marine Heat Wave tracker. When you have a chance, could you please point me in the right direction? I want to know the following:

Looking at the comments in 1_MHW_threshold_netcdf_clean.py

For the input data:

The path to the folders containing the SST data (NetCDF files): where do these files come from? (i.e. is this CROCO output, CROCO output + normalized grid, something else, etc.?)

2_MHW_tracker_clean.py & 2. MHW_ocims_clean.py

Same question regarding the SST input NetCDF. I can see that these scripts run on the output of the thresholds script.

I see this: path_to_data = '/media/Elements/SST_data/Low_res_SST/NCEI_OISST_AVHRR_ND/'. What is this data source?

zachsa commented 1 year ago

https://www.ncei.noaa.gov/products/optimum-interpolation-sst - do I download data from here?

mattcarr03 commented 1 year ago

Hey Zach,

Sorry it's a bit unclear. MHW_ocims_clean.py is essentially the same as MHW_tracker_clean.py, but MHW_ocims_clean.py writes the output to a JSON for the old OCIMS site (not recommended), while MHW_tracker_clean.py writes out to a NetCDF (clean).

1) So basically you are correct in your thinking: MHW_threshold_netcdf_clean.py uses a whole lot of data (1981-present) to find the threshold for MHWs. The thinking was to run it once (not operationally) to create a "threshold" NetCDF file. This file would be permanent / updated yearly.

That path points to the data I had downloaded. I'm not sure if you want to wait till I am back so I can give you the data or create the NetCDF. Maybe the best thing is to run it on a small dataset: download a year or two; the MHWs won't be right, but you can test the proof of concept, i.e. create a threshold NetCDF and run the tracker script pointing to the threshold NetCDF and a few of the most recent days' downloads.
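For illustration, here is a minimal sketch of what that threshold step might look like, assuming the standard Hobday et al. (2016) style definition (90th percentile of the day-of-year climatology) and hypothetical file paths; the actual 1_MHW_threshold_netcdf_clean.py may differ in method and smoothing.

```python
# Hedged sketch of the threshold step (the real 1_MHW_threshold_netcdf_clean.py may differ).
# Assumes daily OISST NetCDF files with an "sst" variable; paths are hypothetical.
import xarray as xr

# Open a small test set of historical daily files (e.g. a year or two) as one dataset.
ds = xr.open_mfdataset("oisst_history/*.nc", combine="by_coords")
sst = ds["sst"].load()  # load into memory; fine for a proof-of-concept subset

# Day-of-year 90th-percentile threshold (a full implementation also smooths the climatology).
thresh = (
    sst.groupby("time.dayofyear")
       .quantile(0.9, dim="time")
       .rename("mhw_threshold")
)

# Write the (much smaller) threshold file that the tracker script reads.
thresh.to_netcdf("mhw_threshold.nc")
```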

Yes, that is the correct link.

2) Once you have the data, you essentially subtract the days you are interested in from the threshold file. So in MHW_tracker_clean.py the path to data would be the operationally downloaded SST for x number of days.
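As a rough illustration of that comparison (hypothetical paths and variable names; MHW_tracker_clean.py itself also handles the event tracking):

```python
# Hedged sketch of the tracker's comparison step; file and variable names are hypothetical.
import xarray as xr

thresh = xr.open_dataarray("mhw_threshold.nc")          # day-of-year threshold
recent = xr.open_mfdataset("recent_sst/*.nc")["sst"]    # last few days of downloaded SST

# Match each daily field to its day-of-year threshold and take the difference.
exceedance = recent.groupby("time.dayofyear") - thresh

# Cells above the threshold are flagged as candidate marine heat wave conditions.
mhw_mask = exceedance > 0
```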

I'm sure there is a better operational way to do this; I was just trying to save as much space as possible, as the OCIMS test site was on a very small server and was prone to running out of space.

Hope this helps, Matt

zachsa commented 1 year ago

Hi Matt,

Thanks - that is very clear. So:

  1. Download SST data from NOAA for as far back as possible - update every year
  2. Use that data to generate thresholds. Doesn't have to be every day - update every time (1) occurs
  3. Run MHW_tracker_clean.py every day comparing current SST (downloaded) to thresholds

For (1), the download should be baked into the script. Something along the lines of 'if the input data doesn't exist, or if it's older than 1 year, download new data'. Do you know offhand the size of this input file?
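A minimal sketch of that check, assuming a local directory for the historical archive (names are hypothetical):

```python
# Hedged sketch of the "download if missing or stale" check described above;
# the directory name and refresh logic are placeholders, not the actual script.
import time
from pathlib import Path

ONE_YEAR = 365 * 24 * 3600  # seconds

def needs_refresh(path: str, max_age: float = ONE_YEAR) -> bool:
    """True if the historical SST archive is missing or older than max_age."""
    p = Path(path)
    if not p.exists():
        return True
    return (time.time() - p.stat().st_mtime) > max_age

if needs_refresh("oisst_history"):
    print("Historical SST archive missing or stale; re-download before computing thresholds")
```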

mattcarr03 commented 1 year ago

Hi Zach,

Yes, that is the correct working chain.

For (1), yes, that makes a lot of sense. The only hold-up will be space on the server if you want to hold all that data (we couldn't previously). I'm not sure of the size (I don't have the hard drive with me), but I estimate between 20-60 GB if I had to make a very wide estimate. It depends a lot on the spatial extent of the download.

zachsa commented 1 year ago

Okay. I should have enough space to download it, create the thresholds, and then delete it. I assume the thresholds are much smaller?

mattcarr03 commented 1 year ago

Yes, the thresholds are much smaller, probably less than a gig. I'm just thinking maybe it's best to keep the downloaded data somewhere so you don't need to download the whole dataset each time we update it?

zachsa commented 1 year ago

@mattcarr03. Thanks - the files don't actually look that big for historical SST - maybe 30GB for everything available from 1981. Can you confirm this is the correct site to download from? (https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/)

Also, this is the daily SST output I get from running the download script for MHW SST - https://mnemosyne.somisana.ac.za/egagasini/for-sharing. Can you confirm that is the correct download for the daily SST for MHWs?

zachsa commented 1 year ago

For the historical data, the download should actually be fairly quick since every file is 1.6MB. I can probably do 100 concurrent downloads at a time, depending on what their servers will allow, so the download time should be reflective of maybe 400MB instead of 30GB. And then I can check for new files on every run and regenerate thresholds whenever a new file is found and downloaded.

Is this a good approach? (I haven't looked at the thresholds script yet).
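As a rough sketch of the concurrent download idea above (the directory and file-name pattern below is an assumption and should be checked against the NCEI listing):

```python
# Hedged sketch of concurrent downloads of daily OISST files; the URL/file-name
# pattern is assumed from the NCEI directory layout and should be verified.
import concurrent.futures as cf
import urllib.request
from pathlib import Path

BASE = ("https://www.ncei.noaa.gov/data/"
        "sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr")

def fetch(day: str, out_dir: Path = Path("oisst_history")) -> Path:
    """Download one daily OISST file (day as 'YYYYMMDD') if not already present."""
    out_dir.mkdir(parents=True, exist_ok=True)
    name = f"oisst-avhrr-v02r01.{day}.nc"          # assumed naming convention
    target = out_dir / name
    if not target.exists():                         # skip files we already have
        url = f"{BASE}/{day[:6]}/{name}"            # assumed YYYYMM subdirectory layout
        urllib.request.urlretrieve(url, str(target))
    return target

days = ["20230101", "20230102", "20230103"]         # example dates
with cf.ThreadPoolExecutor(max_workers=20) as pool:  # throttled to be polite to the server
    list(pool.map(fetch, days))
```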

For the thresholds script, if I recall it was combining multiple NetCDF files prior to doing the threshold calculations (not sure if this makes sense). Can the directory structure reflect the structure on the remote server, or do I have to flatten it?

zachsa commented 1 year ago

Hope Australia is going well!

mattcarr03 commented 1 year ago

Thanks Zach, leaving tonight and back on Sunday!

Okay, that's not too bad for the download size, and cool to do it concurrently! Yes, that is the correct link.

The https://mnemosyne.somisana.ac.za/egagasini/for-sharing link is giving me a bad gateway error - not sure if I was too late to use it?

"For the thresholds script, if i recall it was combining multiple NetCDF files prior to doing the thresholds calculations (not sure if this makes sense). Can the directory structure reflect the structure on the remote server or do i have to flatten it?"

The NetCDF files can be loaded in from many different formats, so yes, it can definitely reflect the remote server structure. I just combined the NetCDFs as I had downloaded the files all to one directory. So it's just a matter of changing the input into the Python script.
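For example, a minimal sketch of loading a nested local mirror of the remote layout, assuming xarray is used to combine the files (paths are hypothetical):

```python
# Hedged sketch: open_mfdataset accepts a list of files from a recursive glob,
# so the local archive can mirror the remote YYYYMM/ layout instead of being flattened.
import glob
import xarray as xr

files = sorted(glob.glob("oisst_history/**/*.nc", recursive=True))
ds = xr.open_mfdataset(files, combine="by_coords")  # concatenated along time
```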

This is a similar tool for the globe: http://whalemap.ocean.dal.ca/MHW/. It would be cool to extend the domain to include Mozambique and Madagascar, as that is a region strongly impacted by MHWs.

Looking forward to working with you on this when I get back

zachsa commented 1 year ago

Sorry - the link I gave you no longer exists, as one of my servers' file systems became corrupt and had to be restored to a version prior to that file existing.

Let me know when you are ready for a video call next week - it will be fun working on this together again if you have time! I've restructured the 'toolkit' CLI so that the Python scripts for mhw/operational models/lacce/etc. can all be bundled together.

For example, the modified CLI looks like this:

somisana ops ... (operational model options)
somisana mhw ... (marine heat waves options)
somisana lacce ... (lacce options)

zachsa commented 1 year ago

That example - http://whalemap.ocean.dal.ca/MHW/ - looks nice.

The operational models are aimed at supporting curvilinear grids without interpolating any data - i.e. working with vector data. The example shows working with raster data which is much more scalable.

zachsa commented 1 year ago

@mattcarr03. The previous link (https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/) doesn't allow for programmatically browsing the directory for files. The same files are available on a THREDDS server - https://www.ncei.noaa.gov/thredds/blended-global/oisst-catalog.html (which I assume has an API for searching directory listings).

I assume I want the Optimally Interpolated V2.1 SST AVHRR / OISST-V2.1-AVHRR-Daily Files/ files (https://www.ncei.noaa.gov/thredds/catalog/OisstBase/NetCDF/V2.1/AVHRR/catalog.html)

Is this correct? (This is the data used as input to calculate thresholds.)
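For reference, a minimal sketch of listing that catalog programmatically by requesting its XML form (swap .html for .xml); the catalog URL and XML layout are assumptions to verify:

```python
# Hedged sketch of browsing a THREDDS catalog by parsing its catalog.xml;
# the catalog URL and element layout should be checked against the live server.
import urllib.request
import xml.etree.ElementTree as ET

CATALOG = ("https://www.ncei.noaa.gov/thredds/catalog/"
           "OisstBase/NetCDF/V2.1/AVHRR/catalog.xml")
NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0",
      "xlink": "http://www.w3.org/1999/xlink"}

with urllib.request.urlopen(CATALOG) as resp:
    root = ET.fromstring(resp.read())

# Sub-catalogs (e.g. one per month directory) appear as <catalogRef> elements.
for ref in root.iter(f"{{{NS['t']}}}catalogRef"):
    print(ref.get(f"{{{NS['xlink']}}}title"), ref.get(f"{{{NS['xlink']}}}href"))
```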

mattcarr03 commented 1 year ago

@zachsa Yes, that is correct - it's the same product.