BecCowley opened 1 year ago
See here for data locations and code information from Jeff: https://github.com/CARSv2/cars-v2/wiki/CARS2009-helpful-information
Good chat today @BecCowley & @ChrisC28
Here is the whiteboard image.
I've made a first pass at producing the "CODA" form of the output from the WOD.
The data is contained in yearly directories. Each directory includes daily files with the naming conventions:
CODAWOD
where platform_type = ctd, pfl, xbt, etc., and variable = temperature, pressure, salinity, oxygen, etc.
So for each platform type and variable, there are 365 or 366 files per year.
The files themselves are two-dimensional (cast, depth_index). Each variable includes the data, the depth levels (WOD data is on depth and NOT pressure, it seems) and the WOD flags.
There are a few quirks that I'm working through that could make the data a little easier to read and deal with. For example, the length of the depth dimension varies from file to file, which isn't optimal for reading the data. Exactly what data/metadata to carry through is also something we should all discuss.
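A minimal sketch of one way to give every file the same depth dimension: pad each cast out to a fixed number of levels. This assumes each cast's values are available as a 1-D sequence and uses NaN as the pad value; `pad_profiles` and `n_levels` are hypothetical names, not part of wodpy or the CODA converters.

```python
import numpy as np

def pad_profiles(profiles, n_levels=None, fill=np.nan):
    """Stack variable-length casts into a (cast, depth_index) float array.

    Each cast is padded at the bottom with `fill`. If n_levels is None the
    longest cast sets the length; otherwise every file can share n_levels.
    """
    longest = max((len(p) for p in profiles), default=0)
    if n_levels is None:
        n_levels = longest
    if longest > n_levels:
        raise ValueError(f"a cast has {longest} levels, more than n_levels={n_levels}")
    out = np.full((len(profiles), n_levels), fill, dtype=float)
    for i, p in enumerate(profiles):
        out[i, :len(p)] = p
    return out

padded = pad_profiles([[10.1, 9.8], [12.0, 11.5, 11.0]], n_levels=5)
```

Choosing one fixed `n_levels` for all files (rather than the per-file maximum) is what would let daily files be concatenated along the cast dimension without further reshaping.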
The first pass of the data is here: /oa-decadal-climate/work/observations/CARSv2_ancillary/CODA/WOD
@ChrisC28, looking at the pressure files - is the Pressure_depth variable meant to be the pressure converted to depth? Looks erroneous (looking at one of the CTD files). Will discuss with you!
I've pushed the example notebook to the main branch
@BecCowley The "Pressure_depth" variable is simply the pressure as read in the WOD data on the depth levels. I treat pressure as any other variable. I did notice some strangeness myself. Could you let me know which profile you looked at?
Let's try this again.
Using the wodpync module, I've created some test CODA files. Not all the metadata is there, as I had some boring problems processing strings that I still haven't worked out. Additionally, it seems like not all metadata is carried through in WOD (for example, Salinity doesn't have units).
You can find the test dataset on tube: /oa-decadal-climate/work/observations/CARSv2_ancillary/CODA/CODA_test
It's currently CTD only, although I've run some tests on XBT and profiling floats without issues. Let me know if you get a chance to have a play.
@ChrisC28 I had a quick look at the files. Certainly there needs to be some transfer of variable attributes, fixes to fill values etc. Salinity shouldn't have units.
I did some very basic plotting using the WOD flags and temperature and salinity. There are some strange out-of-range values (-127) in the WOD_flag variable for the one file I looked at.
The data itself looks reasonable. I would like to tidy up the files with the correct fill values etc. so that they load without issues. Also, I wonder whether the mixed use of the names 'depth' and 'z' could be reconciled?
Happy to work on tidying up when back next year!
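A quick way to list the suspicious flag values, assuming the valid WOD observed-level flags are the integers 0-9 (worth checking against the WOD documentation). Note that -127 is also the default netCDF fill value for byte variables, so one possibility is that these are unset levels rather than real flags. `out_of_range_flags` is a hypothetical helper.

```python
import numpy as np

# Assumption: valid WOD observed-level flags are the integers 0-9.
VALID_WOD_FLAGS = np.arange(10)

def out_of_range_flags(flags, valid=VALID_WOD_FLAGS):
    """Return (cast, level) index pairs whose flag value is not in `valid`."""
    flags = np.asarray(flags)
    bad = ~np.isin(flags, valid)
    return np.argwhere(bad)

# Example: one -127 in the first cast, third level.
example = np.array([[0, 1, -127], [3, 0, 9]], dtype=np.int8)
bad_idx = out_of_range_flags(example)
```

Combined with `np.unique(flags)`, this gives a quick picture of which unexpected values occur and where.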
@ChrisC28 I am reviewing the requirements for the duplicate checking code. Here is a list of metadata that needs to be included in the CODA files if it is available:
accession_number, dataset_id, lat, lon, year, month, day, probe_type, recorder, hour, minute, country_id, GMT_time, dbase_orig, Project_name, platform, vehicle, Institute
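A small sketch for reporting which of these fields a given file actually provides. `REQUIRED_METADATA` is copied from the list above; `missing_metadata` is a hypothetical helper, and `available` could be, for example, the variable names of an open CODA file. Since the fields are only needed "if available", it reports gaps rather than raising an error.

```python
# Metadata requested for the duplicate checker (copied from the list above).
REQUIRED_METADATA = [
    "accession_number", "dataset_id", "lat", "lon", "year", "month", "day",
    "probe_type", "recorder", "hour", "minute", "country_id", "GMT_time",
    "dbase_orig", "Project_name", "platform", "vehicle", "Institute",
]

def missing_metadata(available):
    """Return the requested metadata names absent from `available`, in list order."""
    present = set(available)
    return [name for name in REQUIRED_METADATA if name not in present]

gaps = missing_metadata(["lat", "lon", "year", "month", "day"])
```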
NOTE: wodpy uses masked arrays, which are extremely slow.
I've found places where I think masked arrays can be replaced by regular arrays. I'm running some tests now. This should hopefully speed things up.
I've now modified WODpy to make use of standard numpy arrays rather than masked arrays. I'm still checking things to make sure that what I've done is sensible, but it speeds things up by an order of magnitude.
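The general idea of the change can be sketched as follows, assuming masked values can be represented as NaN in a float array. This illustrates the technique only and is not the code in the wodpy fork; `unmask` is a hypothetical name.

```python
import numpy as np

def unmask(values, fill=np.nan):
    """Convert a (possibly masked) array to a plain float ndarray.

    Masked entries become `fill`, so later code can use np.isnan / np.nanmean
    instead of the slower masked-array machinery.
    """
    return np.ma.filled(np.ma.asarray(values, dtype=float), fill)

temps = unmask(np.ma.array([12.1, -99.0, 11.7], mask=[False, True, False]))
```

The trade-off is that "missing" now has to be tested explicitly with `np.isnan`, which is where checks that relied on the mask can silently stop working.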
Hi @BecCowley I've put a couple of CODA test outputs here: /oa-decadal-climate/work/observations/CARSv2_ancillary/CODA/CODA_test/2010
The files have the metadata above that should hopefully help the duplicate checker work.
The -127 WODFlag values still appear and I've traced these back to the original data files.
Could you run your sceptical eye over these files and let me know if they are fit for purpose? I can now regenerate the files quickly, so fixes should be pretty easy to implement.
@ChrisC28 Some comments:
The temperature, salinity, oxygen etc. variables have a 'grid_mapping' attribute that references a crs variable. This needs to be carried over from the WOD files:
int crs ;
    crs:grid_mapping_name = "latitude_longitude" ;
    crs:epsg_code = "EPSG:4326" ;
    crs:longitude_of_prime_meridian = 0.f ;
    crs:semi_major_axis = 6378137.f ;
    crs:inverse_flattening = 298.2572f ;
The origflagset(casts, strnlen) variable needs to be carried over from WOD for the *_origflag variables:
char origflagset(casts, strnlen) ;
    origflagset:comment = "set of originators flag codes to use" ;
Include a long_name attribute for the Access_no variable, maybe 'WOD_accession_number'.
Can you include the 'needs_z_fix' variable for XBTs please?
Can you include the 'Ocean_Vehicle' variable if present (probably only in the APB data, you might already have it)?
The ctd files contain the *_Instrument variables for every parameter (Temperature, Salinity, Oxygen). Perhaps we only need to include the one for that file (e.g. Temperature_Instrument in the ctd_Temperature file).
We can include some more global attributes to describe the dataset, project, references etc. Need to create a list of these.
I will do some testing on the data itself and let you know what I find.
@ChrisC28 some comments/queries on the data:
there are some missing time data. Is this representative of what is in the original file, or an issue with the conversion? E.g. the WOD_CODA_2010_xbt_Temperaturetest.nc file is missing date/time info in 44 profiles, including locations 688, 715, 895, 1046, 1517 (WOD unique ids: 13047310, 13047318, 13047370, 12321440).
time in the pfl files is int64. It should be double, as in the other files.
@BecCowley
I'm moving the WOD_2018 over to:
/oa-decadal-climate/work/observations/CARSv2_ancillary/WOD2018
I've got ctd data from 1970 onwards, and argo, xbt, glider, etc... from 2005 onwards
I've pushed my changes to wodpy to my fork here: https://github.com/ChrisC28/wodpy
Being lazy, I've left the original code in the file, commented out.
Fixed issue:
Working through the remainder of @BecCowley's list
@BecCowley
there are some missing time data. Is this representative of what is in the original file, or an issue with the conversion? Eg, the WOD_CODA_2010_xbt_Temperaturetest.nc file is missing date/time info in 44 profiles, including locations: 688, 715, 895, 1046, 1517 (WOD unique ids: 13047310, 13047318, 13047370, 12321440)
Found the issue: wodpync creates a datetime output based on the "date" variable and the "GMT_time" variable. The latter is occasionally missing. wodpy tests for this, but when I changed from masked to regular NumPy arrays, the test failed. I've reverse-engineered the test with a bit of a hack, but it seems to catch those cases: when GMT time is missing, the time is taken as midnight (as in the original wodpy).
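The fallback can be sketched like this, assuming `date` is stored as an integer YYYYMMDD and `GMT_time` as decimal hours that are NaN (or None) when missing. `cast_datetime` is a hypothetical helper, not the wodpync implementation.

```python
import math
from datetime import datetime, timedelta

def cast_datetime(date, gmt_time):
    """Build a cast datetime from a YYYYMMDD integer and decimal GMT hours.

    If gmt_time is missing (None or NaN), the cast is placed at midnight
    on that date, matching the fallback described above.
    """
    day = datetime.strptime(f"{int(date):08d}", "%Y%m%d")
    if gmt_time is None or math.isnan(gmt_time):
        return day
    return day + timedelta(hours=float(gmt_time))

t = cast_datetime(20100315, 13.5)  # 2010-03-15 13:30
```

Testing explicitly for NaN is the piece that a masked-array version gets for free from the mask, which is why it has to be reinstated once plain arrays are used.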
Hi @BecCowley
A new batch of CODA files to check. I've placed them here
/oa-decadal-climate/work/observations/CARSv2_ancillary/CODA/CODA_test/2010
the files are:
@ChrisC28 the WOD_unique id contains latitude; it needs correcting, as discussed.
Fixed... regenerating the WOD-derived CODA files.
Added the first batch of MNF -> CODA files:
Note that, as discussed, I haven't included a lot of the metadata (things like COUNTRY, etc.). It's probably worth discussing what we need for the QC/duplicate checking and making sure that we include what's required.
Next step: repeat with the AIMS data.
@ChrisC28 I think we need to add these to all the files at the time of conversion to CODA format:
We will absolutely need the following information for XBT files:
For profiling floats and glider files (when we get there):
The WOD code tables are available here, and we should use these values if possible: https://www.ncei.noaa.gov/access/world-ocean-database/wod-codes.html
Can you finagle that? Happy to help.
Also, we still need the CODA WOD files updated to fix the wod_unique_id issue. And we need to make sure the longitudes of all datasets are in -180:180 degree format, not 0:360 degrees.
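Converting 0-360 longitudes is a one-liner with modular arithmetic. A minimal NumPy sketch (`wrap_longitude` is a hypothetical name); note that it maps onto [-180, 180), so 180 becomes -180, which is the same meridian.

```python
import numpy as np

def wrap_longitude(lon):
    """Map longitudes in degrees (e.g. 0-360) onto [-180, 180).

    Values already in range are unchanged; 180 maps to -180.
    """
    lon = np.asarray(lon, dtype=float)
    return (lon + 180.0) % 360.0 - 180.0

wrapped = wrap_longitude([0, 90, 180, 270, 359.5, -10])
```

Applying this in every converter, rather than downstream, keeps the CODA files consistent regardless of which convention the source used.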
@BecCowley
I've fixed a few bugs and placed the newly created files in a new directory:
/oa-decadal-climate/work/observations/CARSv2_ancillary/CODA/CODA_test_v2/
I've had a quick look and most of the supporting variables seem to be included. I haven't yet included the WOD codes, and I think we need a brief discussion as to how to carry these over to the MNF/CSIRO and AIMS (+ other) data.
@BecCowley
Another one for your to-do list:
I've produced a first-pass test set of AIMS->CODA files. They are together with the files I've produced from WOD and MNF in the directory:
/oa-decadal-climate/work/observations/CARSv2_ancillary/CODA/CODA_test_v2/
divided up by year.
Couple of things to note:
Please have a quick squizz when you get a moment. Could we catch up briefly about what exactly is required by the IQuOD duplicate checker/QC-er?
@ChrisC28 the attributes in the variables for the AIMS files have an issue (there is a long string in there - ncdump it to see). Will come see you to discuss how to put in the variables.
Hi @BecCowley WOD, MNF, AIMS and RAN data has now been CODA-fied!
Path for the new test dataset is: /oa-decadal-climate/work/observations/CARSv2_ancillary/CODA/WOD_CODA_test_v2/
I'll push the converters next week. It would be good to try to refactor the code into a common set of functions (the MNF and RAN code is very similar).
Note: I found that nutrient (etc.) profiles are actually in the ocean station files and not, as I had assumed, in the ctd files in WOD (there are also a few in the profiling float data). I had been ignoring ocean stations, but it turns out that they are important.
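One possible shape for the refactor: keep a single shared function and move the source-specific parts into per-source lookup tables. Everything here is hypothetical, including the MNF and RAN variable names, which are placeholders rather than the real ones.

```python
# Hypothetical per-source name maps; the real MNF/RAN variable names may differ.
NAME_MAPS = {
    "MNF": {"TEMP": "Temperature", "PSAL": "Salinity", "PRES": "Pressure"},
    "RAN": {"temp": "Temperature", "sal": "Salinity", "pres": "Pressure"},
}

def to_coda_names(source, variables):
    """Rename one source's variables to CODA names, dropping unmapped ones.

    The same function serves every source; only the NAME_MAPS entry differs.
    """
    name_map = NAME_MAPS[source]
    return {name_map[key]: value for key, value in variables.items() if key in name_map}

renamed = to_coda_names("RAN", {"temp": [15.2], "sal": [35.1]})
```

Adding another source would then mean adding a table entry rather than copying a converter.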
@ChrisC28, @Thomas-Moore-Creative, I see the WOD files are only there from 2000 to 2017. I think Thomas was going to download the latest WOD from 2000 to now - has this been done yet and can we then complete the conversion?
@Thomas-Moore-Creative - has this been done yet and can we then complete the conversion?
It has not, apologies. I'll start this now.
I'll do this over in #19
Hi all,
In my haste to get this out on a Friday evening, I neglected to mention that I've downloaded WOD2018 from the OPeNDAP server.
It turned out to be very easy (took me less than 30 minutes):
/oa-decadal-climate/work/observations/WOD2018
@BecCowley - given the above diligence from @ChrisC28, I assume that is as much as we can grab for now from WOD?
... I note it goes up to 2022:
drwxrwsr-x 2 cha674 1109763 4.0K May 11 07:18 2022
Have a look at the notebook here: https://github.com/CARSv2/cars-v2/blob/main/notebooks/Download_WOD_from_OpenDap.ipynb
I've only downloaded a subset of WOD2018. However, it might be worth having nearly the whole thing? Not sure how useful data from Captain Cook might be, but you never know....
Yes, looks like it's downloaded. Thanks for doing this @ChrisC28. However, not all of it has been translated to CODA yet; we can do that now that the tools are there!
@BecCowley I'm running the script now! Converting a bunch of other variables (nutrients, CO2, etc...).
@ChrisC28 here are a list of format issues in the CODAv1 files:
@ChrisC28 another issue to fix - the WOD and originator flag values are float type in the WOD CODA files but double in the MNF versions. I think they should be byte types.
Also, the originator flags in the WOD files depend on the origflagset variable, which isn't carried through to the CODA files. I would suggest converting the originator flags to a consistent scheme and adding flag_values and flag_meanings attributes to the origflag variables. Then the data type can be made byte.
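A sketch of the suggested conversion, assuming the originator codes for a cast have already been matched to a target scheme via its origflagset (the mapping would differ per flag set). The target codes, meanings and -127 fill value below are placeholders, not an agreed CODA scheme, and `convert_origflags` is a hypothetical helper. Unmapped or missing codes end up as the fill value, and `flag_values` shares the variable's int8 type, as the CF conventions expect.

```python
import numpy as np

# Placeholder target scheme (hypothetical, not an agreed CODA convention).
FLAG_VALUES = np.array([0, 1, 2, 3, 4], dtype=np.int8)
FLAG_MEANINGS = "no_qc_performed good probably_good probably_bad bad"
FILL = np.int8(-127)

def convert_origflags(orig, mapping):
    """Map originator flag codes to int8 target codes plus CF flag attributes.

    `orig` may be float (as in the current WOD CODA files) with NaN for missing;
    `mapping` maps originator codes to target codes for one origflagset.
    """
    orig = np.asarray(orig, dtype=float)
    out = np.full(orig.shape, FILL, dtype=np.int8)  # unmapped/missing -> fill
    for src, dst in mapping.items():
        out[orig == src] = dst
    attrs = {
        "flag_values": FLAG_VALUES,
        "flag_meanings": FLAG_MEANINGS,
        "_FillValue": FILL,
    }
    return out, attrs

flags, attrs = convert_origflags([1.0, 4.0, float("nan"), 9.0], {1: 1, 4: 4})
```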
@BecCowley: I've updated the MNF and RAN CODA files to fix the negative z issue. This came from using the TEOS-10 package to convert from pressure to depth: TEOS-10 defines z as negative below the surface.
Now looking into the WOD files.
@BecCowley I've modified the WOD files as mentioned and am currently regenerating the CODA database from 2000 to 2023. It should run overnight. I'm also about to generate the CODA database for CTDs only from 1975 onwards for Ariaan Purich and her student Helen, who have volunteered to act as guinea pigs.
Collect the notes and data (and perhaps some of the code) used by Jeff to create the CARS2009 product. It would be good to understand how Jeff did:
Also, rescue the data he already collected and reuse it in the new product. Where should this information go? Locally it is available in the datalib location, in Jeff's folders. Maybe we need to replicate it somewhere useful for the new CARS, or just make sure we can identify where the important parts are.
Also, thinking about the final format for the new product, it should match the original so users can easily slot the new product into their existing applications.