CARSv2 / cars-v2

CARSv2 project repository - public
MIT License
4 stars 2 forks

Curation of records, data, processes from original CARS #20

Open BecCowley opened 9 months ago

BecCowley commented 9 months ago

Collect the notes and data (and perhaps some of the code) used by Jeff to create the CARS2009 product. It would be good to understand how Jeff did:

We should also rescue the data he already collected and reuse it in the new product. Where should we put this information? Locally it is available in the datalib location, in Jeff's folders. Maybe we need to replicate it somewhere useful for the new CARS, or just make sure we can identify where the important parts are.

Also, thinking about the final format for the new product, it should match the original so users can easily slot the new product into their existing applications.

BecCowley commented 9 months ago

See here for data locations and code information from Jeff: https://github.com/CARSv2/cars-v2/wiki/CARS2009-helpful-information

Thomas-Moore-Creative commented 9 months ago

Good chat today @BecCowley & @ChrisC28

Here is the whiteboard image.

ChrisC28 commented 9 months ago

I've made a first pass at producing the "CODA" form of the output from the WOD.

The data is contained in yearly directories. Each directory includes daily files with the naming conventions:

CODAWOD.nc

platform_type = ctd, pfl, xbt, etc.
variable = temperature, pressure, salinity, oxygen, ...

So for each platform type and variable, there are 365 or 366 files per year.

The files themselves are two-dimensional (cast, depth_index). Each variable includes the data, the depth levels (WOD data seems to be on depth and NOT pressure) and the WOD flags.

There are a few quirks that I'm working through that could make the data a little easier to read and deal with. For example, the length of the depth dimension varies from file to file, which isn't optimal for reading the data. Exactly what data/metadata to carry through is also something we should all discuss.
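One way to deal with the varying depth-dimension length is to pad every file to a common number of levels before stacking. The snippet below is only a minimal sketch with made-up arrays; the helper name `pad_depth` and the choice of NaN as the pad value are assumptions, not what the CODA converter does.

```python
import numpy as np


def pad_depth(data, n_levels, fill_value=np.nan):
    """Pad a (cast, depth_index) array with fill_value up to n_levels columns."""
    data = np.asarray(data, dtype=float)
    n_cast, n_depth = data.shape
    if n_depth > n_levels:
        raise ValueError("array has more depth levels than the target length")
    padded = np.full((n_cast, n_levels), fill_value, dtype=float)
    padded[:, :n_depth] = data
    return padded


# Two hypothetical daily files with different depth-dimension lengths.
day1 = np.array([[10.0, 9.5, 9.0], [11.0, 10.2, 9.8]])
day2 = np.array([[12.0, 11.5, 11.0, 10.5, 10.0]])

# Pad both to the longest depth dimension, then stack along the cast axis.
n_levels = max(day1.shape[1], day2.shape[1])
stacked = np.concatenate(
    [pad_depth(day1, n_levels), pad_depth(day2, n_levels)], axis=0
)
print(stacked.shape)
```

Writing the files with a fixed depth length in the first place would avoid this step when reading.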

ChrisC28 commented 9 months ago

The first pass of the data is here: /oa-decadal-climate/work/observations/CARSv2_ancillary/CODA/WOD

BecCowley commented 9 months ago

@ChrisC28, looking at the pressure files - is the Pressure_depth variable meant to be the pressure converted to depth? Looks erroneous (looking at one of the CTD files). Will discuss with you!

ChrisC28 commented 9 months ago

I've pushed the example notebook to the main branch

ChrisC28 commented 9 months ago

@BecCowley The "Pressure_depth" variable is simply the pressure as read in the WOD data on the depth levels. I treat pressure as any other variable. I did notice some strangeness myself. Could you let me know which profile you looked at?

ChrisC28 commented 8 months ago

Let's try this again.

Using the wodpync module, I've created some test CODA files. Not all the metadata is there, as I had some boring problems processing strings that I still haven't worked out. Additionally, it seems that not all metadata is carried through in WOD (for example, Salinity doesn't have units).

You can find the test dataset on tube: /oa-decadal-climate/work/observations/CARSv2_ancillary/CODA/CODA_test

It's currently CTD only, although I've run some tests on XBT and profiling floats without issues. Let me know if you get a chance to have a play.

BecCowley commented 8 months ago

@ChrisC28 I had a quick look at the files. Certainly there needs to be some transfer of variable attributes, fixes to fill values etc. Salinity shouldn't have units.

I did some very basic plotting using the WOD flags and temperature and salinity. There are some strange out of range numbers in the WOD_flag variable (-127) for the one file I looked at.
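For reference, -127 is the default netCDF fill value for byte variables, so it may be worth checking whether these entries are unfilled values rather than real flags. That is only a possibility, not something established here. A minimal sketch of screening them out before plotting, using made-up arrays and assuming a WOD flag of 0 means "accepted value":

```python
import numpy as np

NC_FILL_BYTE = -127  # default netCDF fill value for byte (int8) variables

# Hypothetical (cast, depth_index) arrays standing in for one CODA file.
flags = np.array([[0, 0, 1, -127], [0, 3, -127, -127]], dtype=np.int8)
temperature = np.array([[12.1, 11.8, 30.5, 0.0], [9.4, 9.1, 0.0, 0.0]])

is_fill = flags == NC_FILL_BYTE  # entries that look like fill, not flags
is_good = flags == 0             # assumption: 0 means "accepted value"

# Keep only accepted values; everything else becomes NaN.
good_temperature = np.where(is_good, temperature, np.nan)
n_fill = int(is_fill.sum())
print(n_fill, np.nanmax(good_temperature))
```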

The data itself looks reasonable. I would like to tidy up the files with the correct fill values etc so that they load without issues. Also, I wonder if mixing the names 'depth' and 'z' could be reconciled?

Happy to work on tidying up when back next year!

BecCowley commented 8 months ago

@ChrisC28 I am reviewing the requirements for the duplicate checking code. Here is a list of metadata that needs to be included in the CODA files if it is available:

accession_number, dataset_id, lat, lon, year, month, day, probe_type, recorder, hour, minute, country_id, GMT_time, dbase_orig, Project_name, platform, vehicle, Institute
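A quick way to check a CODA file against this list is to compare it with the variable names found in the file. This is a rough sketch: the function name `missing_metadata` is hypothetical, and the example names are made up rather than read from a real file.

```python
# Metadata names requested for the duplicate checker (from the list above).
REQUIRED_METADATA = [
    "accession_number", "dataset_id", "lat", "lon", "year", "month", "day",
    "probe_type", "recorder", "hour", "minute", "country_id", "GMT_time",
    "dbase_orig", "Project_name", "platform", "vehicle", "Institute",
]


def missing_metadata(variable_names, required=REQUIRED_METADATA):
    """Return the required names that are absent, in the order listed."""
    present = set(variable_names)
    return [name for name in required if name not in present]


# Hypothetical variable names from one file.
example_names = ["lat", "lon", "year", "month", "day", "GMT_time", "platform"]
print(missing_metadata(example_names))
```

With netCDF4-python, the names could come from `Dataset(path).variables.keys()`, assuming the metadata are stored as variables rather than global attributes.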

ChrisC28 commented 7 months ago

NOTE: wodpy uses masked arrays, which are extremely slow.

I've found places where I think masked arrays can be replaced by regular arrays. I'm running some tests now. This should hopefully speed things up.

ChrisC28 commented 7 months ago

I've now modified WODpy to make use of standard numpy arrays rather than masked arrays. I'm still checking things to make sure that what I've done is sensible, but it speeds things up by an order of magnitude.
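As an illustration of the general idea (not the actual changes made to wodpy), a masked array can be converted to a plain NumPy array by filling the masked entries with NaN and then using NaN-aware functions downstream. The helper name `to_nan_array` is hypothetical.

```python
import numpy as np


def to_nan_array(values):
    """Convert a (possibly masked) array to a plain float array with NaN gaps."""
    masked = np.ma.asarray(values, dtype=float)
    return masked.filled(np.nan)


# Hypothetical profile with one masked (missing) level.
profile = np.ma.masked_array([12.1, 11.8, -99.9], mask=[False, False, True])
plain = to_nan_array(profile)

# NaN-aware reductions replace the masked-array equivalents.
print(type(plain).__name__, np.nanmean(plain))
```

One thing to check with this approach is that any test of the form "is this value masked?" is replaced by an explicit NaN test, since plain arrays carry no mask.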

ChrisC28 commented 7 months ago

Hi @BecCowley I've put a couple of CODA test outputs here: /oa-decadal-climate/work/observations/CARSv2_ancillary/CODA/CODA_test/2010

The files have the metadata above that should hopefully help the duplicate checker work.

The -127 WODFlag values still appear and I've traced these back to the original data files.

Could you run your sceptical eye over these files and let me know if they are fit for purpose? I can now regenerate the files quickly, so fixes should be pretty easy to implement.

BecCowley commented 6 months ago

@ChrisC28 Some comments:

I will do some testing on the data itself and let you know what I find.

BecCowley commented 6 months ago

@ChrisC28 some comments/queries on the data:

ChrisC28 commented 6 months ago

@BecCowley

I'm moving the WOD_2018 over to: /oa-decadal-climate/work/observations/CARSv2_ancillary/WOD2018

I've got ctd data from 1970 onwards, and argo, xbt, glider, etc... from 2005 onwards.

ChrisC28 commented 6 months ago

I've pushed my changes to wodpy to my fork here: https://github.com/ChrisC28/wodpy

Being lazy, the original code is left in the file but commented out

ChrisC28 commented 6 months ago

Fixed issue:

Working through the remainder of @BecCowley 's list

ChrisC28 commented 5 months ago

@BecCowley

There are some missing time data. Is this representative of what is in the original file, or an issue with the conversion? E.g., the WOD_CODA_2010_xbt_Temperaturetest.nc file is missing date/time info in 44 profiles, including locations: 688, 715, 895, 1046, 1517 (WOD unique ids: 13047310, 13047318, 13047370, 12321440)

Found the issue - wodpync creates a datetime output based on the "date" variable and the "GMT_time" variable. The latter is occasionally missing. Wodpy tests for this, but when changing from masked to regular NumPy arrays, the test failed. I've reverse engineered the test with a bit of a hack but it seems to catch those cases... when GMT time is missing, it takes the time as midnight (as in the original wodpy).
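The fallback described above could look roughly like the following. This is a sketch only, not the code in wodpy or the fork: the function name `profile_datetime` is hypothetical, and it assumes GMT_time is given in decimal hours and that a missing value arrives as `None` or NaN.

```python
import math
from datetime import datetime, timedelta


def profile_datetime(year, month, day, gmt_time=None):
    """Build a datetime for a profile, using midnight when GMT time is missing.

    gmt_time is assumed to be decimal hours; None, NaN or values outside
    [0, 24) are treated as missing.
    """
    missing = (
        gmt_time is None
        or math.isnan(gmt_time)
        or not (0 <= gmt_time < 24)
    )
    hours = 0.0 if missing else float(gmt_time)
    return datetime(year, month, day) + timedelta(hours=hours)


print(profile_datetime(2010, 3, 5, 13.5))
print(profile_datetime(2010, 3, 5, float("nan")))
```

It may also be worth keeping a flag for profiles where the time was defaulted, so that midnight is not mistaken for a real observation time during duplicate checking.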

ChrisC28 commented 5 months ago

Hi @BecCowley

A new batch of CODA files to check. I've placed them here /oa-decadal-climate/work/observations/CARSv2_ancillary/CODA/CODA_test/2010 the files are:

The filename components are: a 3-character code identifying the original data (WOD, MNF, etc...); the obs platform, a three-character code identifying the observation type (CTD, pfl, ...); YYYYMMDD, the date of the observation; and a 4-character string that counts the number of profiles on that date (first observation, 2nd observation, etc...).

BecCowley commented 5 months ago

@ChrisC28 the WOD_unique id contains latitude, needs correcting as discussed.

ChrisC28 commented 5 months ago

Fixed... regenerating the WOD derived CODA files

ChrisC28 commented 5 months ago

Added the first batch of MNF -> CODA files:

Note that, as discussed, I haven't included a lot of the meta-data (things like COUNTRY, etc....). Probably worth discussing what we need for the QC/duplicate checking and making sure that we include what's required.

Next step: repeat with the AIMS data.

BecCowley commented 5 months ago

@ChrisC28 I think we need to add these to all the files at the time of conversion to CODA format:

We will absolutely need the following information for XBT files:

For profiling floats and glider files (when we get there):

The WOD code tables are available here and we should use these values if possible https://www.ncei.noaa.gov/access/world-ocean-database/wod-codes.html

Can you finagle that? Happy to help.

Also still need the CODA WOD files updated to fix the wod_unique_id issue. And, we need to make sure the longitudes of any datasets are in -180:180 degree format, not 360 degrees.
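For the longitude convention, a minimal sketch of mapping 0:360 values onto -180:180 is shown below. The function name `wrap_longitude` is hypothetical; note that with this formula exactly 180 maps to -180, so the interval is half-open.

```python
def wrap_longitude(lon):
    """Map a longitude in degrees to the half-open interval [-180, 180)."""
    return ((lon + 180.0) % 360.0) - 180.0


# Hypothetical longitudes in the 0:360 convention.
lons_360 = [0.0, 150.0, 190.0, 359.5]
lons_180 = [wrap_longitude(lon) for lon in lons_360]
print(lons_180)
```

Values already in the -180:180 range pass through unchanged, so the function can be applied to every dataset without checking the convention first.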

ChrisC28 commented 5 months ago

@BecCowley

I've fixed a few bugs and placed the newly created files in a new directory: /oa-decadal-climate/work/observations/CARSv2_ancillary/CODA/CODA_test_v2/ I've had a quick look and most of the supporting variables seem to be included. I haven't yet included the WOD Codes, and I think we need a brief discussion as to how to carry these over to the MNF/CSIRO and AIMS (+ other) data.

ChrisC28 commented 5 months ago

@BecCowley

Another one for your to-do list: I've put the first pass test of AIMS->CODA files. They are together with the files I've produced from WOD and MNF in the directory: /oa-decadal-climate/work/observations/CARSv2_ancillary/CODA/CODA_test_v2/ divided up by year.

Couple of things to note:

Please have a quick squizz when you get a moment. Could we catch up briefly about what exactly is required from the IQuOD duplicate checker/QC-er.... ?

BecCowley commented 5 months ago

@ChrisC28 the attributes in the variables for the AIMS files have an issue (there is a long string in there - ncdump it to see). Will come see you to discuss how to put in the variables.

ChrisC28 commented 3 months ago

Hi @BecCowley WOD, MNF, AIMS and RAN data has now been CODA-fyied!

Path for the new test dataset is: /oa-decadal-climate/work/observations/CARSv2_ancillary/CODA/WOD_CODA_test_v2/

I'll push the converters next week. Would be good to try to refactor the code into a common set of functions (the MNF and RAN code is very similar).

Note: I found that nutrient etc.... profiles are actually in the ocean station files and not, as I suspected, in the ctd files in WOD (there are also a few in the profiling float data). I had been ignoring ocean stations, but turns out that they are important.

BecCowley commented 3 months ago

@ChrisC28, @Thomas-Moore-Creative, I see the WOD files are only there from 2000 to 2017. I think Thomas was going to download the latest WOD from 2000 to now - has this been done yet and can we then complete the conversion?

Thomas-Moore-Creative commented 3 months ago

@Thomas-Moore-Creative - has this been done yet and can we then complete the conversion?

It has not, apologies. I'll start this now.

Thomas-Moore-Creative commented 3 months ago

I'll do this over in #19

ChrisC28 commented 3 months ago

Hi all,

In my haste to get this out on a Friday evening, I neglected to mention that I've downloaded WOD2018 from the OpenDAP server.

It turned out to be very easy (took me less than 30 minutes): /oa-decadal-climate/work/observations/WOD2018

Thomas-Moore-Creative commented 3 months ago

@BecCowley - given the above diligence from @ChrisC28 I assume that is as much as we can grab for now from WOD?

.... I note it goes up to 2022: drwxrwsr-x 2 cha674 1109763 4.0K May 11 07:18 2022

ChrisC28 commented 3 months ago

Have a look at the notebook here: https://github.com/CARSv2/cars-v2/blob/main/notebooks/Download_WOD_from_OpenDap.ipynb

I've only downloaded a subset of WOD2018. However, it might be worth having nearly the whole thing? Not sure how useful data from Captain Cook might be, but you never know....

BecCowley commented 3 months ago

Yes, looks like it's downloaded. Thanks for doing this, @ChrisC28. However, not all of it is translated to CODA yet; we can do that now the tools are there!

ChrisC28 commented 3 months ago

@BecCowley I'm running the script now! Converting a bunch of other variables (nutrients, CO2, etc...).

BecCowley commented 1 week ago

@ChrisC28 here is a list of format issues in the CODAv1 files:

  1. There are negative depths in the MNF and RAN files. Maybe in others? WOD is ok.
  2. Can we remove the '2018' in the WOD filenames and the CODA id? I'd like the following naming format: NNN_CODA_yyyy_ttt.nc where NNN = dataset, yyyy = year, ttt = lowercase datatype.
  3. In the MNF* files, the CODA id has lower-case 'mnf' while in the WOD files, it is upper case. Can we be consistent with case in the CODA id format?
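The three points above could be checked automatically with something like the sketch below. All helper names are hypothetical. It assumes the dataset code NNN is three upper-case letters (the list asks for consistent case but does not say which), and it does not enforce a length for the datatype.

```python
import re

# NNN_CODA_yyyy_ttt.nc: NNN = dataset, yyyy = year, ttt = lowercase datatype.
# Assumption: NNN is three upper-case letters.
CODA_NAME = re.compile(
    r"^(?P<dataset>[A-Z]{3})_CODA_(?P<year>\d{4})_(?P<datatype>[a-z]+)\.nc$"
)


def coda_filename(dataset, year, datatype):
    """Build a filename in the requested NNN_CODA_yyyy_ttt.nc format."""
    return f"{dataset.upper()}_CODA_{int(year):04d}_{datatype.lower()}.nc"


def is_valid_coda_filename(name):
    """True if the name matches the requested format."""
    return CODA_NAME.match(name) is not None


def has_negative_depths(depths):
    """True if any depth value is below zero."""
    return any(d < 0 for d in depths)


print(coda_filename("mnf", 2010, "CTD"))
print(is_valid_coda_filename("WOD2018_CODA_2010_xbt.nc"))
print(has_negative_depths([0.0, 5.0, -2.5]))
```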