gher-uliege / DIVAnd.jl

DIVAnd performs an n-dimensional variational analysis of arbitrarily located observations
GNU General Public License v2.0
70 stars 11 forks source link

adapt ODV files "load" function for WOD data[šŸ‘†] #136

Closed qcazenave closed 1 year ago

qcazenave commented 1 year ago

Dear DIVAnd developers,

I am looking for duplicates between EMODNet Chemistry and the WOD. To do so, I use the DIVAnd.Quadtrees.checkduplicates function as a first guess but then refine the duplicate detection focusing on consistent profiles instead of looking at each observation as if it were independent from the others. For this, I decided to use the information given by the "id" output variable of the function "load" (from NCODV and/or from PhysOcean.WorldOceanDatabase) but there is a problem with the measurements from the WOD : they do not have information for the "LOCAL_CDI_ID" metadata. The "id" output variable is thus an empty variable. Would it be possible to add an option in the "load" function so that if the "LOCAL_CDI_ID" is not available, then another parameter can be used (like the "Station" parameter for example, in the case of the WOD) ? I created locally a development package for DIVAnd to adapt the NCODV.load function (see code below). I was wondering whether it would be possible to modify the "official" packages accordingly (or maybe in a smarter way) ?

Thanks Quitterie

Following is the way I adapted the function:

function load_WOD(T, fname, long_name; qv_flags = ["good_value", "probably_good_value"], nchunk = 10)

accepted_status_flags = qv_flags

Dataset(fname) do ds
    nstations = Int(ds.dim["N_STATIONS"])
    LOCAL_CDI_ID = fill("",nstations)

    if length(varbyattrib(ds, long_name = "LOCAL_CDI_ID")) == 0
        @warn "No variable with the long_name attribute \'LOCAL_CDI_ID\' in $fname found. We use the station name instead."

        ncvar_LOCAL_CDI_ID = varbyattrib_first(ds, long_name = "Station")

        if ndims(ncvar_LOCAL_CDI_ID) == 2
            LOCAL_CDI_ID = chararray2strings(ncvar_LOCAL_CDI_ID.var[:])
        else
            @warn """The variable with the long_name attribute \'Station\' is expected to have two dimensions. For example the output of 'ncdump -h' of $fname should contain:

[...] char metavar4(N_STATIONS, STRING20) ; metavar4:long_name = "Station" ; [...]

We use the empty string for LOCAL_CDI_ID instead. """ end

    else
        ncvar_LOCAL_CDI_ID = varbyattrib_first(ds, long_name = "LOCAL_CDI_ID")

        if ndims(ncvar_LOCAL_CDI_ID) == 2
            LOCAL_CDI_ID = chararray2strings(ncvar_LOCAL_CDI_ID.var[:])
        else
            @warn """The variable with the long_name attribute \'LOCAL_CDI_ID\' is expected to have two dimensions. For example the output of 'ncdump -h' of $fname should contain:

[...] char metavar4(N_STATIONS, STRING36) ; metavar4:long_name = "LOCAL_CDI_ID" ; [...]

We use the station name instead. """ ncvar_LOCAL_CDI_ID = varbyattrib_first(ds, long_name = "Station") if ndims(ncvar_LOCAL_CDI_ID) == 2 LOCAL_CDI_ID = chararray2strings(ncvar_LOCAL_CDI_ID.var[:]) else @warn """The variable with the long_name attribute \'Station\' is expected to have two dimensions. For example the output of 'ncdump -h' of $fname should contain: [...] char metavar4(N_STATIONS, STRING20) ; metavar4:long_name = "Station" ; [...]

We use the empty string for LOCAL_CDI_ID instead.
    """
            end
        end
    end

    EDMO_CODE = if length(varbyattrib(ds; long_name = "EDMO_code")) > 0
        varbyattrib_first(ds, long_name = "EDMO_code")[:]
    else
        varbyattrib_first(ds, long_name = "EDMO_CODE")[:]
    end

    obsproflon = varbyattrib_first(ds, standard_name = "longitude")[:]
    obsproflat = varbyattrib_first(ds, standard_name = "latitude")[:]

    # time for time series
    ncvar_time = nothing
    ncv_ancillary_time = nothing
    fillval_time = nothing
    accepted_status_flag_values_time = nothing
    vars_time_ISO8601 = varbyattrib(ds, long_name = "time_ISO8601")

    if length(vars_time_ISO8601) == 1
        # time series
        ncvar_time = vars_time_ISO8601[1]
        @assert ndims(ncvar_time) == 2
    else
        # profile
        obsproftime = varbyattrib_first(ds, standard_name = "time")[:]
        @assert ndims(obsproftime) == 1
    end

    ncvar = varbyattrib_first(ds, long_name = long_name)
    ncvar_z = varbyattrib_first(ds, long_name = "Depth")

    @debug "variable: $(name(ncvar))"
    @debug "variable z: $(name(ncvar_z))"

    ncv_ancillary, accepted_status_flag_values = statusflags(ncvar,accepted_status_flags)
    ncv_ancillary_z, accepted_status_flag_values_z = statusflags(ncvar_z,accepted_status_flags)

    if ncvar_time !== nothing
        ncv_ancillary_time, accepted_status_flag_values_time = statusflags(ncvar_time,accepted_status_flags)
        fillval_time = get(ncvar_time.attrib, "_FillValue", nothing)
    end

    fillval = ncvar.attrib["_FillValue"]
    fillval_z = get(ncvar_z.attrib, "_FillValue", nothing)

    data, data_z, data_time = loadprof(
        ncvar.var,
        ncv_ancillary,
        fillval,
        accepted_status_flag_values,

        ncvar_z.var,
        ncv_ancillary_z,
        fillval_z,
        accepted_status_flag_values_z,

        #= these 4 variables are nothing for profiles =#
        (ncvar_time == nothing ? nothing : ncvar_time.var),
        ncv_ancillary_time,
        fillval_time,
        accepted_status_flag_values_time,
        nchunk = nchunk
    )

    if ncvar_time !== nothing
        time_units = ncvar_time.attrib["units"]
        @debug "time_units: $time_units (decode as fractional years)"
        @assert time_units == "years since 0000-01-01"
        obsproftime = decode_odv_years.(data_time,fillval_time)
    end

    obsvalue, obslon, obslat, obsdepth, obstime, obsids = flatten_data(
        T,
        obsproflon,
        obsproflat,
        obsproftime,
        EDMO_CODE,
        LOCAL_CDI_ID,
        data,
        data_z,
    )
    return obsvalue, obslon, obslat, obsdepth, obstime, obsids
end

end

ctroupin commented 1 year ago

Hello Quitterie,

I haven't read (yet) the solution you proposed in details (due to other constrains right now), however when you say

: they do not have information for the "LOCAL_CDI_ID" metadata. The "id" output variable is thus an empty variable.

we can in fact get the ids for the WOD observations if we use:

@time obsvalwod,obslonwod,obslatwod,obsdepthwod,obstimewod,obsidwod = 
WorldOceanDatabase.load(Float64,woddatadir,varname,prefixid = "1977-");

where prefixid is used to set the EDMO code of the U.S. NODC. This gives this type of output:

julia> obsid
2046877-element Vector{String}:
 "1977-wod_007663161O"
 "1977-wod_007663161O"
 ā‹®
 "1977-wod_018508837O"
 "1977-wod_018508837O"

and the function WorldOceanDatabase.load is part of the PhysOcean module.

qcazenave commented 1 year ago

Thank you, Charles, for your answer. So I probably did something wrong with the WOD dataset because when I use the WorldOceanDatabase.load function, I end up with empty arrays. And these warnings: '[ Info: Loading files from C:\Users\qcazenav\Documents\WOD_NE_Atlantic_Ocean\2022\WOD_raws\CTD\ocldb1692801986.13926.CTD.nc [ Info: Loading files from C:\Users\qcazenav\Documents\WOD_NE_Atlantic_Ocean\2022\WOD_raws\GLD\ocldb1692801986.13926.GLD.nc [ Info: Loading files from C:\Users\qcazenav\Documents\WOD_NE_Atlantic_Ocean\2022\WOD_raws\OSD\ocldb1692801986.13926.OSD.nc ā”Œ Warning: File C:\Users\qcazenav\Documents\WOD_NE_Atlantic_Ocean\2022\WOD_raws\OSD\wod_020018588O.nc does not exist ā”” @ PhysOcean.WorldOceanDatabase C:\Users\qcazenav.julia\packages\PhysOcean\zhgOL\src\WorldOceanDatabase.jl:453 ā”Œ Warning: File C:\Users\qcazenav\Documents\WOD_NE_Atlantic_Ocean\2022\WOD_raws\OSD\wod_020018590O.nc does not exist ā”” @ PhysOcean.WorldOceanDatabase C:\Users\qcazenav.julia\packages\PhysOcean\zhgOL\src\WorldOceanDatabase.jl:453 ā”Œ Warning: File C:\Users\qcazenav\Documents\WOD_NE_Atlantic_Ocean\2022\WOD_raws\OSD\wod_020018592O.nc does not exist ā”” @ PhysOcean.WorldOceanDatabase C:\Users\qcazenav.julia\packages\PhysOcean\zhgOL\src\WorldOceanDatabase.jl:453 ā”Œ Warning: File C:\Users\qcazenav\Documents\WOD_NE_Atlantic_Ocean\2022\WOD_raws\OSD\wod_020018594O.nc does not exist ā”” @ PhysOcean.WorldOceanDatabase C:\Users\qcazenav.julia\packages\PhysOcean\zhgOL\src\WorldOceanDatabase.jl:453 ā”Œ Warning: File C:\Users\qcazenav\Documents\WOD_NE_Atlantic_Ocean\2022\WOD_raws\OSD\wod_020018596O.nc does not exist ā”” @ PhysOcean.WorldOceanDatabase C:\Users\qcazenav.julia\packages\PhysOcean\zhgOL\src\WorldOceanDatabase.jl:453 ā”Œ Warning: File C:\Users\qcazenav\Documents\WOD_NE_Atlantic_Ocean\2022\WOD_raws\OSD\wod_020018598O.nc does not exist ā”” @ PhysOcean.WorldOceanDatabase C:\Users\qcazenav.julia\packages\PhysOcean\zhgOL\src\WorldOceanDatabase.jl:453 ā”Œ Warning: File C:\Users\qcazenav\Documents\WOD_NE_Atlantic_Ocean\2022\WOD_raws\OSD\wod_020018602O.nc does not exist ā”” @ PhysOcean.WorldOceanDatabase C:\Users\qcazenav.julia\packages\PhysOcean\zhgOL\src\WorldOceanDatabase.jl:453 [ Info: Loading files from C:\Users\qcazenav\Documents\WOD_NE_Atlantic_Ocean\2022\WOD_raws\PFL\ocldb1692801986.13926.PFL.nc ' The said missing files are indead missing but there are a lot more so I do not really understand why the output is empty...

ctroupin commented 1 year ago

yes indeed, strange it's empty.

Just in case: have you followed this?

Load a list profiles under the directory basedir assuming basedir was populated by WorldOceanDatabase.download. If the prefixid is specified, then the observations identifier are prefixed with prefixid.

I must admit I haven't used this function a lot but in the notebooks from the DIVA-workshops project, the 90-full-analysis.ipynbprovides a good example.

qcazenave commented 1 year ago

Hi, I found the reason why it didn't work, it was a mistake from my side (not using the correct variable name) so we can close this issue ! Thanks again for your time.

ctroupin commented 1 year ago

Thanks, good it was too difficult to solve ;)