ArgoCanada / medsrtqc

An experimental container for Argo Canada realtime quality control code
https://argocanada.github.io/medsrtqc/
MIT License
0 stars 1 forks source link

`update_trace()` not capturing changes to `adjusted` and `adjusted_qc` fields #13

Closed cgrdn closed 2 years ago

cgrdn commented 2 years ago

This is in the context of running/testing the ChlaTest() component. After running this test script, the Trace() variable chla has filled in values for adjusted and adjusted_qc. However after running self.update_trace('FLU1', chla) at the end of the test, these fields remain "empty" (masked arrays) without the new values.

This appears to be specific to these fields, as the value and qc fields are both changed properly.

Some relevant output:

>>> chla

Trace(
    value=[0.2774, 0.2847, 0.2774, [1288 values], 0.0365, 0.0292, 0.0365],
    qc=[b'3', b'3', b'3', [1288 values], b'3', b'3', b'3'],
    adjusted=[0.67160004, 0.67160004, 0.67160004, [1288 values], 0.00365, 0.0, 0.00365],
    adjusted_qc=[b'1', b'1', b'1', [1288 values], b'1', b'1', b'1'],
    adjusted_error=[masked, masked, masked, [1288 values], masked, masked, masked],
    pres=[-0.1, -1.7014118e+38, -1.7014118e+38, [1288 values], 979.4, 979.4, 979.3],
    mtime=[masked, masked, masked, [1288 values], masked, masked, masked]
)

chlorophyll BEFORE testing, see log file here for more detail

>>> prof["FLU1"] 

Trace(
    value=[0.2774, 0.2847, 0.2774, [1288 values], 0.0365, 0.0292, 0.0365],
    qc=[b'0', b'0', b'0', [1288 values], b'0', b'0', b'0'],
    adjusted=[masked, masked, masked, [1288 values], masked, masked, masked],
    adjusted_qc=[masked, masked, masked, [1288 values], masked, masked, masked],
    adjusted_error=[masked, masked, masked, [1288 values], masked, masked, masked],
    pres=[-0.1, -1.7014118e+38, -1.7014118e+38, [1288 values], 979.4, 979.4, 979.3],
    mtime=[masked, masked, masked, [1288 values], masked, masked, masked]
)

chlorophyll AFTER testing

>>> prof['FLU1'] # same as printing it out during the test following self.update_trace('FLU1', chla)

Trace(
    value=[0.2774, 0.2847, 0.2774, [1288 values], 0.0365, 0.0292, 0.0365],
    qc=[b'3', b'3', b'3', [1288 values], b'3', b'3', b'3'],
    adjusted=[masked, masked, masked, [1288 values], masked, masked, masked],
    adjusted_qc=[masked, masked, masked, [1288 values], masked, masked, masked],
    adjusted_error=[masked, masked, masked, [1288 values], masked, masked, masked],
    pres=[-0.1, -1.7014118e+38, -1.7014118e+38, [1288 values], 979.4, 979.4, 979.3],
    mtime=[masked, masked, masked, [1288 values], masked, masked, masked]
)

@paleolimbot if you have any quick suggestions I would love to hear them!! But also this is not your problem anymore so don't spend too much time on it just thought I would check with you :)

cgrdn commented 2 years ago

More relevant info for what is in the masked array:

>>> prof['FLU1'].adjusted.data

array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)

>>> prof['FLU1'].adjusted_qc.data

array([b'', b'', b'', ..., b'', b'', b''], dtype='|S1')
paleolimbot commented 2 years ago

Hi! I'm glad this is getting used!

I'm happy to help! Without a smaller example to copy/paste, my best guess is that I never implemented adjusted stuff in the VMSProfile:

https://github.com/cgrdn/medsrtqc/blob/9d38d1981f18f2686cc4fb3f1fc193178bd31229/medsrtqc/vms/core_impl.py#L87-L138

cgrdn commented 2 years ago

Thanks @paleolimbot, sorry for the long delay, got busy with some other things.

So if I understand correctly, the key section is in https://github.com/cgrdn/medsrtqc/blob/9d38d1981f18f2686cc4fb3f1fc193178bd31229/medsrtqc/vms/core_impl.py#L125-L138, specifically line 132 because the only field that is being processed/looped through is v.qc.

Now the field that is being updated (m["Q_PARAM"]) is field within the VMSProfile, yes? And if yes, then my question is where can I find out which VMSProfile fields correspond to Trace.adjusted and Trace.adjusted_qc?

If what I have above is right, I'm not sure this will be necessary anymore, but I did write up a minimal working example (you'll need to use my branch to run it):

from medsrtqc.vms import read_vms_profiles
from medsrtqc.resources import resource_path
from medsrtqc.qc.chla import ChlaTest

# load a bgc VMS file (source files from coriolis)
profs = read_vms_profiles(resource_path('bgc_vms.dat'))
prof = profs[0]
test = ChlaTest()

# I edited test.run to return the Trace() for ease here
chla = test.run(prof)

# show trace values where chla.adjusted and chla.adjusted_qc have been updated
print('chla Trace:')
print(chla)

# show that those fields have not been updated in prof
print('"FLU1" Profile field:')
print(prof['FLU1'])

Which for me gives the following (there is more output from test.run() but these are the relevant bits):

chla Trace:
Trace(
    value=[0.2774, 0.2847, 0.2774, [1288 values], 0.0365, 0.0292, 0.0365],
    qc=[b'3', b'3', b'3', [1288 values], b'3', b'3', b'3'],
    adjusted=[0.67160004, 0.67160004, 0.67160004, [1288 values], 0.00365, 0.0, 0.00365],
    adjusted_qc=[b'1', b'1', b'1', [1288 values], b'1', b'1', b'1'],
    adjusted_error=[masked, masked, masked, [1288 values], masked, masked, masked],
    pres=[-0.1, -1.7014118e+38, -1.7014118e+38, [1288 values], 979.4, 979.4, 979.3],
    mtime=[masked, masked, masked, [1288 values], masked, masked, masked]
)
"FLU1" Profile field:
Trace(
    value=[0.2774, 0.2847, 0.2774, [1288 values], 0.0365, 0.0292, 0.0365],
    qc=[b'3', b'3', b'3', [1288 values], b'3', b'3', b'3'],
    adjusted=[masked, masked, masked, [1288 values], masked, masked, masked],
    adjusted_qc=[masked, masked, masked, [1288 values], masked, masked, masked],
    adjusted_error=[masked, masked, masked, [1288 values], masked, masked, masked],
    pres=[-0.1, -1.7014118e+38, -1.7014118e+38, [1288 values], 979.4, 979.4, 979.3],
    mtime=[masked, masked, masked, [1288 values], masked, masked, masked]
)
paleolimbot commented 2 years ago

Totally get the 'getting busy with other stuff' bit!

I actually think the best thing here would be for us to meet and we can debug it together since that might help the next time you run into a mistake I made when writing this! I think you have my gmail...Monday is particularly good for me if you can make that work.

cgrdn commented 2 years ago

Putting some notes from our call here:

cgrdn commented 2 years ago

And some communication with Anh on the subject:

Hi Anh,

[...]

One of the challenges I described at our last Argo Canada meeting was that I was having trouble getting the CHLA_ADJUSTED and CHLA_ADJUSTED_QC fields saved to a Profile() object in the python package and therefore to the VMS .dat file. If I understand correctly, there isn’t really an appropriate place to store the adjusted values and flags in the VMS file. So if that is the case I think there are two options:

  1. Since I am already calculating CHLA_ADJUSTED and CHLA_ADJUSTED_QC, we can just store those in a temporary text file, and instead of putting them in the VMS file, have a short step in the processing chain to put them in the netCDF file after is has been created. I can write this up.
  2. Add a field to be VMS file for the adjusted values (“FLUA”? “FLU7”?).

I don’t know how complicated it is to manipulate the structure of the VMS file so I tend to lean towards option 1 but let me know what you think!

Thanks, Chris

cgrdn commented 2 years ago

For now, I plan to save it to a temporary text file and get it later. If this changes I will re-open the issues.

cgrdn commented 2 years ago

After speaking w/ Anh today she suggested that since it is a whole profile of new values, we should create a new VMS file variable. We will simply decide on a name for it, and it will already exist by the time it gets to the point in the processing chain. Will need to do some careful updates to the code since a Trace() object contains an adjusted field, but we will using the values in the adjusted field to update the value of a new variable name value field.

My initial instinct is to do something like this to update the usual value field with the adjusted numbers:

self.update_trace("FLU1", chla)
self.update_trace("FLU7", chla, use_adjusted=True)
cgrdn commented 2 years ago

Somewhat implemented this in 028cfce. @paleolimbot I am a bit out of my depth in terms of understanding how __setitem__(self, k, v) (what is updated in the mentioned commit) gets used by update_trace(), and therefore am unsure how I would pass use_adjusted=True in the ChlaTest() code.

My attempt at understanding:

update_trace() will use the version of __setitem__() defined in core_impl.py when called. But since this is implicit (not sure if that term is appropriate here, abstract?) I can't necessarily pass use_adjusted=True in update_trace() terribly easily. Would implementing something like defining __setitem__(self, k, v, **kwargs) work here for it to collect anything else update_trace() has?

I'll try to figure this out on my own but any direction is appreciated!

paleolimbot commented 2 years ago

I hope this helps!

First, I think the term that applies here is "built-in"...the __setitem__ syntax is mapped to item[key] = value and there's nothing you can do to pass extra parameters there.

Stepping back from __setitem__, what exactly are you trying to do? I don't think what you actually want to do is pass extra parameters to __setitem__ (you probably want to do something like self["adjusted parameter name"] =Trace(pres, value = this_trace.adjusted_value, qc = this_trace.adjusted_qc)`, if that makes any sense)

cgrdn commented 2 years ago

Right that makes sense. There's no requirement to use update_trace() and therefore __setitem__. Can just do something like:

self.update_trace('FLU1', chla)
self.profile['FLU7'].value = chla.adjusted
self.profile['FLU7'].qc =  chla.adjusted_qc

Thanks Dewey!

cgrdn commented 2 years ago

Related, this test currently failing in my fork

    def test_vms_profile_update_adjusted(self):
        # load a bgc VMS file (source files from coriolis)
        profs = read.read_vms_profiles(resource_path('bgc_vms.dat'))
        prof = profs[0]

        chla_trace = prof["FLU1"]

        chla_trace.qc[0] = b'5'
        chla_trace.adjusted = chla_trace.value

        prof["FLU1"].adjusted = chla_trace.adjusted
        chla_trace_updated = prof["FLU1"]
        self.assertTrue(np.all(chla_trace_updated.adjusted == chla_trace.value))
cgrdn commented 2 years ago

For now, where there is nowhere to put adjusted updates in the file except for a new variable (which the warning directs you to), I am closing this issue.