NOAA-PMEL / Ferret

The Ferret program from NOAA/PMEL
https://ferret.pmel.noaa.gov/Ferret/
The Unlicense
55 stars 20 forks source link

identical descriptor files are incorrectly associated with different time axes #1984

Open AndrewWittenberg opened 3 years ago

AndrewWittenberg commented 3 years ago

This bug is crashing our automated analyses at GFDL, so I'm requesting the highest priority. It's probably worth reaching out to Ansley for initial clues, as she may have done some recent work on this.

Setup: create two NetCDF files, and then wrap them in descriptors using make_des:

    NOAA/PMEL TMAP
    FERRET v7.6 (optimized)
    Linux 2.6.32-754.29.2.el6.x86_64 64-bit - 06/25/20
    13-Nov-20 13:09     

yes? use monthly_navy_winds
yes? set reg/y=0/x=140w
yes? save/clob/file=uwnd.nc uwnd
yes? save/clob/file=vwnd.nc vwnd
yes? sp make_des uwnd.nc > uwnd.des
yes? sp make_des vwnd.nc > vwnd.des

The resulting descriptors point to NetCDF files with identical grids, so they ought to appear on the same grid (and axes) when we use them in Ferret.

But the new Ferret v7.6 puts them on different time axes (TIME1 and TIME2):

    NOAA/PMEL TMAP
    FERRET v7.6 (optimized)
    Linux 2.6.32-754.29.2.el6.x86_64 64-bit - 06/25/20
    13-Nov-20 13:17     

yes? use uwnd.des; sho grid uwnd[d=1]
    GRID GQV1
 name       axis              # pts   start                end                 subset
 FNOCX81_81 LONGITUDE           1mr   140W                 140W                full
 FNOCY37_37 LATITUDE            1 r   0                    0                   full
 normal    Z
 TIME1     TIME               132 r   16-JAN-1982 20:00    17-DEC-1992 03:30   full
yes? use vwnd.des; sho grid vwnd[d=2]
    GRID GMO1
 name       axis              # pts   start                end                 subset
 FNOCX81_81 LONGITUDE           1mr   140W                 140W                full
 FNOCY37_37 LATITUDE            1 r   0                    0                   full
 normal    Z
 TIME2     TIME               132 r   16-JAN-1982 20:00    17-DEC-1992 03:30   full

The previous version (Ferret v7.52, dated 12/03/2019) showed inconsistent behavior, depending on the order of commands:

    NOAA/PMEL TMAP
    FERRET v7.52 (optimized)
    Linux 2.6.32-754.23.1.el6.x86_64 64-bit - 12/03/19
    13-Nov-20 13:22     

yes? use uwnd.des; sho grid uwnd[d=1]
    GRID GQV1
 name       axis              # pts   start                end                 subset
 FNOCX81_81 LONGITUDE           1mr   140W                 140W                full
 FNOCY37_37 LATITUDE            1 r   0                    0                   full
 normal    Z
 TIME1     TIME               132 r   16-JAN-1982 20:00    17-DEC-1992 03:30   full
yes? use vwnd.des; sho grid vwnd[d=2]
    GRID GMO1
 name       axis              # pts   start                end                 subset
 FNOCX81_81 LONGITUDE           1mr   140W                 140W                full
 FNOCY37_37 LATITUDE            1 r   0                    0                   full
 normal    Z
 TIME2     TIME               132 r   16-JAN-1982 20:00    17-DEC-1992 03:30   full

That fails (giving TIME1 & TIME2), but this works (both grids on TIME2):

    NOAA/PMEL TMAP
    FERRET v7.52 (optimized)
    Linux 2.6.32-754.23.1.el6.x86_64 64-bit - 12/03/19
    13-Nov-20 13:23     

yes? use uwnd.des; use vwnd.des
yes? sho grid uwnd[d=1]; sho grid vwnd[d=2]
    GRID GMO1
 name       axis              # pts   start                end                 subset
 FNOCX81_81 LONGITUDE           1mr   140W                 140W                full
 FNOCY37_37 LATITUDE            1 r   0                    0                   full
 normal    Z
 TIME2     TIME               132 r   16-JAN-1982 20:00    17-DEC-1992 03:30   full
    GRID GMO1
 name       axis              # pts   start                end                 subset
 FNOCX81_81 LONGITUDE           1mr   140W                 140W                full
 FNOCY37_37 LATITUDE            1 r   0                    0                   full
 normal    Z
 TIME2     TIME               132 r   16-JAN-1982 20:00    17-DEC-1992 03:30   full

For both command orderings to produce identical grids (both on TIME1), we have to go all the way back to Ferret v7.3 (dated 12/04/2017). What's going on?

eugeneburger commented 3 years ago

Hello Andrew. Noted and we will be in touch with more questions.

ACManke commented 3 years ago

[editing to say the similarity may be with TSERIES on-the-fly aggregations, not FMRC aggregations.]

Sure, Andrew and Eugene. I have a thought or two and will get with Joshua to see if we can see what this is about. At a guess, the change in behavior with V7.6 comes from work having to do with time-axis handling for datasets defined as TSERIES aggregations, which share some of the descriptor-file handling machinery. If so it sounds like something has just had the effect of pointing out an existing bug and making it happen consistently.

seasage commented 3 years ago

Hi Andrew, A quick update: It seems like I'm nearing a solution. I'll do some more testing and follow up here.

seasage commented 3 years ago

Hi Andrew,

Thank you for your patience as I learn the ropes.

The issue here had to do with some memory allocation added along with the DSG functionality like @ACManke suggested. After patching it up, using your example, I get:

yes? use monthly_navy_winds
yes? set reg/x=140w/y=0
yes? save/clob/file=uwnd.nc uwnd
 LISTing to file uwnd.nc
yes? save/clob/file=vwnd.nc vwnd
 LISTing to file vwnd.nc
>./nc2mc.csh uwnd
>./nc2mc.csh vwnd
yes? use uwnd.mc; show grid uwnd[d=1]
    GRID GBB1
 name       axis              # pts   start                end                 subset
 FNOCX81_81 LONGITUDE           1mr   140W                 140W                full
 FNOCY37_37 LATITUDE            1 r   0                    0                   full
 normal    Z
 TIME1     TIME               132 r   16-JAN-1982 20:00    17-DEC-1992 03:30   full

yes? use vwnd.mc; show grid vwnd[d=2]
    GRID GJC1
 name       axis              # pts   start                end                 subset
 FNOCX81_81 LONGITUDE           1mr   140W                 140W                full
 FNOCY37_37 LATITUDE            1 r   0                    0                   full
 normal    Z
 TIME1     TIME               132 r   16-JAN-1982 20:00    17-DEC-1992 03:30   full

I also checked the fix against combining two netCDF files to ensure there is no redundancy in the time axis.

yes? use monthly_navy_winds
yes? set reg/x=140w/y=0
yes? save/clob/file=windfile_u1.nc/L=1:11 uwnd
 LISTing to file windfile_u1.nc
yes? save/clob/file=windfile_u2.nc/L=12:132 uwnd
 LISTing to file windfile_u2.nc
> ./nc2mc.csh windfile_u
yes? use windfile_u.mc
yes? show grid uwnd
    GRID GIH1
 name       axis              # pts   start                end                 subset
 FNOCX81_81 LONGITUDE           1mr   140W                 140W                full
 FNOCY37_37 LATITUDE            1 r   0                    0                   full
 normal    Z
 TIME1     TIME               132 r   16-JAN-1982 20:00    17-DEC-1992 03:30   full
AndrewWittenberg commented 3 years ago

Thanks Joshua and Ansey, this fix seems to be working as advertised. But one loose end: why do the variables appear to be on two different grids? (These are named GBB1 and GJC1 in your example.) They should be on the same grid, since they share all the same axes. Is this simply an oversight in the new code, not recognizing that the grids are the same?

In my example above with the old Ferret v7.52, both variables appeared on the same grid, named GMO1 in that case. But with the new v7.63 I get:

    NOAA/PMEL TMAP
    FERRET v7.63 (optimized)
    Linux 2.6.32-754.33.1.el6.x86_64 64-bit - 01/04/21
     4-Jan-21 18:30     

yes? use uwnd.des; sho grid uwnd[d=1]
    GRID GQV1
 name       axis              # pts   start                end                 subset
 FNOCX81_81 LONGITUDE           1mr   140W                 140W                full
 FNOCY37_37 LATITUDE            1 r   0                    0                   full
 normal    Z
 TIME1     TIME               132 r   16-JAN-1982 20:00    17-DEC-1992 03:30   full
yes? use vwnd.des; sho grid vwnd[d=2]
    GRID GMO1
 name       axis              # pts   start                end                 subset
 FNOCX81_81 LONGITUDE           1mr   140W                 140W                full
 FNOCY37_37 LATITUDE            1 r   0                    0                   full
 normal    Z
 TIME1     TIME               132 r   16-JAN-1982 20:00    17-DEC-1992 03:30   full

where the two "apparently" (but not actually) different grids are here named GQV1 and GMO1.

AndrewWittenberg commented 3 years ago

Any progress on this loose end?

eugeneburger commented 3 years ago

Hello Andrew,

my apologies for this oversight - will focus on this tomorrow.

Eugene


Eugene F Burger 206.526.4586 Associate Director of IT, NOAA Pacific Marine Environmental Laboratory 7600 Sand Point Way NE Seattle WA 98115 https://www.pmel.noaa.gov

On Thu, Feb 4, 2021 at 4:37 PM AndrewWittenberg notifications@github.com wrote:

Any progress on this loose end?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/NOAA-PMEL/Ferret/issues/1984#issuecomment-773698484, or unsubscribe https://github.com/notifications/unsubscribe-auth/AD5FHIVG63QH2KBOBHI7433S5M4TDANCNFSM4TU5IADA .

seasage commented 3 years ago

Hi Andrew,

What do you see when you instead try show grid uwnd[d=1] after use vwnd.des?

I get:

yes? use uwnd.mc; show grid uwnd[d=1]
    GRID GBB1
 name       axis              # pts   start                end                 subset
 FNOCX81_81 LONGITUDE           1mr   140W                 140W                full
 FNOCY37_37 LATITUDE            1 r   0                    0                   full
 normal    Z
 TIME1     TIME               132 r   16-JAN-1982 20:00    17-DEC-1992 03:30   full
yes? use vwnd.mc; show grid vwnd[d=2]
    GRID GJC1
 name       axis              # pts   start                end                 subset
 FNOCX81_81 LONGITUDE           1mr   140W                 140W                full
 FNOCY37_37 LATITUDE            1 r   0                    0                   full
 normal    Z
 TIME1     TIME               132 r   16-JAN-1982 20:00    17-DEC-1992 03:30   full
yes? show grid uwnd[d=1]
    GRID GJC1
 name       axis              # pts   start                end                 subset
 FNOCX81_81 LONGITUDE           1mr   140W                 140W                full
 FNOCY37_37 LATITUDE            1 r   0                    0                   full
 normal    Z
 TIME1     TIME               132 r   16-JAN-1982 20:00    17-DEC-1992 03:30   full

So they are after all on the same grid once Ferret reads both. The grid names are given by Ferret when the datasets are loaded, and should be unique to each grid. I'll keep looking into why this behavior is different across versions.

Josh

Edit: this bug occurs in v7.2 as well

AndrewWittenberg commented 3 years ago

Indeed you're right! I hadn't checked that. The grid of the first variable gets assigned the name of the new matching grid, rather than vice versa as I was expecting. So this is a much smaller issue that I had thought.

And as you said, I just tested this in earlier versions and it also seems to be the case in v7.52 and all the way back to v6.82 (August 2012) and probably earlier. So it's not a new bug.

I don't think any of my scripts use the grid names directly; instead they refer to the grid indirectly via the variable name, e.g. set grid uwnd[d=1] or let var2_on_var1 = var2[g=var1].

It seems a bit unusual to be changing the names of existing grids out from under variables that have already been loaded. Unless there's a compelling reason to do so, we might want to give the newer (matching) grid the pre-existing name, instead of giving the pre-existing grid the new name. I may dimly recall something Ansley might have said about this, but I'm not sure. Could drop her a quick note to ask.

But bottom line, I'd suggest resolving this issue and then opening a new "minor" and "low-priority" one with the above suggestion. Given that this is longstanding behavior with no obvious consequences, I'd only change it if it looked really easy, with no loose ends! If it's difficult, or opens up a can of worms, then just let it be. Thanks for your detective work!

ACManke commented 3 years ago

It's always so useful to be able to go back to older versions.

I think it'll be best to leave this one alone. The process of defining and checking grids for datasets has a number of steps. Grid information is kept on a linked list: get the next slot in the the list; set up the grid, filling in its axes; see if the grids are consistent for variables within the dataset that's being initialized; and finally check to see if they're identical to any existing grid. You'd think that if so that we'd use the previously defined grid, but instead the choice was made to use the new grid and assign the new grid to any variables using the old grid, and mark the old grid as unused. It may be that this was an arbitrary choice, but I think, without studying things, that it's a matter of not cleaning up old stuff but always moving forward in the list. The unused slot might be used again if needed, now that it's marked unused, but as a new element on the list. I believe the linked-list logic that's used wants to go forward not back-and-forth.

I don't know if that makes any sense, or even if it's accurate, but it's my intuition. Still might be worth a new ticket, just so there can be an issue description that's correct for this bit of it, but mark it as low priority, and note that it may be more trouble than it's worth.

AndrewWittenberg commented 3 years ago

Thanks Ansley for weighing in! That sounds good to me.