As a developer, I want to reduce the amount of logging output by the WRES when attempting to load NWM v3.0 from dstore which has only 6 members, not 7

epag commented 3 months ago

Author Name: Hank (Hank) Original Redmine Issue: 121139, https://vlab.noaa.gov/redmine/issues/121139 Original Date: 2023-10-02

Original description is below. This ticket has been repurposed to reduce the amount of logging seen when evaluating NWM v3.0 MRF ensemble forecasts by reading data from d-store. There is no member 7 for v3.0, so the d-store reader outputs one warning per member 7 file not found, which is many.

Hank

=======================

This can be resolved once the investigation is complete and any needed code changes implemented.

The 3.0 NWM only has 6 members, not 7. I don't fully understand the story behind it, but whatever. I want to understand what the WRES does when it uses ensemble forecasts for 3.0. I believe I tested it before, and I believe it worked, so I'm guessing there are warnings about the 7th member being missing, but the evaluation proceeds anyway.

I want to make sure that's correct and then we can decide if changes are recommended (probably not).

Hank

Related issue(s): #154 Redmine related issue(s): 124687

epag commented 3 months ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-10-02T19:43:30Z

Doing a d-store ensemble evaluation: 5936842873174531324 in production. I'm going to scan the log afterwards for any mention of "mem_7".

Hank

epag commented 3 months ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-10-02T20:01:15Z

Seeing lots of messages that look like this:

2023-10-02T19:46:37.216+0000 WARN NwmTimeSeries Found a partial NWM TimeSeries with reference datetime 2023-10-01T00:00:00Z and profile NwmProfile[blobCount=204,memberCount=7,durationBetweenValidDatetimes=PT1H,isVector=true,timeLabel=f,nwmConfiguration=medium_range,nwmOutputType=channel_rt,nwmSubdirectoryPrefix=medium_range,nwmLocationLabel=conus,isEnsembleLike=true,durationPastMidnight=PT0S,durationBetweenReferenceDatetimes=PT6H] from https://[dstore]/nwm/3.0.The following resources were not found: [https://[dstore]/nwm/3.0/nwm.20231001/medium_range_mem7/nwm.t00z.medium_range.channel_rt_7.f181.conus.nc, https://[dstore]/nwm/3.0/nwm.20231001/medium_range_mem7/nwm.t00z.medium_range.channel_rt_7.f189.conus.nc, ... SNIP ...

Any reference to "medium_range_mem7" is not found and every single file missing is listed. However, the evaluation still succeeds. I assume, therefore, that it sees the 6 members in NWM v3.0 and uses those appropriately, ignoring the fact that member 7 is not found. Good.

While we could make the WRES smart enough to not look for "medium_range_mem7" when reading from d-store NWM v3.0 data, I think that would be too much smarts to hardcode into the WRES. The one concern I have is that there are a lot of log messages talking about "medium_range_mem7" not being found, so stdout will be flooded with every single file that was not found every time it was looked for.

Would it be worthwhile to look into aggregating those log messages or otherwise shortening them? Perhaps just saying this:

The following resources were not found: https://[dstore]/nwm/3.0/nwm.20231001/medium_range_mem7

if it's practical in-code, instead of listing every file within that directory it was looking for?
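A minimal sketch of that aggregation, assuming the missing resources are available as a list of URIs. The class and method names are hypothetical and are not taken from the WRES code, and `dstore.example` in the usage note stands in for the redacted d-store host. It groups the missing files by parent directory and reports each directory once with a count:

```java
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the WRES: collapses a long list of
// missing files into one entry per parent directory.
final class MissingResourceSummary {

    /** Counts missing resources per parent directory, sorted by directory. */
    static Map<URI, Long> countByParent(List<URI> missing) {
        return missing.stream()
                // Resolving "." against a file URI yields its directory, with a trailing slash
                .collect(Collectors.groupingBy(u -> u.resolve("."),
                                               TreeMap::new,
                                               Collectors.counting()));
    }

    /** Builds a single message that names each directory once. */
    static String summarize(List<URI> missing) {
        return countByParent(missing).entrySet()
                .stream()
                .map(e -> e.getKey() + " (" + e.getValue() + " file(s))")
                .collect(Collectors.joining(", ",
                                            "The following resources were not found: ",
                                            ""));
    }
}
```

Calling `MissingResourceSummary.summarize(...)` with two missing files under `https://dstore.example/nwm/3.0/nwm.20231001/medium_range_mem7/` produces one message that names that directory followed by `(2 file(s))`.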

Beyond the logging, I see no reason to change the current behavior. Thoughts?

Hank

epag commented 3 months ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-10-02T20:01:30Z

My workday is done. More tomorrow,

Hank

epag commented 3 months ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-10-12T17:38:04Z

Repurposing this ticket.

Generally agreed that having a single log message summarize the other messages, which are currently produced once per read attempt for a member 7 file, is the best approach.

This ticket can be resolved once those messages are demoted to debug and a single, summarizing message is output instead.
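As a rough illustration of that resolution criterion, the sketch below logs each missing resource at a debug-equivalent level and emits one summarizing warning. It uses `java.util.logging` only to stay self-contained (`Level.FINE` plays the role of debug); the logging framework, class name and message wording are assumptions, not what the WRES actually uses.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: per-file messages are demoted, one summary is kept.
final class MissingResourceLogger {

    static final Logger LOGGER = Logger.getLogger("MissingResourceLogger");

    static void logMissing(String source, List<String> missing) {
        if (missing.isEmpty()) {
            return; // Nothing to report
        }

        // One message per missing resource, visible only when debug (FINE) is enabled
        for (String resource : missing) {
            LOGGER.log(Level.FINE, "Resource not found: {0}", resource);
        }

        // A single summarizing warning, regardless of how many resources were missing
        LOGGER.log(Level.WARNING,
                   "{0} resource(s) were not found under {1}. Enable debug logging to list them.",
                   new Object[] { missing.size(), source });
    }
}
```

With the default configuration only the warning is shown, so a missing member costs one line per read rather than one line per file.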

Removing myself as assignee since the investigation is complete. Thanks,

Hank

epag commented 3 months ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-10-24T19:00:46Z

Slipping to 6.17,

Hank

epag commented 3 months ago

Original Redmine Comment Author Name: Evan (Evan) Original Date: 2023-11-14T15:17:25Z

This didn't seem to make progress in 6.17

epag commented 3 months ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-12-15T16:06:55Z

We reduced d-store logging in general in #123199.

However, I think the solution we want for this is some means by which the number of members to look for is automatically determined, either by detecting the number of members (perhaps the code reaching out to see if there is a member 7 folder?), or by having two possible profiles map to the NWM @interface@ being used. Something like that.
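The "reach out and see" option could look something like the sketch below. The existence check is injected as a `Predicate<URI>` because how the WRES would actually probe d-store is not stated here, and the class and method names are hypothetical. The sketch assumes member directories are named `medium_range_mem<N>/`, as in the log message above, and that members are numbered contiguously from 1.

```java
import java.net.URI;
import java.util.function.Predicate;

// Hypothetical sketch: detects how many members exist under one reference date directory.
final class MemberCountProbe {

    /**
     * @param referenceDateDirectory a directory URI that ends with "/", e.g. .../nwm.20231001/
     * @param maxMembers the largest member count to look for, e.g. 7
     * @param exists an assumed check that a directory exists (e.g. an HTTP HEAD request)
     * @return the number of contiguous member directories found, starting from member 1
     */
    static int detectMemberCount(URI referenceDateDirectory,
                                 int maxMembers,
                                 Predicate<URI> exists) {
        int count = 0;
        for (int member = 1; member <= maxMembers; member++) {
            URI memberDirectory = referenceDateDirectory.resolve("medium_range_mem" + member + "/");
            if (!exists.test(memberDirectory)) {
                break; // Stop at the first missing member
            }
            count = member;
        }
        return count;
    }
}
```

The cost is at most `maxMembers` extra existence checks per reference date, which may or may not be acceptable against d-store.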

Regardless, this won't get into 6.18. I'll slip it to 6.19 for further discussion,

Hank

epag commented 3 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2024-01-03T12:48:38Z

I think you have a choice between two bad outcomes, which largely stems from a design choice that needs to be reconsidered:

Add a new set of profiles for the new data structure, each associated with a new interface shorthand;
Accept that the logging will be imperfect. In other words, you will either need to accept too much logging for the new structure or too little logging for the old structure when a member is missing, unexpectedly.

Overall, since logging is intended for developers, I would favor accepting the imperfect logging, probably by accepting the possibility that there may or may not be a 7th member and not warning when it is missing. I wouldn't try to code in some analysis of the source paths for the presence of version information, which may or may not be there.
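A small sketch of what "not warning when it is missing" might mean in practice, assuming missing resources are tracked as path strings and that the member number can be read from a `_mem<N>/` path segment as in the log message above. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: treats one member as optional when deciding what to warn about.
final class OptionalMemberFilter {

    /** Returns the missing paths that still deserve a warning. */
    static List<String> warnable(List<String> missingPaths, int optionalMember) {
        // The trailing slash stops "_mem7/" from also matching, say, "_mem70/"
        String marker = "_mem" + optionalMember + "/";
        return missingPaths.stream()
                .filter(path -> !path.contains(marker))
                .collect(Collectors.toList());
    }
}
```

The trade-off is the one described above: a genuinely missing member 7 in the older structure would no longer be reported.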

Separately, we need to reconsider interface shorthands as a way of policing a structural expectation on NWM data. Why is the NWM a special snowflake among our data sources? We don't police data structures from other models/sources, we simply read what we find and assemble it. I think the upside of policing the NWM structure is pretty small - basically a bunch of warnings when the expectation is not met - but the downside is quite large, namely a proliferation of interface shorthands to capture the vast number of dimensions used to describe each structure. If these structures are going to change between model versions, we'll end up with a completely unusable list of shorthands and that is arguably the case already. Perhaps I am forgetting some other reasons for the upfront identification of this structural expectation beyond policing missing data, but it should be possible to design something more adaptive that does not rely on a directory structure that is known upfront. There are other parts of our codebase where we assemble (e.g., ensemble) time-series from multiple sources. It should be possible to do this ex-post, rather than correlating sources with time-series upfront.

Again, that bigger issue really needs a separate ticket.

epag commented 3 months ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2024-01-03T13:15:17Z

James:

Before we create that bigger ticket, I want to make sure I understand the implications of your comment. Specifically, if we do not "police" the data structure (which I assume also means interpreting the file names), then, presumably, the WRES will scan the directory/path to which it is pointed and examine all of the files beneath it in order to identify data to be ingested and ingest that data. Is that correct?

One upside to that policing is performance. By knowing the directory structure and file name format up front, we can ensure that files not within the issued/reference date range are not examined. For example, the first directory underneath the top-level @2.2@/@3.0@/whatever directory is the reference date; for example:

https://[D-Store]/nwm/3.0/nwm.20231204/

If we don't assume a directory structure under @3.0@, then we will end up looking at every @3.0@ archived netCDF file when scanning the directory structure for data to read, unless the user makes judicious use of file path/name pattern matching. This will not only include files outside of the period of interest, but also undesired data types. Yes, I understand that the WRES can just look at the netCDF headers to make that determination, but that is an awful lot of netCDF headers that will need to be examined.
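To make the performance point concrete, here is a sketch of how knowing the `nwm.yyyyMMdd` directory convention lets a reader visit only the directories inside the reference date range. The class and method names are hypothetical, and the root URI is assumed to end with a slash.

```java
import java.net.URI;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: lists only the reference date directories of interest.
final class ReferenceDateDirectories {

    /**
     * @param versionRoot the version directory, ending with "/", e.g. .../nwm/3.0/
     * @param earliest the earliest reference date, inclusive
     * @param latest the latest reference date, inclusive
     */
    static List<URI> between(URI versionRoot, LocalDate earliest, LocalDate latest) {
        List<URI> directories = new ArrayList<>();
        for (LocalDate date = earliest; !date.isAfter(latest); date = date.plusDays(1)) {
            // BASIC_ISO_DATE formats a LocalDate as yyyyMMdd, e.g. 20231204
            String name = "nwm." + date.format(DateTimeFormatter.BASIC_ISO_DATE) + "/";
            directories.add(versionRoot.resolve(name));
        }
        return directories;
    }
}
```

A two-day range yields two directories to examine, however many days of data sit under the version directory.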

Hank

epag commented 3 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2024-01-03T13:23:20Z

We can exploit reference datetimes in directory names without an interface shorthand. Reference datetimes are not part of an interface shorthand. Rather, the reference datetimes are a common part of the nwm directory structure naming and will be present forevermore. In other words, I am not proposing to make no assumptions about nwm directory structures (although we can probably even detect those rather than assume them). Rather, I am proposing to avoid the brittle interface shorthands.

The problem with interface shorthands is that they abstract too many of the dimensions that change often, like the number of members. We should be able to take a declaration with (e.g., reference date) constraints and find the data we want, no problem, without requiring a user to declare an interface shorthand.

epag commented 3 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2024-01-03T13:25:48Z

The other possibility is that we make the interface shorthands less brittle by removing all of the attributes that are likely to change between model versions, such as gaps between reference times and valid times and number of members and just detect/read what is there.
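One way to picture "just detect/read what is there" for the member dimension: instead of expecting a fixed member count, group whatever paths were actually found by the member number embedded in them. This is only a sketch, with hypothetical class and method names, and it assumes the `_mem<N>/` directory naming seen in the log message earlier in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: discovers ensemble members from the paths that were found.
final class MemberGrouping {

    // Matches the member number in a directory segment such as "medium_range_mem6/"
    private static final Pattern MEMBER = Pattern.compile("_mem(\\d+)/");

    /** Groups paths by detected member number; paths without a member segment are skipped. */
    static Map<Integer, List<String>> groupByMember(List<String> foundPaths) {
        Map<Integer, List<String>> grouped = new TreeMap<>();
        for (String path : foundPaths) {
            Matcher matcher = MEMBER.matcher(path);
            if (matcher.find()) {
                int member = Integer.parseInt(matcher.group(1));
                grouped.computeIfAbsent(member, k -> new ArrayList<>()).add(path);
            }
        }
        return grouped;
    }
}
```

The member count is then simply the size of the returned map, whether that is 6 or 7.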

NOAA-OWP / wres

As a developer, I want to reduce the amount of logging output by the WRES when attempting to load NWM v3.0 from dstore which has only 6 members, not 7 #134