dtcenter / MET

Model Evaluation Tools
https://dtcenter.org/community-code/model-evaluation-tools-met
Apache License 2.0

Bugfix: Fix ASCII2NC to handle missing NDBC buoy location information #2426

Closed GwenChen-NOAA closed 1 year ago

GwenChen-NOAA commented 1 year ago

Describe the Problem

I tested ASCII2NC with MET 11.0.0/METplus 5.0.0 to convert NDBC buoy data (*.txt) into a METplus compatible point obs netCDF file and got the following error messages in the log file:

ERROR : No location information found for station 44084 do not process file /lfs/h1/ops/dev/dcom/20230124/validation_data/marine/buoy/44084.txt
ERROR : No location information found for station 46275 do not process file /lfs/h1/ops/dev/dcom/20230124/validation_data/marine/buoy/46275.txt
ERROR : No location information found for station CSXA2 do not process file /lfs/h1/ops/dev/dcom/20230124/validation_data/marine/buoy/CSXA2.txt
...

How was the NDBC buoy location information obtained in MET/METplus? Please update the buoy location information to include all NDBC buoys.

Expected Behavior

All NDBC buoys have an assigned location.

Environment

Describe your runtime environment:

  1. Machine: (e.g. HPC name, Linux Workstation, Mac Laptop) WCOSS2 and Hera
  2. OS: (e.g. RedHat Linux, MacOS)
  3. Software version number(s): MET 11.0.0/METplus 5.0.0

To Reproduce

NDBC buoy data directory on WCOSS2: /lfs/h1/ops/dev/dcom/$YYYYMMDD/validation_data/marine/buoy

METplus use case: https://metplus.readthedocs.io/en/latest/generated/model_applications/marine_and_cryosphere/PointStat_fcstGFS_obsNDBC_WaveHeight.html#sphx-glr-generated-model-applications-marine-and-cryosphere-pointstat-fcstgfs-obsndbc-waveheight-py

Relevant Deadlines

MET 11.0.1/METplus 5.0.1

Funding Source

2773542


JohnHalleyGotway commented 1 year ago

Thanks @jprestop for pulling buoy data for testing! I see it in seneca:/d1/projects/MET/MET_issues/feature_2426

I ran ascii2nc with this data and was able to replicate the reported behavior:

bin/ascii2nc buoy/* buoy.nc -log run_ascii2nc.log

Produced these log messages:

ERROR  : No location information found for station 44084 do not process file buoy/44084.txt
ERROR  : No location information found for station 46275 do not process file buoy/46275.txt
ERROR  : No location information found for station CSXA2 do not process file buoy/CSXA2.txt
ERROR  : No location information found for station DILA1 do not process file buoy/DILA1.txt
ERROR  : No location information found for station KBMG1 do not process file buoy/KBMG1.txt
ERROR  : No location information found for station KCXA2 do not process file buoy/KCXA2.txt
ERROR  : No location information found for station KOZA2 do not process file buoy/KOZA2.txt
ERROR  : No location information found for station OWMO1 do not process file buoy/OWMO1.txt
ERROR  : No location information found for station PAUA2 do not process file buoy/PAUA2.txt
ERROR  : No location information found for station SGXA2 do not process file buoy/SGXA2.txt
ERROR  : No location information found for station WCRP1 do not process file buoy/WCRP1.txt
ERROR  : No location information found for station WGXA2 do not process file buoy/WGXA2.txt

Tasks:

  1. [x] Change these log messages from Error to Warning since they do NOT halt the execution of the program. Also tweak the wording for clarity.
  2. [x] Update contents of data/table_files/ndbc_stations.xml for the missing buoy locations.
    • Find locations at https://www.ndbc.noaa.gov/station_page.php?station=44084 for example.
  3. [x] Also test additional days looking for more missing buoys.

Note that this issue could be addressed by setting the MET_NDBC_STATIONS environment variable. However, we may as well update the table in MET to include all buoys used for operational verification.

JohnHalleyGotway commented 1 year ago

Note that https://www.ndbc.noaa.gov/to_station.shtml contains links for 1839 stations:

wget https://www.ndbc.noaa.gov/to_station.shtml
cat to_station.shtml | sed 's/</\n/g' | grep "station=" | cut -d'>' -f2 | sort -u > ndbc_stations.txt 
cat ndbc_stations.txt | wc -l

Currently, only 1364 are included in MET (as of v11.0.0):

cat data/table_files/ndbc_stations.xml | grep "station id" | wc -l

@GwenChen-NOAA and @davidalbo, should we update that file to include all 1839 locations?

If so, perhaps we only really care about lat/lon/elev/name and not all the other metadata?
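For reference, the station-ID extraction above can also be sketched in Python. This is a minimal sketch, not the script used for MET: it assumes each station link on to_station.shtml carries a `station=<id>` query parameter (as the grep above implies), and unlike the `cut` step it reads the id from the link target rather than the link text. The sample HTML is made up for illustration.

```python
import re


def extract_station_ids(html: str) -> list[str]:
    """Return the sorted, de-duplicated station ids linked from the page."""
    # Capture the value of the "station=" query parameter in each link.
    ids = re.findall(r"station=([A-Za-z0-9]+)", html)
    return sorted(set(ids))


# Hypothetical snippet in the shape of the station list page.
sample = (
    '<a href="station_page.php?station=41047">41047</a> '
    '<a href="station_page.php?station=44084">44084</a> '
    '<a href="station_page.php?station=41047">41047</a>'
)
print(extract_station_ids(sample))
```

Counting the result with `len()` corresponds to the `wc -l` step.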

davidalbo commented 1 year ago

@JohnHalleyGotway @GwenChen-NOAA - Originally I used this https://www.ndbc.noaa.gov/activestations.xml to create the ndbc_stations.xml. We are using it directly as is if I'm remembering. Also I don't remember who pointed me to it. I can't remember much apparently. :-)

If this other source is better we should use it instead. I believe it's true we only are pulling lat/lon/elev/name

We'd need to write up something to parse the newer source and create a new stations file.

GwenChen-NOAA commented 1 year ago

Deanna is checking with her colleague at NDBC to see how often they update the webpage.

Thanks, Gwen

JohnHalleyGotway commented 1 year ago

@davidalbo thanks for that link. That's very helpful!

@GwenChen-NOAA I've manually added 12 missing stations to the ndbc_stations.xml file in my bugfix branch. I got the name/lat/lon/elev values correct but made some educated guesses as to the other entries (which MET doesn't use anyway). And I confirmed that the updated list makes all those missing station log messages go away for the 2 days with which I tested.

So this increases the number of entries in the file from 1364 (as of v11.0.0) to 1376.

The question is whether or not those changes are sufficient or should we be using all 1839 stations described here? And with 2 different sources, there's potential that the details could differ.

The easiest thing would be sticking with the 12 additions I've already made, but I'd like your input.

Please let me know how you and Deanna would like to proceed.

JohnHalleyGotway commented 1 year ago

I'd like to wrap up this bugfix issue. Unless I hear otherwise, I'll proceed with the manual changes I made by adding the locations for the 12 missing buoys.

If @GwenChen-NOAA and Deanna do find a preferable or more complete way of defining NDBC Buoy locations, I recommend that they write up a GitHub issue to enhance the definition for a future release, such as MET-11.1.0. Do note, once you have the updated definition in hand you can start using it right away by setting the $MET_NDBC_STATIONS environment variable.
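As a rough illustration of the environment variable route, here is a Python sketch that points ascii2nc at a custom stations file for a single run. The command shape is copied from the ascii2nc call earlier in this thread; the helper names and the stations file path are hypothetical, and `run_ascii2nc` is defined but not called here.

```python
import os
import subprocess


def ascii2nc_env(stations_file, base_env=None):
    """Copy the environment and point MET_NDBC_STATIONS at stations_file."""
    env = dict(os.environ if base_env is None else base_env)
    env["MET_NDBC_STATIONS"] = stations_file
    return env


def run_ascii2nc(input_files, output_file, stations_file):
    """Run ascii2nc with the override applied to this one process only."""
    cmd = ["bin/ascii2nc", *input_files, output_file, "-log", "run_ascii2nc.log"]
    return subprocess.run(cmd, env=ascii2nc_env(stations_file), check=True)


# Hypothetical path to an updated stations table.
env = ascii2nc_env("/path/to/my_ndbc_stations.xml", base_env={})
print(env["MET_NDBC_STATIONS"])
```

Passing `env=` to `subprocess.run` leaves the parent shell untouched, so other MET runs keep using the default table.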

GwenChen-NOAA commented 1 year ago

Thank you, John! Deanna contacted NDBC, but hasn't heard back from them. I think adding the locations for the 12 missing buoys is sufficient for the bugfix. If possible, please reduce the log message for missing buoy location information from ERROR to WARNING, so NCO won't be alerted if it happens in the production run. ASCII2NC can still run with missing information and create a nc file, so it's not fatal.

Thanks, Gwen

GwenChen-NOAA commented 1 year ago

Hello all,

I just heard back from NDBC:

"The list at https://www.ndbc.noaa.gov/to_station.shtml includes all valid station IDs, both current and historical. The XML list ( https://www.ndbc.noaa.gov/activestations.xml) I believe is updated in real-time, and includes just the active stations, and not historical stations that have been disestablished.

Incidentally, almost all of the "missing" stations that are mentioned are present in the XML list. Except for DILA1 (which we never released until recently due to it being co-located with DPIA1) , they're all relatively new stations, so it's possible that the requester may be working off an older version of the file."

Best, Deanna

-- Deanna Spindler, PhD Physical Scientist III Lynker at NOAA/NWS/NCEP/Environmental Modeling Center NOAA Center for Weather and Climate Prediction


JohnHalleyGotway commented 1 year ago

Reopening this issue that was automatically closed by the GitHub automation. While this is fixed in the main_v11.0 branch, @davidalbo is working on a better fix for the develop branch.

davidalbo commented 1 year ago

@JohnHalleyGotway I'm not seeing any way to easily get all the station location information past and present based on reading this: https://www.ndbc.noaa.gov/docs/ndbc_web_data_guide.pdf which does refer to the active stations we'd used before, but not to any complete list in one webpage.

Doing this: wget https://www.ndbc.noaa.gov/to_station.shtml does provide all the station IDs in a parsable format, past and present.

I was hoping I could easily go through each station and retrieve its information also in a parsable format, i.e. somehow parsing what one sees for each station, such as for station 41047, for example:

https://www.ndbc.noaa.gov/station_page.php?station=41047

Is there a way to parse what is seen here so as to pull out the information we want?
This is where I'm stuck.

JohnHalleyGotway commented 1 year ago

Dave, I was thinking you could do something like this:

  1. Loop over station id's and for each...
  2. Run wget (e.g.):
    wget https://www.ndbc.noaa.gov/station_page.php?station=41047
  3. Grep out the necessary info:
    egrep "var currentstn" station_page.php\?station\=41047 
    var currentstnid = '41047';
    var currentstnlat = 27.465;
    var currentstnlng = -71.452;
    var currentstnname = 'NE BAHAMAS - 350 NM ENE of Nassau, Bahamas';
    var currentstndata = 'y';

    The only important piece this does not include is the elevation...

    egrep "Site elevation" station_page.php\?station\=41047 
        <b>Site elevation:</b> sea level<br />

    Presumably, we'd want to interpret sea level as elevation = 0.
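The grep steps above could be sketched in Python as follows. This is only a sketch under assumptions: it relies on the `var currentstn...` lines and the `Site elevation:` line looking exactly like the output shown here, it treats "sea level" as elevation 0 as suggested, and the handling of a numeric elevation followed by "m" is a guess at what other station pages contain.

```python
import re


def parse_elevation(html: str):
    """Return the site elevation in meters, or None if it cannot be read."""
    match = re.search(r"Site elevation:</b>\s*([^<]*)<", html)
    if match is None:
        return None
    text = match.group(1).strip()
    # Assumption: other pages give a number followed by "m".
    number = re.match(r"(-?\d+(?:\.\d+)?)\s*m\b", text)
    if number:
        return float(number.group(1))
    if text == "sea level":
        return 0.0
    return None


def parse_station_page(html: str) -> dict:
    """Pull id, name, lat, lon, and elevation out of one station page."""

    def js_var(name: str) -> str:
        # Matches lines such as: var currentstnlat = 27.465;
        match = re.search(rf"var {name}\s*=\s*'?([^';]*)'?;", html)
        if match is None:
            raise ValueError(f"{name} not found in page")
        return match.group(1)

    return {
        "id": js_var("currentstnid"),
        "name": js_var("currentstnname"),
        "lat": float(js_var("currentstnlat")),
        "lon": float(js_var("currentstnlng")),
        "elev": parse_elevation(html),
    }


# The lines grepped from the station 41047 page above.
sample = """
    var currentstnid = '41047';
    var currentstnlat = 27.465;
    var currentstnlng = -71.452;
    var currentstnname = 'NE BAHAMAS - 350 NM ENE of Nassau, Bahamas';
    var currentstndata = 'y';
        <b>Site elevation:</b> sea level<br />
"""
print(parse_station_page(sample))
```

Station names containing an apostrophe or semicolon would need a more careful pattern than the one used in `js_var`.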

I'd recommend formatting this data using the same xml format used in: https://github.com/dtcenter/MET/blob/main_v11.0/data/table_files/ndbc_stations.xml

Here's a sample line:

<station id="00922" lat="30" lon="-90" name="OTN201 - 4800922" owner="Dalhousie University" pgm="IOOS Partners" type="other" met="n" currents="n" waterquality="n" dart="n"/>

But MET really only uses entries for the station id, lat, lon, elev.

Hopefully the code in MET that parses this data would still run fine even though the name, owner, pgm, type, met, currents, waterquality, and dart entries are not present. But that'd be up to you to test and fix.
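To make the reduced format concrete, here is a small Python sketch that writes a `<station>` entry with only the attributes MET is said to use, plus an optional name. The `elev` attribute name is an assumption (the sample line above has no elevation attribute), and whether MET's parser accepts entries without owner, pgm, and the other fields still needs to be tested as noted.

```python
import xml.etree.ElementTree as ET


def station_element(stn_id, lat, lon, elev=None, name=None) -> str:
    """Serialize one station as a self-closing <station .../> element."""
    attrs = {"id": stn_id, "lat": str(lat), "lon": str(lon)}
    if elev is not None:
        attrs["elev"] = str(elev)  # assumed attribute name
    if name is not None:
        attrs["name"] = name
    return ET.tostring(ET.Element("station", attrs), encoding="unicode")


line = station_element("41047", 27.465, -71.452, elev=0.0,
                       name="NE BAHAMAS - 350 NM ENE of Nassau, Bahamas")
print(line)

# Round trip: the entry parses back without the optional metadata.
parsed = ET.fromstring(line)
print(parsed.attrib["lat"], parsed.attrib["lon"])
```

Using ElementTree rather than string formatting means characters such as `&` or quotes in a station name get escaped correctly.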

If you do script up this processing, I think it'd be useful to save off that parsing script because chances are we may need to rerun it in the future.

davidalbo commented 1 year ago

That's exactly what I thought but can't get this to work:

wget https://www.ndbc.noaa.gov/station_page.php?station=41047

JohnHalleyGotway commented 1 year ago

@davidalbo, on seneca, I ran:

wget https://www.ndbc.noaa.gov/station_page.php?station=41047
egrep "var currentstn" station_page.php\?station\=41047

And see this output:

    var currentstnid = '41047';
    var currentstnlat = 27.465;
    var currentstnlng = -71.452;
    var currentstnname = 'NE BAHAMAS - 350 NM ENE of Nassau, Bahamas';
    var currentstndata = 'y';

So presumably you could pull this data there.

davidalbo commented 1 year ago

@JohnHalleyGotway Ah ha... I had to put quotes around the argument to get wget to work which I hadn't been doing, I guess the shell was probably not liking the '?' or something
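A likely explanation is that `?` is a glob character, and some shells (tcsh, or zsh with default settings) refuse to run a command when an unquoted glob matches no file, while bash passes it through unchanged. If the command line is being generated from a script, one way to avoid the problem is to let Python do the quoting. A minimal sketch, with a hypothetical helper name:

```python
import shlex
from urllib.parse import urlencode


def wget_command(station_id: str) -> str:
    """Build a shell-safe wget command line for one station page."""
    url = ("https://www.ndbc.noaa.gov/station_page.php?"
           + urlencode({"station": station_id}))
    # shlex.quote wraps the URL in single quotes because it contains '?'.
    return "wget " + shlex.quote(url)


print(wget_command("41047"))
# wget 'https://www.ndbc.noaa.gov/station_page.php?station=41047'
```

When calling `subprocess.run` with an argument list instead of a shell string, no quoting is needed at all, since no shell interprets the URL.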

Meanwhile I just wrote a little python script to parse the results of the original wget https://www.ndbc.noaa.gov/to_station.shtml output to pull out all the stations. Combining these two things should allow implementation of what we are after, which is a full list.

davidalbo commented 1 year ago

Some stations are in the active website list, but not in the full list, for example: 32303. Others are in the full list but not the active list. This complicates matters.
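The overlap between the two lists can be summarized with plain set operations. A small sketch with a hypothetical helper name; apart from 32303 being active-only as noted above, the memberships in the sample are made up.

```python
def compare_id_sets(active_ids, complete_ids) -> dict:
    """Split station ids into both / active-only / complete-only groups."""
    active, complete = set(active_ids), set(complete_ids)
    return {
        "both": sorted(active & complete),
        "active_only": sorted(active - complete),
        "complete_only": sorted(complete - active),
    }


# Illustrative ids only.
groups = compare_id_sets(["32303", "41047"], ["41047", "44084"])
print(groups)
```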

davidalbo commented 1 year ago

@JohnHalleyGotway @GwenChen-NOAA

I've been looking into automatically merging the station information from the two sources into one better table of station location information. This would be some script or program that could be run periodically to update our table file.

I've written some python to ingest and compare the info from the two sources:
Active: https://www.ndbc.noaa.gov/activestations.xml Complete: https://www.ndbc.noaa.gov/to_station.shtml

The question is, how to merge these. What would you suggest given the following observations:

As expected, many stations in 'Complete' are not in 'Active', (I count 1239 such stations) but also a lot of 'Active' that are not 'Complete': 834 stations to be exact.

Also, for stations that are both active and complete, 18 of them disagree on lat/lon values. Mostly small differences but a few that are significantly different, or for which the 'complete' has lat=lon=0 which is wrong:

latlons disagree for 41040 active:( 15.4235 , -56.8072 ) complete:( 14.54 , -53.329 )
latlons disagree for 41108 active:( 33.722 , -78.016 ) complete:( 33.721 , -78.016 )
latlons disagree for 41110 active:( 34.142 , -77.715 ) complete:( 34.143 , -77.716 )
latlons disagree for 41114 active:( 27.551 , -80.217 ) complete:( 27.552 , -80.216 )
latlons disagree for 42001 active:( 25.942 , -89.657 ) complete:( 25.919 , -89.674 )
latlons disagree for 42409 active:( 25.901 , -89.291 ) complete:( 23.4163 , -90.5874 )
latlons disagree for 42501 active:( 26.03 , -89.56 ) complete:( 0.0 , 0.0 )
latlons disagree for 42503 active:( 27.91 , -87.94 ) complete:( 0.0 , 0.0 )
latlons disagree for 44088 active:( 36.611 , -74.841 ) complete:( 36.614 , -74.841 )
latlons disagree for 44097 active:( 40.967 , -71.126 ) complete:( 40.967 , -71.124 )
latlons disagree for 46011 active:( 34.937 , -121.0 ) complete:( 34.936 , -120.998 )
latlons disagree for 46041 active:( 47.353 , -124.742 ) complete:( 47.352 , -124.739 )
latlons disagree for 46098 active:( 44.381 , -124.956 ) complete:( 44.378 , -124.947 )
latlons disagree for 46099 active:( 46.986 , -124.566 ) complete:( 46.988 , -124.567 )
latlons disagree for 46100 active:( 46.851 , -124.972 ) complete:( 46.851 , -124.964 )
latlons disagree for 46416 active:( 49.901 , -134.395 ) complete:( 50.2782 , -133.458 )
latlons disagree for 46518 active:( 44.5 , -170.0 ) complete:( 0.0 , 0.0 )
latlons disagree for 51210 active:( 21.477 , -157.756 ) complete:( 21.477 , -157.757 )
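A comparison like this can be sketched in a few lines of Python. This is not the script that produced the list above; the function name and the optional tolerance are assumptions, added to show how small differences could be separated from large ones. The sample positions are taken from the list.

```python
def find_disagreements(active: dict, complete: dict, tol: float = 0.0) -> list:
    """List stations present in both sources whose lat or lon differ by more than tol."""
    out = []
    for stn in sorted(active.keys() & complete.keys()):
        (alat, alon), (clat, clon) = active[stn], complete[stn]
        if abs(alat - clat) > tol or abs(alon - clon) > tol:
            out.append((stn, active[stn], complete[stn]))
    return out


active = {"41108": (33.722, -78.016), "42501": (26.03, -89.56),
          "51210": (21.477, -157.756)}
complete = {"41108": (33.721, -78.016), "42501": (0.0, 0.0),
            "51210": (21.477, -157.757)}

for stn, a, c in find_disagreements(active, complete):
    print(f"latlons disagree for {stn} active:{a} complete:{c}")

# With a 0.005 degree tolerance only the (0, 0) placeholder stands out.
print([stn for stn, _, _ in find_disagreements(active, complete, tol=0.005)])
```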

JohnHalleyGotway commented 1 year ago

@davidalbo thanks for following up with these details. I had a sinking feeling that we might find these sorts of differences. For the (lat, lon) = (0, 0) cases, we'd obviously choose the non-(0, 0) location. But the others are much less obvious. Intuitively, I'd prefer the "active" location to the "complete" location, but that's just my guess. That approach would make the behavior of MET version 11.1.0 backward compatible with the behavior of version 11.0.1... which is definitely what we want to do!

@GwenChen-NOAA and @DeannaSpindler-NOAA, can you please check with your NDBC contacts, let them know about the location discrepancies that @davidalbo has found, and confirm that we should prefer the active buoy locations over the complete buoy location list? That is, use the existing active list (ndbc.noaa.gov/activestations.xml) and then add to it the buoy locations defined by ndbc.noaa.gov/to_station.shtml, but ONLY FOR buoys not already present in the active list?

DeannaSpindler-NOAA commented 1 year ago

@JohnHalleyGotway, I've emailed NDBC and am waiting for their reply.

davidalbo commented 1 year ago

I do have python scripts nearly ready to go that add those complete stations that are not active to the list of active stations to make a full list locally, in a format that can be read by the existing ascii2nc software. My idea is this could be run as needed, as both the active and complete sources could change at any time. I can even have the script access the web via wget if that makes sense, to pull down the latest info, or that can be done manually and the scripts take it from there.

Assuming we go with something like this, the question would be how to put this into the MET environment as a standalone utility. What do you think @JohnHalleyGotway? Should this be something the community would do themselves, or would we (NCAR) do it upon request and update the official file in the table_files, or both? And where would this scripting belong in the MET environment?

JohnHalleyGotway commented 1 year ago

@davidalbo if we do want to provide this, I believe that scripts/utility would be a good home for it.

GwenChen-NOAA commented 1 year ago

John,

I saw you closed this bugfix. Deanna contacted NDBC about the discrepancies between the "Complete" and "Active" lists last week, but we haven't heard back from them. I will let you know once I hear from them.

Thanks, Gwen

On Tue, Feb 21, 2023 at 5:25 PM John Halley Gotway @.***> wrote:

Closed #2426 https://github.com/dtcenter/MET/issues/2426 as completed via 31a063c https://github.com/dtcenter/MET/commit/31a063c5834f8fa3f038b99d4732e51e16246462 .


JohnHalleyGotway commented 1 year ago

Reopening since this has not yet been fixed in the develop branch.

davidalbo commented 1 year ago

Some more interesting information to ponder: I originally pulled down the active stations for use as our lookup list a long while back, months ago. That is now our official file (data/table_files/ndbc_stations.xml).

When I now pull down both the active and complete lists and compare that to this older file I get this:

Number of stations that are in default ndbc_stations.xml and are not now online: 592

This means a lot of stations used to be in the active stations list online, and now are completely gone from both the active and complete lists online.

A few examples:

Station in the local table file but no longer on the webpages: 00922
Station in the local table file but no longer on the webpages: 00923
Station in the local table file but no longer on the webpages: 01500
Station in the local table file but no longer on the webpages: 01502
Station in the local table file but no longer on the webpages: 01503

I checked by hand and indeed these are no longer online at either source: https://www.ndbc.noaa.gov/activestations.xml https://www.ndbc.noaa.gov/to_station.shtml

Complete info for these stations back when they existed, as seen in our ndbc_stations.xml file:

  <station id="00922" lat="30" lon="-90" name="OTN201 - 4800922" owner="Dalhousie University" pgm="IOOS Partners" type="other" met="n" currents="n" waterquality="n" dart="n"/>
  <station id="00923" lat="30" lon="-90" name="OTN200 - 4800923" owner="Dalhousie University" pgm="IOOS Partners" type="other" met="n" currents="n" waterquality="n" dart="n"/>
  <station id="01500" lat="30" lon="-90" name="SP031 - 3801500" owner="SCRIPPS" pgm="IOOS Partners" type="other" met="n" currents="n" waterquality="n" dart="n"/>
  <station id="01502" lat="30" lon="-90" name="Penobscot - 4801502" owner="University of Maine" pgm="IOOS Partners" type="other" met="n" currents="n" waterquality="n" dart="n"/>
  <station id="01503" lat="30" lon="-90" name="Saul - 4801503" owner="Woods Hole Oceanographic Institution" pgm="IOOS Partners" type="other" met="n" currents="n" waterquality="n" dart="n"/>

Since they all say lat=30, lon=-90, it suggests they were maybe not quite actual stations? I have not checked all 592 such stations in this way, but they are all in the file data/table_files/ndbc_stations.xml.

What this means (as the software designer) is that stations might appear, disappear, or move, namely anything could happen depending on how the web pages get updated. I'm sure there are good reasons for changes to the web content, but my design needs to handle all possibilities.

@GwenChen-NOAA Any scientific ideas about how to best proceed? I think we'd want to update our 'official' station file periodically using whatever python script I end up with, and it could be something an individual could do as well to create their own latest/greatest stations file, as needed. All I need to finish the python is a priority, i.e. active takes priority over complete, or vice versa, or make either option possible. Also, if a station is not online but used to be online, should it be removed, or kept.

davidalbo commented 1 year ago

Correction: the original number of stations that 'disappeared' (592) is too big. Something was wrong in my analysis. It's more like 50.

GwenChen-NOAA commented 1 year ago

Dave and John,

Below is the reply from NDBC:

"We are confused by where it's mentioned 834 stations that are in the activestations.xml file but not in https://www.ndbc.noaa.gov/to_station.shtml. The only stations I'm aware of that are in the activestations.xml file but not on the to_station.shtml page are the TAO stations, which have their own page (https://tao.ndbc.noaa.gov/). That being said, we have noticed new issues within the activestations.xml file that have been submitted in a ticket to our software group but should not be affecting the data you are looking for.

As for the position discrepancies, the positions on the NDBC station pages are the most up-to-date mooring positions. I'm not sure where the activestations.xml file positions are coming from. I saw at least one case where it is pulling the latest reported position from a drifter instead of the moored position. I'd say that the station pages are probably the most accurate.

The three stations that do not have positions on the station pages (42501, 42503, and 46518) are drifting platforms, so there is no moored position. When those stations report data, a position is included with each observation."

Based on what they said in the second paragraph, the positions in the "Complete" list (https://www.ndbc.noaa.gov/to_station.shtml) take higher priority than those in the "Active" list (https://www.ndbc.noaa.gov/activestations.xml). When a station is missing from the "Complete" list, use the information in the "Active" list. If a station is no longer online, I think it doesn't hurt to keep it in the data/table_files/ndbc_stations.xml file.

Here are the steps I suggested:

  1. Check the data/table_files/ndbc_stations.xml file against the "Complete" list (https://www.ndbc.noaa.gov/to_station.shtml), and add new stations from the "Complete" list. If position discrepancies are found, update the positions using the information from the "Complete" list.
  2. Check the updated data/table_files/ndbc_stations.xml file against the "Active" list (https://www.ndbc.noaa.gov/activestations.xml), and add new stations from the "Active" list. If position discrepancies are found, discard the information in the "Active" list and keep the positions in the updated data/table_files/ndbc_stations.xml file.
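The two steps above amount to a merge with a fixed priority order. A minimal Python sketch, assuming each source has already been parsed into a dict mapping station id to (lat, lon); the function name and the station id "ACTV1" are hypothetical, and placeholder positions such as (0, 0) are not treated specially here.

```python
def merge_stations(default: dict, complete: dict, active: dict) -> dict:
    """Merge the three sources with priority complete > default > active."""
    merged = dict(default)      # stations no longer online are kept
    merged.update(complete)     # step 1: add new stations, override positions
    for stn, pos in active.items():
        merged.setdefault(stn, pos)  # step 2: add only stations not yet present
    return merged


default = {"46041": (47.353, -124.742), "00922": (30.0, -90.0)}
complete = {"46041": (47.352, -124.739)}
active = {"46041": (47.353, -124.742), "ACTV1": (10.0, -20.0)}  # ACTV1 is made up

merged = merge_stations(default, complete, active)
for stn in sorted(merged):
    print(stn, merged[stn])
```

In this example 46041 takes the "Complete" position, 00922 survives from the default table, and ACTV1 is added from the "Active" list.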

Please let me know if you need more clarifications. We can connect you with the NDBC science team. Thanks!

davidalbo commented 1 year ago

A bit murky here, I still think. In one case the 'complete' does not have lat/lon information but the 'active' does:

latlonelev disagree for 46518 : Active(44.5,-170.0,-99.9) , Complete(0.0,0.0,-99.9)

In this case I'd assume I'd want to use the active, not the complete.

Several stations have moved from where they used to be. A few examples comparing the current ndbc_stations.xml (default) to active pulled down today:

latlonelev disagree for  46041 : Active(47.352,-124.739,0.0) , Default(47.353,-124.742,0.0)
latlonelev disagree for  46098 : Active(44.378,-124.947,0.0) , Default(44.381,-124.956,0.0)

In this case I'd assume the active is better than what we have in our defaults ndbc_stations.xml

After pulling down the active and complete lists and reading in all 3 sources (default ndbc_stations.xml, active, and complete), here is some summary output, so you can see it's a small number of stations with conflicts. The only ones of concern to me are the ones that 'vanished' (55) and those with conflicting information (22):

PARSED DEFAULT STATIONS FILE NUM= 1376
PARSED ACTIVE STATION FILES: num= 1314
PARSED COMPLETE STATIONS FILES: num= 1769

Done, wrote out  2558  total items to  merged.txt
Number of stations that vanished (are in default ndbc_stations.xml and are not now online):  55
Number of stations that appeared (not in default ndbc_stations.xml and are now online):  1237
Number of stations for which there is a conflict from the various sources: 22
Number of stations for which there is both and active and a complete entry: 525
Number of stations for which there is an active but no complete entry: 789
Number of stations for which there is a complete but no active entry: 1244

davidalbo commented 1 year ago

My only concern is active stations and ndbc_stations that change location, specifically those active stations that are not in the complete stations list. I can keep the ndbc_stations in those cases, or change locations to the active station value. As I said, I'd assume the active_stations are more accurate, but based on the comments it's hard to tell whether to make that assumption or not. Your thoughts?

I would suggest one change for sure:

If there is a conflict, and one source has no lat/lon information (0,0) but the other one does, I'd use the one that does have lat/lon in all cases.
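That rule can be written as a small helper. A sketch only, with hypothetical function names: it treats (0, 0) or a missing entry as "no position", and it prefers the complete value when both sources have one, which is one of the two priority orders under discussion.

```python
def has_position(pos) -> bool:
    """True unless the position is missing or the (0, 0) placeholder."""
    return pos is not None and pos != (0.0, 0.0)


def choose_position(complete_pos, active_pos):
    """Prefer complete, but never pick a placeholder over a real position."""
    if has_position(complete_pos):
        return complete_pos
    if has_position(active_pos):
        return active_pos
    # Neither source has a usable position; return whatever is there.
    return complete_pos if complete_pos is not None else active_pos


# 46518: complete is (0, 0), so the active position is used.
print(choose_position((0.0, 0.0), (44.5, -170.0)))
# 46041: both have positions, so complete is used.
print(choose_position((47.352, -124.739), (47.353, -124.742)))
```

Swapping the first two checks would give the active list priority instead.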

Also, I have the idea going of an option to 'prune', in which stations that are no longer on line can be pruned from the updated stations file. The default would be to keep those stations.

To keep this moving, I'll implement as you suggest, with my change, and keep the design set up so that if we want we can give the active stations priority in the situation I'm describing above, where it is a station that is active, but not complete, and the active values disagree with the ndbc_stations values.

Thanks for helping me get my head around this problem.

davidalbo commented 1 year ago

@JohnHalleyGotway This problem is still a little complex with some unknowns. Can you remind me when it would be good  to have it done?

To move it along, can you also refresh my memory on a couple things:

If I change data/table_files/ndbc_stations.xml, what else do I need to do to get it to go to the correct place(s) and where is that?  I want a newer version of this file to be part of the unit test and to be what is the default that the users see.

I'd want to update the documentation to describe this new python script utility somewhere.  Where?

Thanks in advance.

GwenChen-NOAA commented 1 year ago

Dave,

I agree that when lat/lon information is missing in the "Complete" list, fill it with the info in the "Active" list. This applies to Station 46518 and will be an exception from the general rules.

According to NDBC, the "Active" list is updated in real time. Some buoys can be drifting when their positions are recorded. The "Complete" list contains the moored positions, which are more accurate. If we are not updating the data/table_files/ndbc_stations.xml file in real time, the "Complete" list is a better way to go.

For Station 46041, you have Default(47.353,-124.742,0.0). You should check it with the "Complete" list first. If you do that, you will find Complete(47.352, -124.739, 0.0). Then, you will update your Default=Complete(47.352, -124.739, 0.0). After that, you will check your new Default(47.352, -124.739, 0.0) with the "Active" list, and you will find Active(47.352,-124.739,0.0) = new Default(47.352, -124.739, 0.0) = Complete(47.352, -124.739, 0.0) and reach an agreement.

Same applies to Station 46098.

If a station is not in the "Complete" list, but in the "Active" list, use the info in the "Active" list. You can leave the "vanished" stations as is.

davidalbo commented 1 year ago

Ahah, @GwenChen-NOAA, got it. I'll go forward with your suggestions. Thanks again.

davidalbo commented 1 year ago

When I do it the way you suggest (first merge the 'complete' into the 'ndbc_stations', adding new stations and modifying existing ones if needed), and then try to add in 'actives' that are not in that merged list, I get none. In other words, all the active stations are either in the original ndbc_stations, or are in the complete list (or both). Interesting! Especially since 'complete' and 'active' both have a large set of unique stations (not found in the other list). Anyway, this approach is looking good.

JohnHalleyGotway commented 1 year ago

After merging this into develop, testing in METplus flagged some buoy location differences. Careful comparison reveals 15 stations whose locations have changed slightly. Several others either have or no longer have a trailing 0 in the decimal place, but I ignored those diffs.

Left is old (main_v11.0) and right is new (develop):

Station ID, Latitude, Longitude

< 41040 15.4235 -56.8072 > 41040 14.54 -53.329
< 41108 33.722 -78.016 > 41108 33.721 -78.016
< 41110 34.142 -77.715 > 41110 34.143 -77.716
< 41114 27.551 -80.217 > 41114 27.552 -80.216
< 42001 25.942 -89.657 > 42001 25.919 -89.674
< 42409 25.901 -89.291 > 42409 22.5065 -94.4146
< 44088 36.611 -74.841 > 44088 36.614 -74.841
< 44097 40.967 -71.126 > 44097 40.967 -71.124
< 46011 34.937 -121 > 46011 34.936 -120.998
< 46041 47.353 -124.742 > 46041 47.352 -124.739
< 46098 44.381 -124.956 > 46098 44.378 -124.947
< 46099 46.986 -124.566 > 46099 46.988 -124.567
< 46100 46.851 -124.972 > 46100 46.851 -124.964
< 46416 49.901 -134.395 > 46416 48.6827 -129.902
< 51210 21.477 -157.756 > 51210 21.477 -157.757
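A comparison that skips the trailing-zero differences automatically could look like the sketch below. It is not the method used to produce the list above; it assumes the lat/lon values are available as the strings found in each table, and the station id "HYPO1" is made up to show a formatting-only difference.

```python
from decimal import Decimal


def changed_stations(old: dict, new: dict) -> list:
    """Return ids whose lat/lon differ numerically between two tables."""
    changed = []
    for stn in sorted(old.keys() & new.keys()):
        # Decimal("-121") == Decimal("-121.0"), so formatting is ignored.
        old_pos = tuple(Decimal(v) for v in old[stn])
        new_pos = tuple(Decimal(v) for v in new[stn])
        if old_pos != new_pos:
            changed.append(stn)
    return changed


old = {"46011": ("34.937", "-121"), "51210": ("21.477", "-157.756"),
       "HYPO1": ("30.50", "-90")}
new = {"46011": ("34.936", "-120.998"), "51210": ("21.477", "-157.757"),
       "HYPO1": ("30.5", "-90.0")}
print(changed_stations(old, new))
```

Comparing as `Decimal` rather than `float` keeps the check exact for the digits as written in the files.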