USGS-R / drb-estuary-salinity-ml

Creative Commons Zero v1.0 Universal

Site 01467200 has anomalous column headers #39

Closed: galengorski closed this issue 2 years ago

galengorski commented 2 years ago

It looks like site 01467200, Delaware River at Penn's Landing, has multiple observations for several of the parameters we're interested in (temperature, specific conductivity, pH, etc.). This seems to be because new measurements are being taken at this site in collaboration with the Independence Seaport Museum, which is why some of the column headers have "ISM" in the name. Unfortunately, these columns have the same parameter codes as the variables we actually want, so we'll have to screen them out, perhaps by column name. I went into the munge/out csv file and deleted the columns by hand, but I think this could be done in the munge step.
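A minimal sketch of screening by column name in the munge step, assuming the data are read with Python's standard `csv` module and that the museum columns can be recognized by the substring "ISM" in their headers. The function name `drop_ism_columns` and the sample headers below are hypothetical illustrations, not the project's actual code or the real NWIS column names.

```python
import csv
import io


def drop_ism_columns(rows, marker="ISM"):
    """Return rows with every column whose header contains `marker` removed.

    `rows` is a list of dicts as produced by csv.DictReader. Matching is a
    case-sensitive substring test; "ISM" as the marker is an assumption
    about how the Independence Seaport Museum columns are labeled.
    """
    if not rows:
        return []
    # Header order is taken from the first row (DictReader preserves it).
    keep = [name for name in rows[0] if marker not in name]
    return [{name: row[name] for name in keep} for row in rows]


# Hypothetical sample with duplicated "_ISM" columns.
sample = """datetime,temperature,temperature_ISM,specific_conductance,specific_conductance_ISM
2021-06-01,22.1,,410,
2021-06-02,22.4,,405,
"""
rows = list(csv.DictReader(io.StringIO(sample)))
cleaned = drop_ism_columns(rows)
print(list(cleaned[0]))  # ['datetime', 'temperature', 'specific_conductance']
```

If the munge step already uses pandas, the same filter could be written as `df.loc[:, ~df.columns.str.contains("ISM")]`. A plain substring test is simple but could also catch unrelated headers that happen to contain "ISM", so a stricter pattern may be safer once the real header format is confirmed.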

amsnyder commented 2 years ago

Yes, I was actually just noticing this. I was going to make a pull request to fix it!

lekoenig commented 2 years ago

We've encountered 01467200 in inland-salinity, too. Galen, I never knew what "ISM" referred to in that site's column headers 😃. I'm curious to hear about how you all use these data; if it's helpful, here's how we've dealt with the odd column headers for that site (lines 32-36). A bit of discussion is here as well.

A new exhibit at the museum is also being developed to allow the public to conduct water chemistry experiments using similar equipment to that deployed in the river.

Looks cool!

amsnyder commented 2 years ago

@galengorski - all of these ISM data points were already being dropped because all of their data are marked as provisional. The column headers were still around (even though the columns contained no data), but I updated my current open pull request to drop any columns with no data, including these. You can check out the small change to the munge file there. Do you have any concerns with removing them this way?
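A minimal sketch of the "drop any columns with no data" approach, assuming rows come from Python's `csv.DictReader` and that a missing value shows up as an empty or whitespace-only string (or `None` for short rows). The function name `drop_empty_columns` and the sample data are hypothetical; this is an illustration, not the change in the pull request.

```python
import csv
import io


def drop_empty_columns(rows):
    """Return rows keeping only columns that have at least one non-blank value.

    A value counts as missing if it is None or an empty/whitespace-only
    string. Placeholder codes such as "NA" would still count as data here.
    """
    if not rows:
        return []
    keep = [
        name for name in rows[0]
        if any((row[name] or "").strip() for row in rows)
    ]
    return [{name: row[name] for name in keep} for row in rows]


# Hypothetical sample: the "_ISM" columns are empty after provisional data are removed.
sample = """datetime,temperature,temperature_ISM,specific_conductance,specific_conductance_ISM
2021-06-01,22.1,,410,
2021-06-02,22.4,,405,
"""
rows = list(csv.DictReader(io.StringIO(sample)))
cleaned = drop_empty_columns(rows)
print(list(cleaned[0]))  # ['datetime', 'temperature', 'specific_conductance']
```

With pandas, the equivalent is `df.dropna(axis="columns", how="all")`, provided blanks are parsed as NaN. Because this rule keys on emptiness rather than names, it also removes any other all-empty columns, which is the side effect discussed in the reply below.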

galengorski commented 2 years ago

OK, cool. I think that will work well, and it might also catch other columns we don't care about.