USEPA / Phytoplankton-Data-Analysis

Phytoplankton Data Analysis

Unique identifiers: combined_wq_20140509.txt #36

Closed jbeaulie closed 9 years ago

jbeaulie commented 10 years ago

I assume that all unique identifiers must be consistently formatted before merging with algae data. Below are some specific formatting issues.

- table(chem$sample_date) reveals quite a few anomalous dates.
- table(chem$sample_time) reveals a lot of formatting diversity. We probably want a consistent four-character code.
- table(chem$location) reveals a lot of formatting diversity. This field has some overlap with station and lake; may ultimately omit?
- table(chem$station): many non-standard values; probably related to issue #22 and may not be a problem.
- table(nchar(chem$ID)): ID ranges from 21 to 25 characters. Some of this variation may be due to issues with sample_date, sample_time, and station?
- chem$lake: 19127 missing values, but in all cases the 'location' field is populated? Lake info is typically contained in 'location'.
- chem$depth looks good.
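
A minimal sketch of how the sample_time cleanup might look, assuming the values are mostly digits with an optional ":" separator. The helper name `normalize_time` and the toy `chem` data frame are hypothetical; the actual formats in combined_wq_20140509.txt may need more cases, and anything that does not reduce to 3 or 4 digits is set to NA for manual review rather than guessed at.

```r
# Hypothetical helper: coerce assorted time strings to a four-character "HHMM" code.
# Assumption: valid times reduce to 3 or 4 digits once separators are stripped.
normalize_time <- function(x) {
  digits <- gsub("[^0-9]", "", trimws(as.character(x)))  # drop ":" and other non-digits
  ok <- !is.na(digits) & nchar(digits) %in% c(3, 4)      # e.g. "930" or "0930"
  out <- rep(NA_character_, length(digits))              # everything else stays NA
  out[ok] <- formatC(as.integer(digits[ok]), width = 4, flag = "0")  # left-pad with zeros
  out
}

# Toy data standing in for the real file (values are made up)
chem <- data.frame(sample_time = c("930", "09:30", "1430", "", NA),
                   stringsAsFactors = FALSE)
chem$sample_time_std <- normalize_time(chem$sample_time)

# Re-run the same kind of check used above to confirm a single width remains
table(nchar(chem$sample_time_std), useNA = "ifany")
```

This does not check that the hour is below 24 or the minute below 60; that would be a separate validation step.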

jbeaulie commented 10 years ago

This text file contains observations not included in the STORET data; specifically pre-2000 observations. I think we should address the issues above, at least with respect to the pre-2000 observations, and prepare the file to be merged with the STORET data file.

It seems the field names should be aligned with the nomenclature used in the algae file (i.e., Lake, Station, Depth.ft, Date).
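
One way the renaming could be sketched, assuming the chem columns are named lake, station, depth, and sample_date as referenced earlier in this issue. The target names come from the algae file nomenclature above; the `name_map` vector and the toy row are hypothetical, and mapping depth to Depth.ft assumes depth is already recorded in feet.

```r
# Toy stand-in for the chem data (values are made up)
chem <- data.frame(lake = "AAA", station = "20001", depth = 5,
                   sample_date = "2001-06-01", stringsAsFactors = FALSE)

# Hypothetical mapping from current field names to the algae file nomenclature.
# Assumption: depth is already in feet, so only the name changes.
name_map <- c(lake = "Lake", station = "Station",
              depth = "Depth.ft", sample_date = "Date")

# Rename only the columns present in the mapping; leave all others untouched
hits <- names(chem) %in% names(name_map)
names(chem)[hits] <- unname(name_map[names(chem)[hits]])

names(chem)
```

Keeping the mapping in one named vector means the same object can be reused if the STORET file needs the same renaming.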

jbeaulie commented 9 years ago

I handled all issues described above. It turns out all pre-2000 observations were from inflows and outflows, so those data are not useful for our analysis.