Closed jbeaulie closed 9 years ago
This text file contains observations not included in the STORET data; specifically pre-2000 observations. I think we should address the issues above, at least with respect to the pre-2000 observations, and prepare the file to be merged with the STORET data file.
It seems the field names should be aligned with the nomenclature used in the algae file (i.e., Lake, Station, Depth.ft, Date).
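One way the renaming could be done in R is sketched below. The source column names (`lake`, `station`, `depth`, `sample_date`) and the toy row are assumptions for illustration only, since the actual names in the text file are not listed here; only the target names come from the algae file.

```r
# Hypothetical stand-in for the pre-2000 text file; names and values are invented.
old <- data.frame(
  lake = "EFR",
  station = "EFR20001",
  depth = 5,                  # assumed to already be in feet
  sample_date = "1998-07-14",
  stringsAsFactors = FALSE
)

# Map assumed current names (left) to the algae-file names (right).
name_map <- c(
  lake = "Lake",
  station = "Station",
  depth = "Depth.ft",
  sample_date = "Date"
)

# Rename only the columns that appear in the map; leave any others untouched.
hits <- names(old) %in% names(name_map)
names(old)[hits] <- unname(name_map[names(old)[hits]])

names(old)
```

Using a lookup vector rather than assigning `names(old)` wholesale means the result does not depend on column order.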
I handled all issues described above. It turns out all pre-2000 observations were from inflows and outflows, so those data are not useful for our analysis.
I assume that all unique identifiers must be consistently formatted before merging with algae data. Below are some specific formatting issues.
- table(chem$sample_date) reveals quite a few anomalous dates.
- table(chem$sample_time) reveals a lot of formatting diversity. Probably want a consistent four-character code.
- table(chem$location) reveals a lot of formatting diversity. This field has some overlap with station and lake; may ultimately omit?
- table(chem$station): Many non-standard values; probably related to issue #22 and may not be a problem.
- table(nchar(chem$ID)): ID ranges from 21 to 25 characters. Some of this variation may be due to issues with sample_date, sample_time, and station?
- chem$lake: 19127 missing values, but in all cases the 'location' field is populated? Lake info is typically contained in 'location'.
- chem$depth looks good.
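A minimal sketch of two of these checks follows, assuming sample_time is meant to be a zero-padded HHMM string. The toy `chem` data frame and its values are invented for illustration, and `sample_time_std` is a hypothetical new column, not one in the real data.

```r
# Toy stand-in for the chem data frame; values are invented.
chem <- data.frame(
  sample_time = c("930", "0930", "9:30", "1415", NA),
  lake = c(NA, "EFR", NA, "EFR", NA),
  location = c("EFR-1", "EFR-1", "EFR-2", "EFR-1", "EFR-3"),
  stringsAsFactors = FALSE
)

# Strip non-digit characters, then left-pad with zeros to four characters.
digits <- gsub("[^0-9]", "", chem$sample_time)
chem$sample_time_std <- ifelse(
  is.na(digits) | digits == "",
  NA_character_,
  sprintf("%04d", as.integer(digits))
)

# Confirm the standardized codes are all four characters long.
table(nchar(chem$sample_time_std), useNA = "ifany")

# Count rows where lake is missing but location is populated.
missing_lake <- is.na(chem$lake) | chem$lake == ""
has_location <- !is.na(chem$location) & chem$location != ""
n_recoverable <- sum(missing_lake & has_location)
```

If `n_recoverable` equals the number of missing lake values, lake could in principle be derived from location; how to parse location depends on its actual formats, which this sketch does not attempt.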