readCyano.R should specify hab = TRUE

USEPA / Phytoplankton-Data-Analysis

Phytoplankton Data Analysis


readCyano.R should specify hab = TRUE #26

Open jbeaulie opened 10 years ago

jbeaulie commented 10 years ago

All files read by readCyano.R report data from the HAB monitoring program and should be flagged as hab=TRUE in the dataframe.

mjpdenver commented 10 years ago

readCyano.R has been modified. Originally, only files with HAB in the file name were labeled HAB. This change means that files such as 92754.xls are implicitly assumed to be HAB.
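To make the change concrete, here is a minimal sketch of the two behaviours described above. It is not the code in readCyano.R: the function names `tag_hab_old` and `tag_hab_new`, the example columns, and the case-sensitive match are all assumptions made for illustration.

```r
# Hypothetical helpers; readCyano.R itself may be structured differently.

tag_hab_old <- function(df, file_name) {
  # Behaviour described as the original: flag HAB only when the
  # file name contains "HAB" (case-sensitive match is an assumption).
  df$hab <- rep(grepl("HAB", basename(file_name), fixed = TRUE), nrow(df))
  df
}

tag_hab_new <- function(df) {
  # Behaviour after the change: every file read by the script is HAB.
  df$hab <- rep(TRUE, nrow(df))
  df
}

# Made-up observations standing in for the contents of one workbook.
obs <- data.frame(
  taxa = c("Microcystis", "Anabaena"),
  cell_count = c(1200, 340),
  stringsAsFactors = FALSE
)

old <- tag_hab_old(obs, "92754.xls")  # no "HAB" in the name
new <- tag_hab_new(obs)
```

Under the old rule a file named 92754.xls would be left as hab = FALSE; under the new rule it is flagged regardless of its name.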

(Sent by email in reply to the GitHub notification for this issue, dated Tue, 22 Apr 2014 13:06:30 -0700.)

jbeaulie commented 10 years ago

I confirmed that the files read by readCyano.R that do not contain 'HAB' in the file name are from the HAB monitoring program and should therefore be flagged hab=TRUE.

jbeaulie commented 9 years ago

cleaned_algae_20150619.xlsx contains many observations coded as hab=TRUE that did not come from the HAB monitoring program. We need to resolve this.
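One way to see where the unexpected flags come from is to count hab=TRUE rows per source file. The sketch below uses a small made-up data frame rather than the real workbook, and the column names `source_file` and `hab` are assumptions about how the cleaned data are laid out.

```r
# Made-up stand-in for the cleaned data; the real column names may differ.
algae <- data.frame(
  source_file = c("HAB_site1.xls", "HAB_site1.xls",
                  "routine_2012.xls", "routine_2012.xls", "92754.xls"),
  hab = c(TRUE, TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Number of rows flagged hab = TRUE for each source file.
hab_counts <- aggregate(hab ~ source_file, data = algae, FUN = sum)
names(hab_counts)[2] <- "n_hab_true"
hab_counts
```

Files that are not known HAB sources but show a non-zero count are the ones worth tracing back to the script that read them.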

willbarnett commented 9 years ago

I haven't changed the hab column yet. Most of the scripts assigning this column as TRUE have a comment that says '#hard coded - see issue'. So what logic do we want to use? Should we grep for the word 'HAB' in the filename? Are there other variants that we're looking for?
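As a starting point for the filename question, a case-insensitive `grepl()` would catch variants such as "HAB" and "hab". The file names below are invented for illustration, and the stricter pattern is only a suggestion, since a plain substring match would also hit unrelated words such as "habitat".

```r
# Invented file names; only 92754.xls is mentioned in this thread.
file_names <- c("HAB_2013_site1.xls", "lake_hab_results.xlsx",
                "92754.xls", "habitat_survey.xls")

# Loose rule: any occurrence of "hab", ignoring case.
loose <- grepl("hab", basename(file_names), ignore.case = TRUE)

# Stricter rule (an assumption, not an agreed convention): "hab" must not
# be part of a longer word, so "habitat" is excluded.
strict <- grepl("(^|[^[:alpha:]])hab([^[:alpha:]]|$)",
                basename(file_names), ignore.case = TRUE)

data.frame(file_names, loose, strict)
```

Neither rule flags 92754.xls, so filename matching alone would not cover the files discussed earlier in this thread.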

jbeaulie commented 9 years ago

It appears that data from the HAB sampling campaign can be identified in one of two ways. The first is that the file name contains "HAB" or some variation thereof (e.g. "hab"). The second is that the worksheet containing the data has the header "Cyanobacterial Analysis Report". An example of the latter is Drew data/f/92754.xls, as Matt mentioned above. I would guess that all hab=TRUE cases come from the readHAB.R and readCyano.R scripts, but it isn't immediately clear to me how these scripts differ. There are also cases where data read from files that neither contain "HAB" in the file name nor have the header "Cyanobacterial Analysis Report" are coded as hab=TRUE. Sheet 1055 is an example.
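The two criteria above could be combined into a single check and then used to list sheets whose current flag disagrees with it. This is a sketch under assumptions: the helper `is_hab_source`, the inventory columns, and every row except the 92754.xls file name and its header are hypothetical, and the header text is assumed to have been read from each worksheet already.

```r
# Hypothetical helper: TRUE when either criterion from the comment holds.
is_hab_source <- function(file_name, sheet_header) {
  name_hit <- grepl("hab", basename(file_name), ignore.case = TRUE)
  header_hit <- grepl("Cyanobacterial Analysis Report", sheet_header,
                      fixed = TRUE)
  name_hit | header_hit
}

# Made-up sheet inventory; sheet ids and most file names are illustrative.
sheets <- data.frame(
  sheet_id = 1:4,
  file_name = c("HAB_site1.xls", "92754.xls",
                "routine_2012.xls", "routine_2013.xls"),
  sheet_header = c("", "Cyanobacterial Analysis Report",
                   "Phytoplankton counts", "Phytoplankton counts"),
  hab = c(TRUE, TRUE, TRUE, FALSE),  # flag as currently coded
  stringsAsFactors = FALSE
)

sheets$hab_expected <- is_hab_source(sheets$file_name, sheets$sheet_header)

# Sheets coded hab = TRUE that satisfy neither criterion.
suspect <- sheets[sheets$hab & !sheets$hab_expected, ]
suspect
```

Run against the real sheet inventory, a listing like `suspect` would show the cases that need a decision, such as the sheet 1055 example.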