fishR-Core-Team / FSA

FSA (Fisheries Stock Assessment) package provides R functions to conduct typical introductory fisheries analyses.
https://fishr-core-team.github.io/FSA/
GNU General Public License v2.0

Bad handling of `NA`s in `psdAdd()` #64

Closed by droglenc 3 years ago

droglenc commented 3 years ago

In short, I have two issues that arise when there is a species=NA in the data frame. First, the species=NA generates an NA for the PSD name (as would be expected), but that NA is placed at the end of the returned items rather than in its original position, which makes it hard to pair the results with the original rows using cbind(). Second, psdAdd() returns more items than I expect (i.e., more items than there are rows in the data frame). The extra items appear only when there is an NA for species and more than one species among the values that are not NA.

This makes it difficult to add PSD size classes to a data frame, either with mutate() or by creating a list that I then cbind() to the original, because the number of elements does not match. I originally just deleted rows with species=NA, but I cannot do that in this new case (CPUE by PSD class rather than calculating PSD values): I need to track samples where no species were caught (so no legitimate species name is available) in order to use complete() to add zero catch data for any PSD size class that was not caught.
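One caller-side way to keep the rows aligned, sketched here under assumptions, is to classify only the rows with a known species and write the results back by position. The helper name `classify_keep_na()` and the stand-in classifier `toy_classify()` are hypothetical and not part of FSA. The sketch relies only on the observation in this report that psdAdd() returns one item per row, in order, when no species is NA.

```r
# Hypothetical helper (not part of FSA): apply a classifier only to rows with
# a known species and return a vector aligned with the rows of `df`.
classify_keep_na <- function(df, spp_col, classify) {
  keep <- !is.na(df[[spp_col]])
  out <- rep(NA_character_, nrow(df))
  res <- as.character(classify(df[keep, , drop = FALSE]))
  # Guard against the symptom reported here (more items than rows).
  stopifnot(length(res) == sum(keep))
  out[keep] <- res
  out
}

testdf <- data.frame(TL  = c(400, 90, 250, NA, 50),
                     Spp = c("White Crappie", NA, "Largemouth Bass",
                             "White Crappie", "Black Crappie"))

# Stand-in classifier so the sketch runs without FSA. It only marks whether a
# length was measured and is NOT a PSD calculation.
toy_classify <- function(d) ifelse(is.na(d$TL), NA, "measured")

testdf$size_class <- classify_keep_na(testdf, "Spp", toy_classify)
testdf

# With FSA installed, the classifier could instead be something like
#   function(d) FSA::psdAdd(TL ~ Spp, data = d, drop.levels = TRUE)
```

Because the result always has one element per row of the original data frame, the species=NA rows stay in place and remain available for a later complete() step.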

Here are some trivial examples illustrating the issue:

library(FSA)

# first 4 examples return the expected number of items (either no spp=NA or
# all the same species for the non-NA species)

## has 5 items just like original data, so NA in length not a problem
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie","White Crappie","White Crappie",
                           "White Crappie","White Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 5 items just like original data, so single NA in length not a problem
## even with mix of spp names
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie","Black Crappie","Black Crappie",
                           "White Crappie","White Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 5 items just like original data, so multiple NA in length not a problem
testdf <- data.frame(TL=c(400,NA,250,NA,50),
                     Spp=c("White Crappie","White Crappie","White Crappie",
                           "White Crappie","White Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 5 items just like original data, but order of NA's not as expected for
## missing spp (has moved to end and will cause erroneous results if I try to
## cbind or mutate)
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie",NA,"White Crappie",
                           "White Crappie","White Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

# the examples below return extra elements. All have one record with spp=NA,
# and the number of extra elements seems related to the number of different
# spp in the data frame or the number of times 2 different species occur.

## has 1 extra NA
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie",NA,"White Crappie",
                           "White Crappie","Black Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 1 extra NA, so does not appear related to NA's in length, just species
testdf <- data.frame(TL=c(400,90,250,130,50),
                     Spp=c("White Crappie",NA,"White Crappie",
                           "White Crappie","Black Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 2 extra NA's
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie",NA,"Largemouth Bass",
                           "White Crappie","Black Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## still 2 extra NA's, so species names with no PSD categories behave same as
## species names with PSD categories
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie",NA,"badSpp",
                           "White Crappie","Black Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 3 extra NA's
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie",NA,"Largemouth Bass",
                           "Bluegill","Black Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 5 rows as expected...so extra NA's only happen if there is at
## least one spp=NA
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie","White Crappie","Largemouth Bass",
                           "Bluegill","Black Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 2 extra NA's even though only 1 species other than White Crappie...
## so not purely a function of # of spp used
testdf <- data.frame(TL=c(400,90,250,NA,50,100),
                     Spp=c("White Crappie",NA,"White Crappie","White Crappie",
                           "Black Crappie","Black Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

droglenc commented 3 years ago

I think the issue is related to this line ...

tmpdf <- data[data[,2]==specs[i],]

as the rows with species=NA get carried along by this subset, such that tmpdf does not contain just specs[i] (it also contains the rows where species is NA). I think that I can fix this by pulling off the species=NA rows first and then adding them back in after the other species have been worked through.
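The behavior behind this can be reproduced in base R without FSA. The sketch below is illustrative only and is not the psdAdd() source: the toy data, the `row` and `n_in_spp` columns, and the per-species "work" are assumptions made for the example. It shows that `==` yields NA where the species is NA, that an NA in a logical row index returns a row of NAs, and one way to set the NA rows aside and restore them in their original positions.

```r
# Toy data: one row has a missing species.
df <- data.frame(TL  = c(400, 90, 250, 50),
                 Spp = c("White Crappie", NA, "White Crappie", "Black Crappie"))

# `==` gives NA (not FALSE) where Spp is NA, and an NA in a logical row index
# returns a row filled with NAs, so the subset has an extra row.
bad <- df[df[, 2] == "Black Crappie", ]
nrow(bad)   # 2: one all-NA row plus the one Black Crappie row

# which() drops the NA comparisons, so only true matches are kept.
good <- df[which(df[, 2] == "Black Crappie"), ]
nrow(good)  # 1

# Set-aside approach: remove the NA-species rows, process each species,
# then add the NA rows back and restore the original row order.
df$row  <- seq_len(nrow(df))            # remember original positions
na_rows <- df[is.na(df$Spp), ]
ok_rows <- df[!is.na(df$Spp), ]

pieces <- lapply(split(ok_rows, ok_rows$Spp), function(d) {
  d$n_in_spp <- nrow(d)                 # stand-in for the per-species work
  d
})
na_rows$n_in_spp <- rep(NA_integer_, nrow(na_rows))

res <- do.call(rbind, c(unname(pieces), list(na_rows)))
res <- res[order(res$row), ]            # back to the original order
res
```

The result has exactly as many rows as the input, and the species=NA row stays in its original position with NA as its result.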

droglenc commented 3 years ago

Will be fixed in v0.8.32