Closed · lime-n closed this issue 4 years ago
In this case the error is exactly what the message says: all the variables you have listed for site_covs
need to be constant across sites. That's not happening in your case, e.g.
select(occ, site, pland_00_water)
# site pland_00_water
# <chr> <dbl>
# 1 L10018668_obs439702_2020 NA
# 2 L10018668_obs439702_2020 NA
# 3 L10018668_obs439702_2020 NA
# 4 L10018668_obs439702_2020 NA
# 5 L10024459_obs1462591_2020 0.0323
# 6 L10024459_obs1462591_2020 NA
Any suggestions on how I can overcome this?
Since these are from exactly the same location in the same year they should have the same covariates. The fact that they don't suggests something must have gone wrong during covariate assignment, but I have no idea what could have happened here.
It could be worth checking if it's always happening with NA values or if you have some cases where two sites have different non-NA values.
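As a quick check (a sketch, assuming the `occ` data frame and the `pland_00_water` column shown in the output above), you could count how many distinct non-NA values each site carries:

```r
library(dplyr)

# sites whose pland_00_water is not constant, ignoring NAs
occ %>%
  group_by(site) %>%
  summarise(n_vals = n_distinct(pland_00_water, na.rm = TRUE)) %>%
  filter(n_vals > 1)
```

If this returns zero rows, the only inconsistency is NA vs non-NA; any rows returned point to sites with genuinely conflicting covariate values.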
I believe it may happen when I read the CSV into R, because I get this warning:
Warning: 317279 parsing failures.
row col expected actual file
2523 pland_08_woody_savanna 1/0/T/F/TRUE/FALSE 0.1935483870967742 'data/pland-elev_location-year.csv'
2524 pland_08_woody_savanna 1/0/T/F/TRUE/FALSE 0.26666666666666666 'data/pland-elev_location-year.csv'
44813 pland_01_evergreen_needleleaf 1/0/T/F/TRUE/FALSE 0.4666666666666667 'data/pland-elev_location-year.csv'
44814 pland_01_evergreen_needleleaf 1/0/T/F/TRUE/FALSE 0.09375 'data/pland-elev_location-year.csv'
44815 pland_01_evergreen_needleleaf 1/0/T/F/TRUE/FALSE 0.53125 'data/pland-elev_location-year.csv'
..... ............................. .................. ................... ...................................
See problems(...) for more details.
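These failures are consistent with readr's type guessing: by default, `read_csv()` infers each column's type from the first 1000 rows, so a column whose early values are all NA or look logical gets parsed as `1/0/T/F/TRUE/FALSE`, and later numeric values such as `0.1935…` then fail to parse. A sketch of two possible fixes (the file path and column names are taken from the warning above):

```r
library(readr)

# option 1: guess column types from far more rows
occ <- read_csv("data/pland-elev_location-year.csv", guess_max = 1e6)

# option 2: state the types explicitly so nothing is guessed
occ <- read_csv("data/pland-elev_location-year.csv",
                col_types = cols(site = col_character(),
                                 .default = col_double()))
```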
Some of the covariates have a mixture of values: some contain only TRUE, others have numeric values, and some only NAs.
I wasn't experiencing any errors while running the code, so I find this confusing.
It may have been because I changed this code:
pland <- pland %>%
pivot_wider(names_from = lc_name,
values_from = pland,
values_fill = list(pland = 0))
as it was not working; it returned this error:
Error: Can't convert to .
Run rlang::last_error() to see where the error occurred.
In addition: Warning message:
Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
to this:
pland <- pland %>%
group_by(lc_name) %>%
mutate(row = row_number()) %>%
tidyr::pivot_wider(names_from = lc_name, values_from = pland) %>%
select(-row)
right before writing it into .csv form.
It could be that I missed including values_fill = list(pland = 0) in the new version.
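For reference, the warning's own hints can be followed directly: `values_fn` collapses duplicates during the pivot itself, so both the duplication and the fill can be handled in one call (a sketch, assuming duplicate site/lc_name rows can safely be summarised, e.g. by their mean):

```r
library(dplyr)
library(tidyr)

pland <- pland %>%
  pivot_wider(names_from = lc_name,
              values_from = pland,
              values_fn = list(pland = mean),   # summarise duplicate entries
              values_fill = list(pland = 0))    # fill missing combinations with 0
```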
After a long session of rerunning all the code, the change I mentioned has stopped the warnings during parsing, but I still get back the same error about the same covariates. Thankfully, there are no more NA values; these have been filled with 0.
I believe the problem lies within the pland code and that there are duplicate entries. I have confirmed this with values_fn = list(pland = length), with some columns returning 2 as opposed to 1 or 0.
Is there a way to summarise the duplicates so they only return 1?
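To see exactly which combinations are duplicated before deciding how to summarise them, a sketch (assuming site and lc_name are the identifying columns in pland):

```r
library(dplyr)

# rows that would collide when pivoting wider
pland %>%
  count(site, lc_name) %>%
  filter(n > 1)
```

Inspecting these rows shows whether the duplicates carry identical pland values (safe to drop) or conflicting ones (which need a deliberate summary function).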
When I look for the frequency of duplicates, it shows this:
> head(data.frame(table(occ$site)))
Var1 Freq
1 L10000468_obs1252332_2019 4
2 L10000750_obs132896_2019 10
3 L10001060_obs1162224_2019 8
4 L10001830_obs476367_2019 3
5 L10002157_obs163161_2019 10
6 L10002592_obs500379_2019 10
Would deleting the rows where a site occurs more than once work, or would I be losing valuable data?
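Before deleting anything, it may be worth checking whether the repeated rows are exact duplicates; if so, `distinct()` removes only redundant copies and no information is lost (a sketch, assuming duplicates are identical across all columns of occ):

```r
library(dplyr)

occ_dedup <- distinct(occ)   # drops rows identical in every column
nrow(occ) - nrow(occ_dedup)  # how many rows were pure repeats
```

If sites still repeat in occ_dedup, the copies differ in at least one column, and dropping them would discard real information.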
I'm not sure what's going on here; there's clearly some issue with the data processing, but it's hard to say what without a concise reproducible example. I am fairly certain there isn't any issue with the format_unmarked_occu()
function though, and at the moment I don't have time to look into this further. I will come back to this if I manage to find some spare time.
I have uploaded all the files necessary to reproduce the error on my Github here: https://github.com/lime-n/ebird_data
It may be best to read occ.csv only.
I am working with 9 different species, and this error occurs for 7 out of the 9, even though I follow the covariate code 'as is' without error.
I have tried looking on stackexchange and cannot seem to figure out the problem, maybe you will have better luck.
I have found it to work with:
library( data.table )
setDT(mydata)[!duplicated(site, fromLast = TRUE), ]
However, it removes up to 10,000 rows of data, which may skew my analysis, since my other two datasets work as-is.
An extra problem with this approach is that it produces negative values for the occ_model, so occ_gof cannot run.
I found that the problem is with the longitude coordinates. I'm not sure how it got there, as the exact same code works perfectly well for some other species. Could it be an eBird error?
It seems that writing the dataframe to a csv and then loading it back into the R environment solves the issue. How strange!
That is strange! I don't foresee myself having time to look into this further in the near future, but seems like you may have figured out a solution.
When writing the .csv, I noticed that it introduces another column containing the row numbers. It seems that was the magic column that fixed the problem.
Hmm, that seems very odd to me; this shouldn't fix the issue. Are you sure it's not just preventing the error while producing an incorrect result? In general, I suggest always using write_csv() from the readr package, which doesn't add this extra column; you almost never want to save the row numbers.
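A minimal sketch of the round trip with readr (the file path is illustrative):

```r
library(readr)

write_csv(occ, "data/occ.csv")                   # writes no row-number column
occ <- read_csv("data/occ.csv", guess_max = 1e5) # re-read with generous type guessing
```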
I am trying to prepare a dataframe of species observations for abundance and occurrence data using habitat covariates.
When using this code:
I get this error:
What does this mean and how can I overcome this error?
A reproducible example: