LimaRAF / plantR

An R Package for Managing Species Records from Biological Collections
GNU General Public License v3.0

Error in validateCoord() #108

Closed poppy-el closed 4 months ago

poppy-el commented 4 months ago

Hello,

I am using the standardization and validation functions from plantR with the occurrence data from GBIF, and when trying to run the validateCoord() function, I am getting the following error:

beetles9 <- validateCoord(x = beetles8,
       tax.name = "scientificName.new")
Error in solve.default(cov, ...) : 
  system is computationally singular: reciprocal condition number = 2.11204e-17

I've run the same script with multiple subsets of data available on GBIF (separating my region of interest by country), and it looks like the error only occurs when using data from French Guiana.

Data and script producing error:

## Data: From GBIF: Scarabaeidae in Brazil, Bolivia, Colombia, Ecuador, French Guiana, Guyana, Peru, Suriname, Venezuela,
beetles <- readData(file = "0014348-240626123714530.zip",
                    path = "https://api.gbif.org/v1/occurrence/download/request/")$occurrence

beetles1 <- formatOcc(beetles)

beetles2 <- beetles1[beetles1$specificEpithet != "",]

beetles3 <- beetles2[beetles2$yearIdentified.new != "n.d.",]

beetles4 <- beetles3[beetles3$year.new != "n.d.",]

beetles5 <- filter(beetles4, yearIdentified.new >= year.new)

beetles6 <- beetles5 %>% 
  tidyr::unite("scientificName.new", c(genus,specificEpithet), sep = " ")%>%
  mutate(across(where(is.character), ~ na_if(.,"")))

beetles7<-formatLoc(beetles6)

beetles8 <- formatCoord(beetles7)

beetles_LocVal <- validateLoc(beetles8)

beetles9 <- validateCoord(x = beetles8,
                          tax.name = "scientificName.new")

#Error in solve.default(cov, ...) : 
#  system is computationally singular: reciprocal condition number = 2.11204e-17
LimaRAF commented 4 months ago

Hi @poppy-el,

The problem was caused by a singularity in the computation done by stats::mahalanobis() (used to detect spatial outliers) for some species with very few spatially unique coordinates. I added a patch to the function checkOut() to avoid this error. I adapted your code and used it to test whether everything works now. Apart from some encoding problems, I was able to run the entire plantR workflow.
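To illustrate the kind of singularity described above, here is a minimal sketch using toy coordinates. It is not the actual patch in checkOut(): the helper safe_mahalanobis() and its tol argument are hypothetical names introduced only for this example, and the guard shown is just one possible way to avoid the error.

```r
# Toy example: three records whose coordinates fall on a single line,
# so the 2 x 2 covariance matrix has rank 1 and cannot be inverted.
coords <- cbind(lon = c(-52.0, -52.5, -53.0),
                lat = c(4.0, 4.5, 5.0))
cov_mat <- stats::cov(coords)

# stats::mahalanobis() calls solve() on the covariance matrix,
# which fails when that matrix is singular.
res <- tryCatch(
  stats::mahalanobis(coords, center = colMeans(coords), cov = cov_mat),
  error = function(e) conditionMessage(e)
)
print(res)

# Hypothetical guard (an assumption, not plantR code): return NA instead
# of failing when the distances cannot be computed reliably.
safe_mahalanobis <- function(xy, tol = .Machine$double.eps) {
  xy <- as.matrix(xy)
  # Too few spatially unique points to estimate a covariance matrix
  if (nrow(unique(xy)) <= ncol(xy)) return(rep(NA_real_, nrow(xy)))
  cov_mat <- stats::cov(xy)
  # Covariance matrix is (numerically) singular
  if (rcond(cov_mat) < tol) return(rep(NA_real_, nrow(xy)))
  stats::mahalanobis(xy, center = colMeans(xy), cov = cov_mat)
}

safe_mahalanobis(coords)  # all NA, no error

# With well-spread coordinates the distances are computed as usual
spread <- cbind(lon = c(-52, -53, -54, -52.5),
                lat = c(4, 5, 4.2, 3))
safe_mahalanobis(spread)
```

Returning NA for such species keeps the rest of the workflow running, at the cost of leaving those records without an outlier assessment.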

Please let me know if you manage to run the code without errors, so I can close this issue.

Before running the code below, you will need to install the package from the development branch:

install.packages("remotes")
remotes::install_github("LimaRAF/plantR", ref = "dev")

And the code I used:

## Downloading the data
beetles <- readData(file = "0014348-240626123714530.zip",
                    path = "https://api.gbif.org/v1/occurrence/download/request/")$occurrence

## Trying to solve encoding issues in the data with formatDwc()
### also removing some unnecessary columns
beetles.dwc <- formatDwc(gbif_data = beetles, drop = TRUE, drop.opt = FALSE,
                         drop.empty = TRUE, fix.encoding = 'gbif_data')

## Trying to solve encoding issues not solved by formatDwc
unsolved_enc <- c(5516, 206731, 184388) # careful: these indices change over time! Always check the entries in the warning message!!
beetles.dwc[unsolved_enc, c("occurrenceRemarks")] <-
  c("Escarabajo Rinoceronte?", "", "")
beetles.dwc[unsolved_enc, c("recordedBy")] <-
  c("Sergio Muriel", "", "Oscar Enciso")
beetles.dwc[unsolved_enc, c("identifiedBy")] <-
  c("Julian Alzate", "", "Neo Scott Anzai")
beetles.dwc[unsolved_enc, c("verbatimLocality")] <-
  c("Medellín, Antioquia, Colombia", "", "Yarumal")

## Removing taxa not identified at least to the species level
beetles.dwc <- beetles.dwc[!beetles.dwc$taxonRank %in%
                             c("FAMILY", "GENUS", "UNRANKED"),]

## Formatting occurrences (names, year, etc)
beetles1 <- formatOcc(beetles.dwc)

## Filtering the data
## Be careful 'cause the steps below remove lots of records (44% of the total)!!
antes <- dim(beetles1)[1]
beetles2 <- beetles1[beetles1$yearIdentified.new != "n.d.",]
beetles3 <- beetles2[beetles2$year.new != "n.d.",]
beetles5 <- dplyr::filter(beetles3, yearIdentified.new >= year.new)
antes - dim(beetles5)[1]

## Not necessary (formatTax does this for you)
# beetles6 <- beetles5 %>%
#   tidyr::unite("scientificName.new", c(genus,specificEpithet), sep = " ")%>%
#   dplyr::mutate(dplyr::across(dplyr::where(is.character), ~ dplyr::na_if(.,"")))

## Formatting localities
beetles7 <- formatLoc(beetles5)

## Formatting coordinates
beetles8 <- formatCoord(beetles7)

## Formatting taxonomic information
# adding a step to cross names with the GBIF backbone, although currently
# there is no check for animal family names
insect.gbif <-
  plantRdata::gbifNamesAnimalia[plantRdata::gbifNamesAnimalia$phylum
                                %in% "Arthropoda", ]
beetles8.1 <- formatTax(beetles8, db = insect.gbif,
                        kingdom = "Animalia")

## Validating locality information
beetles_LocVal <- validateLoc(beetles8.1)

## Validating geographical information
beetles9 <- validateCoord(x = beetles_LocVal)

## Validating species identifications
### plantR currently only stores plant taxonomists, so only types are being validated for beetles
beetles10 <- validateTax(beetles9) ## Check for specialists' names in the family and send them to raflima@usp.br !

## Validating duplicated specimens across different collections
beetles11 <- validateDup(beetles10)

## Summary, flags and the checklist (according to GBIF taxonomy)
summ <- summaryData(beetles11)
flags <- summaryFlags(beetles11)
head(checkList(beetles11, n.vouch = 3, type = "short"), 2)

## Saving and cleaning
saveRDS(beetles11, "beetles_edited_plantR.rds")
rm(list = ls())
poppy-el commented 4 months ago

Hello,

Yes, thank you, this has fixed the problem in my code.

I also tried running it with your code, but have been unable to install plantRdata, with the following errors:

Error: package or namespace load failed for 'plantRdata' in namespaceExport(ns, exports):
 undefined exports: loadData
Error: loading failed
Execution halted
ERROR: loading failed

Warning message:
In i.p(...) :
  installation of package ‘C:/... /plantRdata_0.0.2.tar.gz’ had non-zero exit status

LimaRAF commented 4 months ago

Hi @poppy-el ,

I removed the loadData() export from plantRdata. It should work out fine now.

Please let me know if you have any other issues.

Best