LimaRAF / plantR

An R Package for Managing Species Records from Biological Collections
GNU General Public License v3.0

rgbif2() for family #65

Closed. herisonmedeiros closed this issue 3 years ago

herisonmedeiros commented 3 years ago

Testing the tutorial worked for a family, but not for Sapindaceae. I think the problem is that GBIF holds more than 2.5 million records for Sapindaceae. The example in the tutorial uses a country filter. We ran it both ways, as shown below. Retrieving the data from speciesLink worked. For GBIF, it did not:

familia <- "Sapindaceae"
occs_splink <- rspeciesLink(family = familia) # It's working for speciesLink
occs_gbif <- rgbif2(species = familia, n.records = 2600000) # It's not working for GBIF. Maybe the number of records?
Error in names(gbif_data) <- species : 'names' attribute [1] must be the same length as the vector [0]

occs_gbif <- rgbif2(species = familia, country = "BR", n.records = 450000) # It's not working, even with the same filter used in the tutorial
Error in names(gbif_data) <- species : 'names' attribute [1] must be the same length as the vector [0]

AndreaSanchezTapia commented 3 years ago

Hi Herison, there is a theoretical limit of 100,000 on the number of occurrences that rgbif can return, but the limit is not fixed and it sometimes manages to return more records. I will check the approximate maximum size of the returned object again, but yes, the problem is n.records. With 100,000 it will work, but downloading a larger table would have to be done in several steps.
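
One way to plan such a multi-step download is to check first which groups fit under the per-query cap. The sketch below is illustrative only: the record counts are made-up numbers, `plan_queries()` is a hypothetical helper (not part of plantR or rgbif), and the cap of 100,000 is the value mentioned in this thread.

```r
# Hypothetical record counts per genus (made-up numbers, for illustration only)
counts <- c(Paullinia = 60000, Serjania = 120000, Cardiospermum = 30000)

# Per-query cap assumed from the discussion in this thread
cap <- 100000

# Hypothetical helper: split the groups into those that fit in a single
# query and those that would need additional filters (country, basis of
# record, etc.) before they can be downloaded
plan_queries <- function(counts, cap = 100000) {
  stopifnot(is.numeric(counts), !is.null(names(counts)))
  list(
    single_query = names(counts)[counts <= cap],
    needs_more_filters = names(counts)[counts > cap]
  )
}

plan <- plan_queries(counts, cap)
print(plan)
```

Groups listed under `needs_more_filters` would have to be split further before being requested.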

LimaRAF commented 3 years ago

Dear @herisonmedeiros , thanks again.

As Andrea mentioned, you were right. The package rgbif has a maximum limit of 100,000 records per query. I made some changes to rgbif2() so that this is clearer to users.

So the options are: download the data separately for different Sapindaceae genera (the best option in my opinion), add some extra filters so that the total number of records per query stays within 100,000, or download the entire Sapindaceae family from the GBIF website and load it here using the function plantR::readData().

I used the code below, which worked on my machine:

familia <- "Sapindaceae"
# Downloading the data
occs_splink <- rspeciesLink(family = familia) # ok! ~130000 records
occs_gbif <- rgbif2(species = familia,
                    n.records = 2600000) # I got an error here as well (n.records > 100,000)
# Solution assuming you want only herbarium vouchers
occs_gbif.vouchers <- rgbif2(species = familia,
                    basisOfRecord = "PRESERVED_SPECIMEN",
                    n.records = 600000) # No error but only 95730 records were downloaded
table(occs_gbif.vouchers$genus)

# Solution assuming you want only records for South America
occs_gbif.sam <- rgbif2(species = familia,
                    continent = "South America",
                    n.records = 200000) # No error here, but only 94,000 records; I am not sure how good the coverage of this field is in GBIF
table(occs_gbif.sam$countryCode)

# Solution assuming you want only records for Brazil
occs_gbif.br <- rgbif2(species = familia,
                    country = "BR",
                    n.records = 200000) # I still got an error here (n.records > 100,000)
table(occs_gbif.br$genus)

# Solution for downloading data genus by genus (GBIF does not recognize Paullinieae as a valid taxon key)
genera <- c("Thinouia", "Lophostigma Radlk.", "Cardiospermum", "Paullinia",
            "Serjania Mill.", "Urvillea Kunth")
occs_gbif.paullinieae <- rgbif2(species = genera,
                    n.records = 200000)
table(occs_gbif.paullinieae$genus)
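
The genus-by-genus approach could also be written as a loop that runs one query per genus and stacks the results, so that each genus gets its own record cap. This is only a minimal sketch: `download_by_group()` and `fake_fetch()` are hypothetical names, and the real download function is passed in as an argument so the example runs offline. In practice `fetch` might wrap rgbif2(), as shown in the comment.

```r
# Run one query per group and stack the results.
# `fetch` is any function that takes a single name and returns a data frame;
# in practice it might wrap rgbif2(), for example:
#   fetch <- function(g) plantR::rgbif2(species = g, n.records = 100000)
download_by_group <- function(groups, fetch) {
  results <- lapply(groups, function(g) {
    # A failed query yields NULL instead of stopping the whole loop
    out <- tryCatch(fetch(g), error = function(e) NULL)
    if (!is.data.frame(out) || nrow(out) == 0) return(NULL)
    out$query_group <- g  # remember which query each row came from
    out
  })
  failed <- groups[vapply(results, is.null, logical(1))]
  if (length(failed) > 0) {
    warning("No records returned for: ", paste(failed, collapse = ", "))
  }
  results <- Filter(Negate(is.null), results)
  if (length(results) == 0) return(NULL)
  # rbind() assumes every query returned the same columns
  do.call(rbind, results)
}

# Stand-in for a real download (simulated data, one simulated failure)
fake_fetch <- function(g) {
  if (g == "Urvillea") stop("simulated download failure")
  data.frame(genus = rep(g, 2), id = 1:2, stringsAsFactors = FALSE)
}

occs <- suppressWarnings(
  download_by_group(c("Paullinia", "Serjania", "Urvillea"), fake_fetch)
)
table(occs$query_group)
```

Catching errors per genus means a single failed request does not discard the genera that were already downloaded. The names of the failed genera are reported in a warning so they can be retried.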

Please let us know if this works for your case so we can close this issue.