biodiversitydata-se / SBDI4R

R package to search and access data made available through the Swedish biodiversity data infrastructure SBDI
https://biodiversitydata-se.github.io/SBDI4R/
GNU Affero General Public License v3.0

SitesBySpecies broken #13

Open: aleruete opened this issue 3 years ago

aleruete commented 3 years ago

The API for https://spatial.bioatlas.se/ws/sitesbyspecies/ seems to be returning a wrong answer: the column names are a mix of species names and something else, e.g. accipiterNisusLinnaeus; x1758; accipiterGentilisLinnaeus; x17581

shahmanash commented 3 years ago

@aleruete Can you paste an example POST request here?

aleruete commented 3 years ago

Caching https://spatial.bioatlas.se/ws/tasks/create?userId=0&sessionId=2001 POST to file ~/sbdi_cache/f1b41527bb85860cd7e0eb84cc2a9fae
-> POST /ws/tasks/create?userId=0&sessionId=2001 HTTP/1.1
-> Host: spatial.bioatlas.se
-> User-Agent: SBDI4R 0.0.1
-> Accept-Encoding: deflate, gzip
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: application/x-www-form-urlencoded
-> Content-Length: 394
-> 

name=PointsToGrid&input={"area":[{"name":"Current extent","wkt":"POLYGON((11.94 55.13, 11.97 58.60, 17.90 58.58, 17.87 55.11, 11.94 55.13))"}],"occurrenceDensity":false,"sitesBySpecies":true,"speciesRichness":false,"species":{"q":["genus:Accipiter"],"bs":"https://records.bioatlas.se/ws/","name":"genus:Accipiter"},"gridCellSize":0.1,"resolution":0.01,"movingAverage":"1x1 (no moving average)"}
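For reference, the request body above can be reassembled in base R. This is a minimal sketch, assuming that the taxon, wkt and gridsize arguments map onto the q/name, wkt and gridCellSize fields (inferred from the sites_by_species() parameters used later in this thread, not from the SBDI4R source). The assembled body is 394 bytes, the same as the logged Content-Length: 394, so the JSON appears to go over the wire without percent-encoding.

```r
# Hypothetical reconstruction of the PointsToGrid task request logged above.
# The mapping from sites_by_species() arguments to JSON fields is an assumption.
taxon    <- "genus:Accipiter"
wkt      <- "POLYGON((11.94 55.13, 11.97 58.60, 17.90 58.58, 17.87 55.11, 11.94 55.13))"
gridsize <- 0.1

input_json <- paste0(
  '{"area":[{"name":"Current extent","wkt":"', wkt, '"}],',
  '"occurrenceDensity":false,"sitesBySpecies":true,"speciesRichness":false,',
  '"species":{"q":["', taxon, '"],"bs":"https://records.bioatlas.se/ws/",',
  '"name":"', taxon, '"},',
  '"gridCellSize":', gridsize, ',"resolution":0.01,',
  '"movingAverage":"1x1 (no moving average)"}'
)

# Form fields joined as name=value pairs, unencoded, exactly as in the log
body <- paste0("name=PointsToGrid&input=", input_json)

nchar(body, type = "bytes")  # 394, same as the logged Content-Length
```

Comparing the byte count with Content-Length is a quick way to confirm that a verbose log shows the complete body rather than a truncated one.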

<- HTTP/1.1 302 Found
<- Server: nginx/1.17.6
<- Date: Wed, 07 Oct 2020 10:18:21 GMT
<- Content-Type: application/json;charset=UTF-8
<- Transfer-Encoding: chunked
<- Connection: keep-alive
<- X-Application-Context: application:production
<- Location: https://auth.bioatlas.se/cas/login?service=https://spatial.bioatlas.se/ws/tasks/index
<- Strict-Transport-Security: max-age=31536000
<- Referrer-Policy: strict-origin-when-cross-origin
<- X-Frame-Options: SAMEORIGIN
<- X-Content-Type-Options: nosniff
<- Content-Security-Policy: upgrade-insecure-requests
<- 
Warning message:
In check_status_code(status_code, extra_info = diag_message) :
  HTTP status code 302 received.
This may be OK: if there are problems, please notify the package maintainers.

shahmanash commented 3 years ago

To test the function sites_by_species, which calls the service, I configured the ALA4R library with the following config in my .Rprofile file:

# Point ALA4R at the Swedish bioatlas.se web services
server_config <- list(
  max_occurrence_records = 500000,
  server_max_url_length = 8150,
  brand = "ALA4R",
  notify = "Please use https://github.com/AtlasOfLivingAustralia/ALA4R/issues/ or email to support@ala.org.au",
  support_email = "support@ala.org.au",
  reasons_function = "ala_reasons",
  fields_function = "ala_fields",
  occurrences_function = "occurrences",
  config_function = "ala_config",
  base_url_spatial = "https://spatial.bioatlas.se/ws/",
  base_url_bie = "https://species.bioatlas.se/ws/",
  base_url_biocache = "https://records.bioatlas.se/ws/",
  base_url_biocache_download = "https://records.bioatlas.se/ws/biocache-download/",
  base_url_alaspatial = "https://spatial.bioatlas.se/alaspatial/ws/",
  base_url_images = "https://images.bioatlas.se/",
  base_url_logger = "https://logger.bioatlas.se/service/logger/",
  # base_url_fieldguide = "https://fieldguide.bioatlas.se/",
  base_url_lists = "https://lists.bioatlas.se/ws/",
  biocache_version = "2.2.3",
  verbose = TRUE,
  download_reason_id = 10,
  caching="off"
)
if (!"ALA4R_server_config" %in% names(options())) {
  message("\nNo existing ALA4R server config, using Swedish data sources...\n")
  options(ALA4R_server_config = server_config)
} else {
  message("Overwriting existing ALA server config with new...")
  options(ALA4R_server_config = server_config)
}

message("\n*** Successfully loaded .Rprofile ***\n")

Then I executed the following two lines of code:

library(ALA4R)
ss <- sites_by_species(taxon="genus:Accipiter", wkt="POLYGON((11.94 55.13, 11.97 58.60, 17.90 58.58, 17.87 55.11, 11.94 55.13))", gridsize=0.1, verbose=TRUE)

I get the following error in the console:

Caching https://spatial.bioatlas.se/ws/tasks/output/3665297/download.zip to file /tmp/Rtmp4VldSG/3032e9cefc3444cb15369e2f86f6d072
-> GET /ws/tasks/output/3665297/download.zip HTTP/2
-> Host: spatial.bioatlas.se
-> user-agent: ALA4R 1.8.0
-> accept-encoding: deflate, gzip, br
-> accept: application/json, text/xml, application/xml, */*
-> 
<- HTTP/2 200 
<- server: nginx/1.17.6
<- date: Wed, 07 Oct 2020 16:41:14 GMT
<- content-type: application/zip
<- content-length: 13811
<- x-application-context: application:production
<- content-disposition: attachment
<- strict-transport-security: max-age=31536000
<- referrer-policy: strict-origin-when-cross-origin
<- x-frame-options: SAMEORIGIN
<- x-content-type-options: nosniff
<- content-security-policy: upgrade-insecure-requests
<- 
Error in names(guids) <- names(out)[-2:-1] : 
  'names' attribute [10] must be the same length as the vector [6]
In addition: Warning message:
In check_status_code(status_code, extra_info = diag_message) :
  HTTP status code 302 received.
This may be OK: if there are problems, please notify the package maintainers.

This confirms that the API is working and that the download does happen, although the function then breaks for some other reason. I browsed to the folder /tmp/Rtmp4VldSG and listed the contents of the file 3032e9cefc3444cb15369e2f86f6d072 with unzip -vl 3032e9cefc3444cb15369e2f86f6d072; it contains two files, sxs_metadata.html and SitesBySpecies.csv. I have uploaded the zip file, renamed to Accipiter.zip: Accipiter.zip

The same can be checked at the following URL:

https://spatial.bioatlas.se/ws/tasks/output/3665297/SitesBySpecies.csv
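Both checks can also be done from inside R. A minimal base-R sketch: peek_sxs_zip() and sxs_csv_url() are hypothetical helpers, and the tasks/output/<id>/SitesBySpecies.csv pattern is assumed from the URLs in this thread rather than taken from any API documentation.

```r
# Hypothetical helpers for inspecting a finished sites-by-species task.

# List the cached zip (like `unzip -l`) and return the first lines of the CSV
peek_sxs_zip <- function(zip_path, member = "SitesBySpecies.csv", n = 6) {
  print(utils::unzip(zip_path, list = TRUE)[, c("Name", "Length")])
  con <- unz(zip_path, member)
  on.exit(close(con))
  readLines(con, n = n)
}

# Direct CSV URL for a task ID (URL pattern assumed from this thread)
sxs_csv_url <- function(task_id,
                        base = "https://spatial.bioatlas.se/ws/tasks/output/") {
  paste0(base, task_id, "/SitesBySpecies.csv")
}

sxs_csv_url(3665297)
# These need the cached file (session-specific path) or network access:
# peek_sxs_zip("/tmp/Rtmp4VldSG/3032e9cefc3444cb15369e2f86f6d072")
# head(readLines(sxs_csv_url(3665297)))
```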

Can you please try to replicate this with the SBDI4R library? Please also compare the service URLs in SBDI4R with the contents of the .Rprofile file above.
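The URL comparison can be made mechanical with a small helper that lists every base_url_* entry that differs between two server-config lists. This is a sketch; diff_base_urls() is hypothetical, and where SBDI4R keeps its own config is not covered here (check the package documentation for its config function). The usage example compares part of the .Rprofile config with a copy in which one URL was deliberately changed, purely for illustration.

```r
# Hypothetical helper: return the base_url_* settings that differ between
# two server-config lists (missing entries count as differences).
diff_base_urls <- function(a, b) {
  keys <- union(grep("^base_url_", names(a), value = TRUE),
                grep("^base_url_", names(b), value = TRUE))
  pick <- function(cfg) vapply(keys, function(k) {
    v <- cfg[[k]]
    if (is.null(v)) NA_character_ else as.character(v)
  }, character(1), USE.NAMES = FALSE)
  out <- data.frame(setting = keys, a = pick(a), b = pick(b),
                    stringsAsFactors = FALSE)
  differs <- is.na(out$a) | is.na(out$b) | out$a != out$b
  out[differs, , drop = FALSE]
}

# Illustration only: a subset of the .Rprofile values vs. an altered copy
rprofile_cfg <- list(
  base_url_spatial  = "https://spatial.bioatlas.se/ws/",
  base_url_biocache = "https://records.bioatlas.se/ws/",
  base_url_lists    = "https://lists.bioatlas.se/ws/",
  verbose = TRUE
)
changed_cfg <- rprofile_cfg
changed_cfg$base_url_biocache <- "https://example.org/ws/"

diff_base_urls(rprofile_cfg, changed_cfg)
```

In practice the first argument would be getOption("ALA4R_server_config") and the second whatever list SBDI4R uses for its own settings.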

shahmanash commented 3 years ago

In order to test the API using SBDI4R, I executed the following code with the same parameters as in the earlier comment:

library(devtools)
install_github("biodiversitydata-se/SBDI4R")
library(SBDI4R)
ss<-sites_by_species(taxon="genus:Accipiter", wkt="POLYGON((11.94 55.13, 11.97 58.60, 17.90 58.58, 17.87 55.11, 11.94 55.13))", gridsize=0.1, verbose=TRUE)

I get similar output:

Caching https://spatial.bioatlas.se/ws/tasks/output/3665309/download.zip to file /tmp/RtmpUTCZOK/3032e9cefc3444cb15369e2f86f6d072
-> GET /ws/tasks/output/3665309/download.zip HTTP/2
-> Host: spatial.bioatlas.se
-> user-agent: SBDI4R 0.0.1
-> accept-encoding: deflate, gzip, br
-> accept: application/json, text/xml, application/xml, */*
-> 
<- HTTP/2 200 
<- server: nginx/1.17.6
<- date: Wed, 07 Oct 2020 18:29:54 GMT
<- content-type: application/zip
<- content-length: 13811
<- x-application-context: application:production
<- content-disposition: attachment
<- strict-transport-security: max-age=31536000
<- referrer-policy: strict-origin-when-cross-origin
<- x-frame-options: SAMEORIGIN
<- x-content-type-options: nosniff
<- content-security-policy: upgrade-insecure-requests
<- 
Error in names(guids) <- names(out)[-2:-1] : 
  'names' attribute [10] must be the same length as the vector [6]
In addition: Warning message:
In check_status_code(status_code, extra_info = diag_message) :
  HTTP status code 302 received.
This may be OK: if there are problems, please notify the package maintainers.

The API seems to be running, and the download to the temporary folder does occur. To check the API output directly, I opened the following URL, which returns a CSV file:

https://spatial.bioatlas.se/ws/tasks/output/3665309/SitesBySpecies.csv

aleruete commented 3 years ago

With the last query, there seems to be a problem with how the columns in the file are separated. This is a snippet of how I see it:

"LSID","Longitude","Latitude","2480637","2480589","7191196","7191198","9405810","6066148"
"Common Name",Longitude,Latitude,sparrowhawk, eurasian sparrowhawk,goshawk, northern goshawk,,,Bird Hawks,
"Kingdom",Longitude,Latitude,Animalia,Animalia,Animalia,Animalia,Animalia,Animalia
"Family",Longitude,Latitude,Accipitridae,Accipitridae,Accipitridae,Accipitridae,Accipitridae,Accipitridae
"Species",Longitude,Latitude,Accipiter nisus (Linnaeus, 1758),Accipiter gentilis (Linnaeus, 1758),Accipiter gentilis gentilis,Accipiter nisus nisus,Accipiter Brisson, 1760,Accipiter gentilis buteoides (Menzbier, 1882)
"13.553378000000002_55.18333",13.553378000000002,55.18333,1,0,0,0,0,0
"12.753378000000001_55.28333",12.753378000000001,55.28333,4683,353,0,0,0,0
"12.953378_55.28333",12.953378,55.28333,42,2,0,0,0,0
...
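Counting the comma-separated fields in the LSID and Species rows of this snippet makes the mismatch concrete. In the base-R sketch below (sxs_header is just those two rows copied from the snippet), the LSID row gives 6 species columns after the three leading columns, while the Species row gives 10. That is the same 10 vs 6 reported by the names(guids) <- names(out)[-2:-1] error earlier in this thread, and it would also explain split names such as accipiterNisusLinnaeus and x1758, though that link is an inference rather than something traced through the package code.

```r
# LSID row and Species row copied from the snippet above
sxs_header <- c(
  '"LSID","Longitude","Latitude","2480637","2480589","7191196","7191198","9405810","6066148"',
  '"Species",Longitude,Latitude,Accipiter nisus (Linnaeus, 1758),Accipiter gentilis (Linnaeus, 1758),Accipiter gentilis gentilis,Accipiter nisus nisus,Accipiter Brisson, 1760,Accipiter gentilis buteoides (Menzbier, 1882)'
)

# Count fields per row, with "," as separator and '"' as the only quote character
con <- textConnection(sxs_header)
n_fields <- count.fields(con, sep = ",", quote = "\"")
close(con)

n_fields       # 9 13
n_fields - 3   # species columns after LSID/Longitude/Latitude: 6 10
```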

aleruete commented 3 years ago

It seems that "," is the column separator, but it also appears, unquoted, in some column names, e.g. Accipiter nisus (Linnaeus, 1758).
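Until those fields are quoted on the server side, one possible client-side workaround is to skip the header rows that contain free-text names and take the column identifiers from the LSID row, which is fully quoted. A minimal base-R sketch, assuming the file always starts with the five metadata rows seen in the snippet (LSID, Common Name, Kingdom, Family, Species); read_sxs_lines() is a hypothetical helper, not an SBDI4R function, and species names would then have to be looked up by LSID instead of being parsed from the Species row.

```r
# Hypothetical workaround: parse SitesBySpecies.csv lines without trusting the
# unquoted Common Name / Species rows. Assumes 5 metadata rows, as in the snippet.
read_sxs_lines <- function(lines, n_meta = 5) {
  # The LSID row is fully quoted, so splitting it on "," is safe
  lsid <- scan(text = lines[1], what = character(), sep = ",",
               quote = "\"", quiet = TRUE)
  # Data rows hold only a quoted site ID followed by numbers
  data <- read.csv(text = lines[-seq_len(n_meta)], header = FALSE,
                   stringsAsFactors = FALSE)
  names(data) <- c("site", "longitude", "latitude", lsid[-(1:3)])
  data
}

sxs_lines <- c(
  '"LSID","Longitude","Latitude","2480637","2480589","7191196","7191198","9405810","6066148"',
  '"Common Name",Longitude,Latitude,sparrowhawk, eurasian sparrowhawk,goshawk, northern goshawk,,,Bird Hawks,',
  '"Kingdom",Longitude,Latitude,Animalia,Animalia,Animalia,Animalia,Animalia,Animalia',
  '"Family",Longitude,Latitude,Accipitridae,Accipitridae,Accipitridae,Accipitridae,Accipitridae,Accipitridae',
  '"Species",Longitude,Latitude,Accipiter nisus (Linnaeus, 1758),Accipiter gentilis (Linnaeus, 1758),Accipiter gentilis gentilis,Accipiter nisus nisus,Accipiter Brisson, 1760,Accipiter gentilis buteoides (Menzbier, 1882)',
  '"13.553378000000002_55.18333",13.553378000000002,55.18333,1,0,0,0,0,0',
  '"12.753378000000001_55.28333",12.753378000000001,55.28333,4683,353,0,0,0,0',
  '"12.953378_55.28333",12.953378,55.28333,42,2,0,0,0,0'
)

sxs <- read_sxs_lines(sxs_lines)
sxs[, 1:5]
```

Keying the columns by LSID rather than by name also sidesteps the duplicated, cleaned-up names (x1758, x17581) from the original report.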