bioatlas / dyntaxa

R package providing access to Dyntaxa - a database of Swedish taxonomic names
GNU Affero General Public License v3.0

Package installation fails - parsing issues? #9

Open peterhellstrom opened 4 years ago

peterhellstrom commented 4 years ago

I have not previously used or installed the dyntaxa package. When I tried to install it today, following the instructions on the GitHub pages, the installation failed. I'm running Windows 10 and R 4.0.2. Below are the messages I receive from R during installation. It seems to be some sort of parsing issue. Is this a known issue, and can it be solved? Regards // Peter Hellström, Swedish Museum of Natural History

library(devtools)
Loading required package: usethis
install_github("bioatlas/dyntaxa", build_opts = c("--no-resave-data", "--no-manual"))
Downloading GitHub repo bioatlas/dyntaxa@HEAD

checking for file 'C:\Users\petehell\AppData\Local\Temp\RtmpcfqMJq\remotesab86c356661\bioatlas-dyntaxa-f486779/DESCRIPTION' ...

√ checking for file 'C:\Users\petehell\AppData\Local\Temp\RtmpcfqMJq\remotesab86c356661\bioatlas-dyntaxa-f486779/DESCRIPTION' (438ms)

√ checking DESCRIPTION meta-information

'\\moria\users\petehell'
CMD.EXE was started with the above path as the current directory.
UNC paths are not supported. Defaulting to Windows directory.

Storing dyntaxa graph relations inC:\Users\petehell\AppData\Local\dyntaxa\dyntaxa/dyntaxa.rds
Error: package or namespace load failed for 'dyntaxa':
 .onAttach failed in attachNamespace() for 'dyntaxa', details:
  call: graph_from_data_frame(d = e, vertices = v, directed = TRUE)
  error: Some vertex names in edge list are not listed in vertex data frame
Error: loading failed
Execution halted
*** arch - x64
Warning in dwca_parse_dyntaxa(file) :
  Found parsing issues in C:\Users\petehell\AppData\Local\dyntaxa\dyntaxa/dyntaxa.zip, details are in result$parsing_issues

Storing dyntaxa graph relations inC:\Users\petehell\AppData\Local\dyntaxa\dyntaxa/dyntaxa.rds
Error: package or namespace load failed for 'dyntaxa':
 .onAttach failed in attachNamespace() for 'dyntaxa', details:
  call: graph_from_data_frame(d = e, vertices = v, directed = TRUE)
  error: Some vertex names in edge list are not listed in vertex data frame
Error: loading failed
Execution halted
ERROR: loading failed for 'i386', 'x64'

mskyttner commented 3 years ago

I think the format delivered from https://api.artdatabanken.se/taxonservice/v1/DarwinCore/ has changed since this package was written. The checklist used to be in a "normalized" format (see https://github.com/gbif/ipt/wiki/BestPracticesChecklists#normalised-classifications-parentchild) but now uses the "denormalized" format instead. The taxon core table, for example, now contains 16 fields instead of 9. The package would need to be updated (and simplified) to support the current denormalized format; @shahmanash may be able to provide more info. As a workaround, it probably works to use an earlier version of the dataset from https://archive.infrabas.se/dyntaxa/, replacing the downloaded file at file.path(app_dir("dyntaxa")$config(), "dyntaxa.zip").
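The file replacement suggested above could be sketched roughly as follows. This is only a sketch under assumptions: `replace_cached_archive()` is a hypothetical helper (not part of dyntaxa), the older archive is assumed to have been downloaded by hand from the archive site, and `app_dir()` is assumed to be the rappdirs function of that name.

```r
# Hypothetical helper: copy a manually downloaded, older Dyntaxa archive
# over the file the package has cached. Uses base R only.
replace_cached_archive <- function(older_zip, config_dir) {
  stopifnot(file.exists(older_zip))
  dir.create(config_dir, recursive = TRUE, showWarnings = FALSE)
  target <- file.path(config_dir, "dyntaxa.zip")
  # Keep a backup of the file being replaced, if there is one.
  if (file.exists(target)) {
    file.copy(target, paste0(target, ".bak"), overwrite = TRUE)
  }
  ok <- file.copy(older_zip, target, overwrite = TRUE)
  if (!ok) stop("could not copy ", older_zip, " to ", target)
  invisible(target)
}

# Usage (both paths are placeholders; app_dir() assumed to be from rappdirs):
# config_dir <- rappdirs::app_dir("dyntaxa")$config()
# replace_cached_archive("path/to/older/dyntaxa.zip", config_dir)
```

The backup copy makes it easy to restore the current archive once the package supports the new format.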

aleruete commented 3 years ago

Same problem here... any news?

shahmanash commented 3 years ago

The package installation fails for a number of reasons. The Dyntaxa package expects a CSV file Identifier.csv which is not included in the current version. The package expects a CSV file Distribution.csv which has been renamed to SpeciesDistribution.csv in the current version. The number of fields in the Taxon core has also changed.
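A quick way to confirm the file-name mismatch is to compare the archive listing against the names the package expects. The sketch below uses base R only; `missing_expected_files()` is a hypothetical helper, and the list of expected names contains just the two files mentioned above, not the package's full list.

```r
# File names the package is reported to expect (from the comment above).
expected <- c("Identifier.csv", "Distribution.csv")

# Return the expected files that are absent from the archive listing.
missing_expected_files <- function(archive_files, expected) {
  setdiff(expected, basename(archive_files))
}

# Usage against a real archive (path is a placeholder):
# listing <- utils::unzip("path/to/dyntaxa.zip", list = TRUE)$Name
# missing_expected_files(listing, expected)
```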

It might require some rewriting of the package to read the meta.xml file included in the DarwinCore Archive and dynamically identify the data file for each of the included extensions. Similarly, reading meta.xml to determine the number of columns in the core and in each extension might be a good addition, to avoid issues when the number of columns changes.
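Reading meta.xml along these lines might look like the sketch below. It assumes the xml2 package is installed and that the archive follows the usual Darwin Core text layout (an `archive` root with `core` and `extension` children, each holding a `files/location` element and indexed `id`, `coreid` and `field` elements). `describe_dwca_tables()` is a hypothetical name, not a dyntaxa function.

```r
library(xml2)

# Summarise the core and extension tables declared in a Darwin Core
# Archive meta.xml: kind, row type, data file name, number of columns.
describe_dwca_tables <- function(meta_path) {
  doc <- read_xml(meta_path)
  xml_ns_strip(doc)  # drop the default namespace so plain XPath works
  tables <- xml_find_all(doc, "/archive/core | /archive/extension")
  rows <- lapply(seq_along(tables), function(i) {
    tbl <- tables[[i]]
    # Column positions are declared on <id>/<coreid> and <field> elements;
    # elements without an index (e.g. <files>) yield NA and are dropped.
    idx <- as.integer(xml_attr(xml_children(tbl), "index"))
    idx <- idx[!is.na(idx)]
    data.frame(
      kind = xml_name(tbl),
      row_type = xml_attr(tbl, "rowType"),
      file = xml_text(xml_find_first(tbl, "./files/location")),
      n_columns = if (length(idx) > 0) max(idx) + 1L else 0L,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Usage (path is a placeholder for a meta.xml extracted from the archive):
# describe_dwca_tables("path/to/meta.xml")
```

Driving the parser from this table, rather than from hard-coded file names and column counts, would let it pick up a renamed file such as SpeciesDistribution.csv without a code change.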

There also seem to be content-related issues in the current version of Dyntaxa, which lead to failures when the package is installed.

The current version of Dyntaxa published via Artdatabanken's web service (https://api.artdatabanken.se/taxonservice/v1/DarwinCore/DarwinCoreArchiveFile?subscription-key=4b068709e7f2427d9fc76bf42d8e2b57) does not seem to maintain the parent-child relationship for all nodes, which leads to the following error.

call: graph_from_data_frame(d = e, vertices = v, directed = TRUE)
error: Some vertex names in edge list are not listed in vertex data frame

As the error message indicates, there seem to be nodes in the tree without valid parent nodes. This can also be seen here:

https://www.gbif.org/species/search?dataset_key=de8934f4-a136-481c-a87a-b0b202b80a31&origin=SOURCE&issue=PARENT_NAME_USAGE_ID_INVALID&advanced=1
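The offending rows can be located before the graph is built. The following is a minimal sketch using base R and toy data; `find_orphan_edges()` is a hypothetical helper, and `e` and `v` simply mirror the argument names in the error message. It relies on the documented convention that `graph_from_data_frame()` reads edge endpoints from the first two columns of `d` and vertex names from the first column of `vertices`.

```r
# Return the edge-list rows whose endpoints are not all present in the
# vertex table; these are the rows that make graph_from_data_frame() fail.
find_orphan_edges <- function(edges, vertices) {
  known <- as.character(vertices[[1]])
  from_ok <- as.character(edges[[1]]) %in% known
  to_ok <- as.character(edges[[2]]) %in% known
  edges[!(from_ok & to_ok), , drop = FALSE]
}

# Toy data: taxon 3 points at parent 99, which is not a known vertex.
v <- data.frame(name = c("1", "2", "3"), stringsAsFactors = FALSE)
e <- data.frame(from = c("99", "1"), to = c("3", "2"), stringsAsFactors = FALSE)
find_orphan_edges(e, v)
```

Dropping the reported rows would let the graph build, but it also detaches the affected taxa from the tree, so it is a stopgap rather than a fix for the underlying data.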

The issues related to the change in structure of the Dyntaxa Darwin Core archive could probably be addressed by the community. The issues caused by erroneous content in the dataset, however, need to be addressed by the dataset publisher, who has been notified.

shahmanash commented 3 years ago

@aleruete, @peterhellstrom, please check out https://github.com/mskyttner/dyntaxa.

aleruete commented 3 years ago

It installs fine from Markus's repository.