SimonGreenhill / rcldf

rcldf - The R library for reading CLDF files

APiCS CLDF parsing? #46

Open HedvigS opened 4 months ago

HedvigS commented 4 months ago

I was fetching APiCS, and I got a warning about parsing, probably from readr?

> APiCS_rcldf_obj <- rcldf::cldf("https://zenodo.org/records/7139937/files/apics-v2013.zip", load_bib = F)
Unzipping to: /Users/skirgard/Library/Caches/org.R-project.R/R/rcldf/2e95fbd231a6ab610ab61ef1b4cefc62
Warning message:
One or more parsing issues, call `problems()` on your data frame for details, e.g.:
  dat <- vroom(...)
  problems(dat) 

Not sure what's going on; the tables look good to me so far. Usually I get this warning when readr expects x items in a row but gets y.
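
For reference, a minimal sketch of how `problems()` surfaces this kind of warning. It uses made-up inline data rather than the APiCS tables, and the column names `id` and `size` are hypothetical. It forces an integer type onto a value that is not an integer, which is only one of several ways to trigger the warning.

```r
library(readr)

# Made-up CSV text; the last row has a non-integer value in `size`.
csv_text <- "id,size\n1,10\n2,20\n3,4.5\n"

# Force both columns to integer, so the "4.5" cannot be parsed as requested.
dat <- read_csv(
  I(csv_text),
  col_types = cols(id = col_integer(), size = col_integer())
)

# One row per parsing failure, with the row, column, expected and actual value.
probs <- problems(dat)
print(probs)
```

On a loaded dataset the same call would presumably be made on one of the tibbles in the object's `tables` list, assuming those tibbles keep readr's problems attribute.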

HedvigS commented 4 months ago

Getting this as well now:

> Glottolog_rcldf_obj <- rcldf::cldf("https://zenodo.org/records/10804582/files/glottolog/glottolog-cldf-v5.0.zip", load_bib = F)
Reusing cache in: /Users/skirgard/Library/Caches/org.R-project.R/R/rcldf/dd342b297ebe4e132a7496e6aa82915b
ERROR [2024-05-03 14:52:50] cldf: file does not exist: values.csv
ERROR [2024-05-03 14:52:50] cldf: file does not exist: trees.csv
Error in cldfobj$tables[[cldfobj$resources[[url]]]] :
  attempt to select less than one element in get1index

So something larger seems to be going awry. Restarting didn't help. Going to try more things.

HedvigS commented 4 months ago

Haven't been able to solve this problem yet. In the meantime I'm using my homemade "get_zenodo_dir" function.

This works:

# Homemade helper that downloads a Zenodo zip and unpacks it into exdir
source("../functions/get_zenodo.R")

# Glottolog: fetch and unpack, then point rcldf at the local metadata file
get_zenodo_dir(url = "https://zenodo.org/records/10804582/files/glottolog/glottolog-cldf-v5.0.zip", exdir = "output/glottolog-cldf_v5/")

Glottolog_rcldf_obj <- rcldf::cldf("output/glottolog-cldf_v5/cldf/cldf-metadata.json", load_bib = FALSE)

# APiCS: same approach
get_zenodo_dir(url = "https://zenodo.org/records/7139937/files/apics-v2013.zip", exdir = "output/APiCS/")

APiCS_rcldf_obj <- rcldf::cldf("output/APiCS/cldf/StructureDataset-metadata.json", load_bib = FALSE)

SimonGreenhill commented 4 days ago

This is due to readr identifying the final column in media.csv as a double (col_double) rather than an integer (col_integer), despite what the CLDF metadata and the readr spec say. I think it's an interaction with skip: if skip is set to zero, the error does not occur.

https://github.com/SimonGreenhill/rcldf/blob/503600e63125ba7494f5ff02a46e7b3f4ff0cd82/R/csvwr_overrides.R#L94-L96
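
One way to check for this kind of mismatch is to compare the column types requested with the classes of the columns that actually come back. The sketch below uses made-up inline data and hypothetical column names (`ID`, `Size`). It does not reproduce the skip interaction and is not the code in csvwr_overrides.R.

```r
library(readr)

# Made-up CSV text standing in for a table such as media.csv.
csv_text <- "ID,Size\na,10\nb,20\n"

# Types we ask for, as the metadata would declare them (hypothetical columns).
wanted <- cols(ID = col_character(), Size = col_integer())
dat <- read_csv(I(csv_text), col_types = wanted)

# Classes actually returned; "numeric" here would indicate a double column.
actual <- vapply(dat, function(x) class(x)[1], character(1))
print(actual)

# Flag any column declared as integer that did not come back as integer.
declared_int <- c("Size")
mismatch <- declared_int[actual[declared_int] != "integer"]
print(mismatch)
```

With this inline data the `mismatch` vector should be empty. A non-empty result on a real table would point to the columns where the declared and parsed types disagree.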