frictionlessdata / datapackage-r

An R package for working with Data Package.
https://frictionlessdata.github.io/datapackage-r/

Spurious row mismatch error when trying to read table #28

Open Ergative opened 2 years ago

Ergative commented 2 years ago

Overview

I have loaded a tabular datapackage containing multiple CSVs and would like to get the data tables from the resources using $table$read(). However, I am unable to retrieve the tables themselves, instead getting the following message:

Error: Row dimension doesn't match schema's fields dimension

I do not believe I should be getting this error message, because:

  1. I've carefully verified every file by hand.
  2. I created the package myself using the Python version of the framework.
  3. I can use the frictionless Python library to load the whole datapackage with no problem:
    # Snippet of Python script which CAN load and validate the package, and process the data.
    from frictionless import Package

    package = Package("../datapackage.json")
    report = package.validate()
    len(report.errors) == 0 and report.stats['errors'] == 0 # True
# Snippet of R script which CANNOT load data in the package, though it seems to say it is valid.
# Loading the package itself: everything looks perfect.
package <- datapackage.r::Package.load("../datapackage.json")
package$valid          # TRUE
length(package$errors) # 0
str(package$resources) # List of 16... all schema info looks correct for all 16 CSVs.

I followed the examples shown on the Frictionless Data website, including this example of reading the tables. That's where I get the error saying the number of rows is incorrect.

# Line that causes error
package$resources[[1]]$table$read() # Error: Row dimension doesn't match schema's fields dimension

I have tried stepping through this in the RStudio debugger and am pretty confused. Inside getTable_(), the local variable schema looks correct: it lists 12 fields, which matches the number of fields in the file. Stepping further is not very revealing. Depending on what I step into versus over, I sometimes get "Error: invalid connection" and sometimes other errors (maybe there's a timeout somewhere?). When I run the script without the debugger, I invariably get "Error: Row dimension doesn't match schema's fields dimension".
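One way to narrow this down outside of R is to count the cells in every row of a CSV and compare that count with the number of fields in the schema. The sketch below uses only the Python standard library. It assumes the error is raised when some parsed row has a different number of cells than the schema has fields, which has not been confirmed against the datapackage.r source. The function name find_mismatched_rows and the sample data are hypothetical and only for illustration.

```python
import csv

def find_mismatched_rows(lines, expected_fields, delimiter=","):
    """Return (line_number, cell_count) for each row whose cell count
    differs from expected_fields.

    `lines` is any iterable of text lines, for example an open file
    or a list of strings.
    """
    mismatches = []
    reader = csv.reader(lines, delimiter=delimiter)
    for row in reader:
        if len(row) != expected_fields:
            # reader.line_num is the number of source lines read so far
            mismatches.append((reader.line_num, len(row)))
    return mismatches

# Hypothetical sample: a header plus three rows, two of them malformed.
sample = "id,name,score\n1,Ann,3.5\n2,Bob\n3,Cy,4.0,extra\n"
print(find_mismatched_rows(sample.splitlines(keepends=True), expected_fields=3))
# → [(3, 2), (4, 4)]
```

For a real file, the same function can be called as find_mismatched_rows(open(path, newline=""), expected_fields=12), where path is the CSV behind the failing resource. An empty result would suggest the file itself is consistent and that the difference lies in how the R reader parses it (for example quoting, delimiters, or line endings), though that is only a guess.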

Miscellaneous


Please preserve this line to notify @kleanthisk10 (lead of this repository)