frictionlessdata / datapackage-r

An R package for working with Data Package.
https://frictionlessdata.github.io/datapackage-r/

problem inferring schema when creating data package in R #21

Closed lilyzzhao closed 4 years ago

lilyzzhao commented 4 years ago

Hi, I am having problems creating a data package using my data (octopusmiddlemen.csv) that can be found at [https://github.com/lilyzzhao/east-africa-octopus-trade] within the data folder. The code for inferring the schema that I have been trying in R (see datapackage.Rmd file within the repo) doesn't work.

these lines all run fine:

library(datapackage.r)
dataPackage <- Package.load()
dataPackage$descriptor['name'] <- 'octopus-trade-middlemen'
dataPackage$descriptor['title'] <- 'Octopus Value Chain Data'
dataPackage$commit()
filepath <- read.csv("data/octopusmiddlemen.csv")

the code line that doesn't work is: schema <- tableschema.r::infer(filepath)

the error message is:

‘-’ not meaningful for factors
the condition has length > 1 and only the first element will be used
Error in if (!headersRow) break : missing value where TRUE/FALSE needed

I am using R version 3.5.1 and RStudio Version 1.0.136

I did try using the GUI tool [http://create.frictionlessdata.io/] and the dataset validates there. @lwinfree also tried to create a data package from the file using the Python code, and that worked as well.

lwinfree commented 4 years ago

I've also run the above locally and gotten the same 'Error: missing value where TRUE/FALSE needed'. Any ideas @kleanthisk10? Let me know if you need more info from me or from @lilyzzhao. Thanks!!

kleanthisk10 commented 4 years ago

Hello @lilyzzhao @lwinfree, I think that your 'octopusmiddlemen.csv' file is not valid.

You can use http://try.goodtables.io/ or http://goodtables.io/ to validate it!

lwinfree commented 4 years ago

Hi @kleanthisk10 Thanks for catching the encoding error. I've fixed it & updated the data now: https://raw.githubusercontent.com/lilyzzhao/east-africa-octopus-trade/master/data/octopusmiddlemen.csv The data is now valid via goodtables. However, the error still remains when trying to generate a schema:

> library(datapackage.r)
> dp = Package.load()
> file = read.csv('octopusmiddlemen.csv')
> schema = tableschema.r::infer(file)
Error in if (!headersRow) break : missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In Ops.factor(headersRow, 1) : ‘-’ not meaningful for factors
2: In if (!headersRow) break :
  the condition has length > 1 and only the first element will be used

Do you have any ideas about what those errors mean or how to fix them? Thank you!! (CC @lilyzzhao)
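
For readers puzzling over the same messages: they are consistent with a factor ending up where a number was expected. The sketch below reproduces both messages in base R only. It does not use tableschema.r's code, and `headers_row` is a hypothetical stand-in for the `headersRow` variable named in the error. Note that before R 4.0, `read.csv` converted text columns to factors by default.

```r
# Hypothetical stand-in for the value named in the error message.
headers_row <- factor("site")

# Arithmetic on a factor is not defined: R warns
# "'-' not meaningful for factors" and returns NA.
result <- suppressWarnings(headers_row - 1)

# Using that NA as an `if` condition raises
# "missing value where TRUE/FALSE needed".
outcome <- tryCatch(
  {
    if (!result) "no header row" else "header row"
  },
  error = function(e) conditionMessage(e)
)

print(is.na(result))
print(outcome)
```

This only illustrates where the two messages come from in general. Why a factor reaches that code inside `infer` is not established by this sketch.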

kleanthisk10 commented 4 years ago

@lilyzzhao, @lwinfree the problem is the empty lines in the file, which cause trouble in R. I uploaded an example here: https://raw.githubusercontent.com/kleanthisk10/exampledata/master/octopusmiddlemen.csv Also, you can pass the URL or the file path directly to the infer function.

library(datapackage.r)
dataPackage <- Package.load()
dataPackage$descriptor['name'] <- 'octopus-trade-middlemen'
dataPackage$descriptor['title'] <- 'Octopus Value Chain Data'
dataPackage$commit()
filepath <- 'https://raw.githubusercontent.com/kleanthisk10/exampledata/master/octopusmiddlemen.csv'
schema <- tableschema.r::infer(filepath)
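
If the extra lines need to be removed programmatically rather than by hand, a minimal base R sketch could look like the following. It assumes that "empty" means a line that is blank or holds only commas and whitespace, which may not match every file. `drop_empty_lines` is a hypothetical helper, not part of datapackage.r or tableschema.r.

```r
# Hypothetical helper: drop lines that are blank or hold only commas/whitespace.
drop_empty_lines <- function(lines) {
  # Strip commas and whitespace; if nothing is left, the line carries no data.
  stripped <- gsub("[,[:space:]]", "", lines)
  lines[nzchar(stripped)]
}

# Small made-up example standing in for the contents of a CSV file.
raw <- c("id,price", "1,20", ",", "", "2,35", "  ")
clean <- drop_empty_lines(raw)

# Write the cleaned lines to a temporary file whose path can be given to infer.
tmp <- tempfile(fileext = ".csv")
writeLines(clean, tmp)
# schema <- tableschema.r::infer(tmp)
```

For a real file, `raw` would come from `readLines()` on the CSV path instead of the made-up vector.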

lwinfree commented 4 years ago

Hi @kleanthisk10, what do you mean 'where there are empty lines for R'? I looked at your data example (https://raw.githubusercontent.com/kleanthisk10/exampledata/master/octopusmiddlemen.csv) and Lily's data example (https://raw.githubusercontent.com/lilyzzhao/east-africa-octopus-trade/master/data/octopusmiddlemen.csv ) and they appear to be the same visually, but her data throws errors while yours works fine. Did you edit her data somehow before uploading it to your repo? If so, could you please tell me what you did so I can understand the problem better? Thanks!

lwinfree commented 4 years ago

Hi again @kleanthisk10, OK I opened Lily's original file in a text editor & finally saw the extra lines you mentioned above, so I now understand that. When I deleted those lines, the file successfully creates a schema when I load it via a URL! So yay thank you! But, when I try and load the file locally with read.csv, I am still getting errors. Is read.csv not supported?

example: (this works)

> filepath <- 'https://raw.githubusercontent.com/frictionlessdata/fellows/master/octopusmiddlemenOG.csv'
> schema <- tableschema.r::infer(filepath)

this doesn't work:

> dp <- read.csv('octopusmiddlemenOG.csv')
> schema <- tableschema.r::infer(dp)
Error in if (!headersRow) break : missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In Ops.factor(headersRow, 1) : ‘-’ not meaningful for factors
2: In if (!headersRow) break :
  the condition has length > 1 and only the first element will be used

Thanks for your help!

kleanthisk10 commented 4 years ago

Yes, this doesn't work, and I don't think it works in the Python or JS implementations either.
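
For anyone who already has the data in a data frame, one possible workaround is to write it back to a CSV file and give that path to `infer`, since the thread shows `infer` accepting a file path or URL. This is a sketch under that assumption, with a made-up data frame. The `infer` call is guarded so the snippet still runs when tableschema.r is not installed.

```r
# Made-up data frame standing in for data loaded with read.csv.
df <- data.frame(
  id = c(1, 2),
  species = c("octopus", "squid"),
  stringsAsFactors = FALSE
)

# Write the data frame to a temporary CSV file, without row names.
tmp <- tempfile(fileext = ".csv")
write.csv(df, tmp, row.names = FALSE)

# Pass the file path, not the data frame, to infer (only if the package is available).
if (requireNamespace("tableschema.r", quietly = TRUE)) {
  schema <- tableschema.r::infer(tmp)
}
```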