frictionlessdata / datapackage-rb

Ruby library and tools for working with datapackages
MIT License

As a user, I want to infer a datapackage json descriptor from a directory of CSVs #54

Closed: switzersc closed this 4 years ago

switzersc commented 4 years ago

Overview

I needed to infer a datapackage from a set of CSVs for another project (https://github.com/openreferral/hsds_transformer) and had talked to @lwinfree about making a PR to support that functionality in this ruby gem. I took inspiration from the Python library (https://github.com/frictionlessdata/datapackage-py) for the infer method and architecture.

Limitation: Right now this infers all columns as strings rather than trying to interpret other types such as integer. I can add some basic type inference, but I wanted to start a conversation about that, and about the overall architecture and design decisions here, before I do.
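A minimal sketch of the descriptor inference described above, assuming nothing about the gem's actual API: `infer_descriptor` is a hypothetical helper built only on Ruby's standard `csv` library. It reads the header row of each CSV in a directory and, matching the current limitation, types every field as `string`.

```ruby
require "csv"

# Hypothetical helper (not part of datapackage-rb's API): build a Data Package
# descriptor from every *.csv file in +dir+, typing every column as "string".
def infer_descriptor(dir)
  resources = Dir.glob(File.join(dir, "*.csv")).sort.map do |path|
    # Read only the first row, which is assumed to hold the column names.
    headers = CSV.open(path, &:readline) || []
    {
      # Resource names must be lowercase alphanumerics plus ".", "_" and "-".
      "name" => File.basename(path, ".csv").downcase.gsub(/[^a-z0-9._-]/, "-"),
      "path" => File.basename(path),
      "profile" => "tabular-data-resource",
      "format" => "csv",
      "schema" => {
        "fields" => headers.map { |h| { "name" => h.to_s, "type" => "string" } }
      }
    }
  end
  { "profile" => "tabular-data-package", "resources" => resources }
end
```

Serializing the result with `JSON.pretty_generate` (after `require "json"`) gives the contents of a `datapackage.json`; real type inference would replace the hard-coded `"string"`.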

To Do:


Please preserve this line to notify @roll (maintainer of this repository)

roll commented 4 years ago

@switzersc Great! Could you please rebase on master? I recently fixed the tests.

BTW, are you interested in joining the @frictionlessdata/ruby team and/or becoming a maintainer of the tableschema/datapackage-rb libs? It requires almost no time or attention, but we really lack Ruby expertise here.

lwinfree commented 4 years ago

Hi @roll, do you have suggestions for which data types Shelby should add support for? As Shelby wrote above:

Right now this infers all columns as strings rather than trying to interpret other types like integer

What other column types should be supported for infer?

Thanks!

roll commented 4 years ago

@switzersc Please take a look at:

It's an algorithm for inferring types. I think the JavaScript implementation (not the Python one) is the best at the moment.

And then these function results are used in:
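As a rough illustration of per-column type inference, here is a simplified heuristic in Ruby. It is not a port of the JavaScript algorithm referenced above; `infer_type`, `infer_fields`, and the candidate list are hypothetical. It tries Table Schema types from most to least specific and picks the first one that every non-empty value matches, falling back to `string`.

```ruby
require "date"

# True for strings like "2020-01-31" that name a real calendar date.
def iso_date?(value)
  return false unless value.match?(/\A\d{4}-\d{2}-\d{2}\z/)
  year, month, day = value.split("-").map(&:to_i)
  Date.valid_date?(year, month, day)
end

# Hypothetical candidate list, most specific first. This is a simplified
# heuristic, not a port of the tableschema-js infer algorithm.
TYPE_CHECKS = [
  ["integer", ->(v) { v.match?(/\A[+-]?\d+\z/) }],
  ["number",  ->(v) { v.match?(/\A[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?\z/) }],
  ["boolean", ->(v) { %w[true True TRUE false False FALSE].include?(v) }],
  ["date",    ->(v) { iso_date?(v) }]
].freeze

# Pick the first type that every non-empty value in the column satisfies.
def infer_type(values)
  present = values.compact.map(&:strip).reject(&:empty?)
  return "string" if present.empty?
  match = TYPE_CHECKS.find { |_type, check| present.all?(&check) }
  match ? match.first : "string"
end

# Build Table Schema fields from parsed rows (arrays) and a header row.
def infer_fields(rows, headers)
  headers.each_with_index.map do |name, i|
    { "name" => name, "type" => infer_type(rows.map { |row| row[i] }) }
  end
end
```

Requiring every non-empty value to match is deliberately strict: one stray value demotes the column to `string`, the same safe fallback the PR already uses. Tallying matches per type and choosing the most frequent is an alternative, at the cost of values that later fail to cast.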

switzersc commented 4 years ago

Thanks, @roll! I'll take a look at the JavaScript library, update the PR, and rebase on master. Adding a checklist to this PR description.

I am interested in joining the @frictionlessdata/ruby team! My response times on ongoing open source projects typically range from 3 to 15 days, but if that's okay, I'm happy to lend some Ruby experience to the group =]

roll commented 4 years ago

@switzersc Awesome! I have sent you an invite to join the team.

switzersc commented 4 years ago

@roll I have updated this PR to include type inference, including tests, and have rebased on master. Looking forward to any feedback!

roll commented 4 years ago

BTW, I've sent you an invitation to join the Ruby team: https://github.com/orgs/frictionlessdata/teams/ruby/members