frictionlessdata / datapackage-go

A Go library for working with Data Package.
MIT License

Introduce Resource.RawRead() #13

Closed fils closed 6 years ago

fils commented 6 years ago

I'm playing around with an idea...

Placing non-tabular data in a data package (a JSON-LD file) and using that to build out landing pages from (something I am doing elsewhere already).

I'm playing with this idea at https://github.com/fils/THROUGHPUTDataPackages

Some Go code at https://github.com/fils/THROUGHPUTDataPackages/blob/master/dataPkgLD/osproxy/osproxy.go does the classic tabular data read. However, when I try to read the "schemaorg" entry in https://github.com/fils/THROUGHPUTDataPackages/blob/master/CarpLake/datapackage.json I get the expected error that I cannot perform a ReadAll() on non-tabular data.

Question. Is there a way to get a handle to a non-tabular file and read its contents?
Since I was planning on using ZIP files, and you load zip files to a tmp file system, I wonder if I could get a handle via that somehow?

Any thoughts or advice (or if this is an uphill battle that should be avoided) I would appreciate.

Thanks Doug

danielfireman commented 6 years ago

Hi Doug, good morning!

> Question. Is there a way to get a handle to a non-tabular file and read its contents?

The current answer is no. The first version was solely focused on tabular data. Maybe we could change this issue into a feature request to introduce resource.raw_read() and resource.raw_iter(). What do you think?

Would that fit your use case?

> Since I was planning on using ZIP files, and you load zip files to a tmp file system, I wonder if I could get a handle via that somehow?

It is totally fine to load zip data packages. The library currently detects the zip, unzips it to a tmp directory, and loads resources from there. The load_zip example shows that feature.

fils commented 6 years ago

@danielfireman Good morning!

I'm totally fine with this being a feature request as long as it doesn't go against the principles of Frictionless Data Packages. The two functions seem like a good start indeed.

My use cases are like the following:

1) The continental drillers take high-res photos, XRF, and other image or binary style data. Along with this is tabular data from GeoTek and other instruments. Having this data in the package, so I could access both the tabular data and the image or binary data, would allow the FDP to serve as the package for some of our core view tools as well as allow it to be presented in web UIs.

2) The one I am playing with now is to have a JSON-LD document in the package that follows the patterns we are using at P418 [1] to describe data sets. This is being looked at by AGU as a recommended method to support landing pages for data sets. So being able to open and read out the JSON-LD is important to making a simple implementation of this pattern. So a simple resource.raw_read would be all I need.

I think I can actually mock my needs OK by leveraging my Minio S3 system. Since I can pass the location of the package.json to the library as a URL, I suspect it will resolve the relative paths to the other resources as URLs too. At that point the *Resource pointer might let me pull from its map the URL I need, so I can load the file via traditional Go URL calls. Hackish (maybe too convoluted), but I should try it. I would like to stay focused on the principle of the frictionless data package as the "nugget" of data we work with and pass around. However, the more logical fallback is to just let the FDP zip sit in Minio, pull it, extract the file I am after into my own tmp FS, and use traditional approaches to get what I need.
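For reference, the fallback described above (pull the zip, then read a single file out of it) can be sketched with the Go standard library alone. This is only an illustration under assumptions, not code from the thread: readZipEntry and buildDemoZip are hypothetical helpers, the entry name schemaorg.json is made up, and the in-memory archive stands in for a package zip already downloaded from Minio/S3.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// readZipEntry returns the contents of the entry called name inside the
// zip archive held in memory. Hypothetical helper, not part of any library.
func readZipEntry(archive []byte, name string) ([]byte, error) {
	zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, err
	}
	for _, f := range zr.File {
		if f.Name != name {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		defer rc.Close()
		return io.ReadAll(rc)
	}
	return nil, fmt.Errorf("entry %q not found in archive", name)
}

// buildDemoZip creates a tiny in-memory archive that stands in for a
// data package zip fetched from object storage (assumption for the demo).
func buildDemoZip() ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create("schemaorg.json")
	if err != nil {
		return nil, err
	}
	if _, err := w.Write([]byte(`{"@type":"Dataset"}`)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	archive, err := buildDemoZip()
	if err != nil {
		panic(err)
	}
	contents, err := readZipEntry(archive, "schemaorg.json")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(contents))
}
```

Reading the entry directly from the archive avoids writing a tmp directory at all, which may or may not suit a given setup.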

Thanks! I appreciate the quick response. It helps me resolve a path forward. I'll watch for a raw_read in the possible future.

Doug

[1] https://github.com/earthcubearchitecture-project418/p418Vocabulary

danielfireman commented 6 years ago

Sounds good. Thanks for the detailed explanation!

Updating this issue to be a feature request.

danielfireman commented 6 years ago

Hi Doug,

Already pushed Resource.RawRead to master. Also added a section about non-tabular data in the Readme.

Could you please try it out and let us know how it goes?

Thanks!

fils commented 6 years ago

@danielfireman I'm traveling this week for meetings, but I'll actually mention this in the first of them in the morning. I made a laughably simple example at [1]; in it, the handler landingpage package calls the osproxy function (bad names, I'll clean those up). The handler loads the JSON-LD into the page template perfectly. From there I will read the JSON-LD via web components to build out maps, citations, etc. from the schema.org entries.

But it works perfectly!!! I still need to test pulling the zip file from Minio/S3, but I have no doubt that will work just fine, as it's part of your core library and tested.

So thank you SO much!!! This is really interesting to me, and I plan to roll this demo out and get some opinions from AGU and EarthCube reps (and some facility people in the morning).

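Questions for you:

Is such a raw read function present in any other frictionless data package libraries?

Is there a "core" set of API functions that have to be present in datapackage implementations? Like a set of testable tasks for a library, something like what the JSON-LD test suite [2] does for that group?

Is there a governance process that defines such API requirements for an implementation?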

Thanks again for this! I sketched out my plan B on the flight here... happy to not have to bother with that!

[1] https://github.com/fils/THROUGHPUTDataPackages/tree/master/dataPkgLD
[2] https://json-ld.org/test-suite/

danielfireman commented 6 years ago

Hi Doug,

Cool that everything worked as expected. I am going to make a small change, just to be more idiomatic. Instead of having two methods, RawRead and RawIter (like Python), the Go library will have just RawRead, and it will return an io.ReadCloser. You could simply use ioutil.ReadAll() to get the []byte from it. I hope to have this in by Thursday and will release a new minor version. I will close this bug when the release is out.
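To illustrate the pattern described above (get an io.ReadCloser, drain it into a []byte, then close it), here is a minimal sketch that uses only the standard library. It makes several assumptions: rawReader and memResource are hypothetical stand-ins rather than the library's own types, the error return on RawRead is assumed, and io.ReadAll is used as the newer standard-library equivalent of ioutil.ReadAll (Go 1.16+).

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// rawReader is a hypothetical interface mirroring the shape described in
// the thread: a resource that hands back its raw bytes as an io.ReadCloser.
// The error return is an assumption made for this sketch.
type rawReader interface {
	RawRead() (io.ReadCloser, error)
}

// memResource is a made-up in-memory resource used only for this demo.
type memResource struct {
	body string
}

// RawRead wraps the in-memory body in a ReadCloser whose Close is a no-op.
func (m memResource) RawRead() (io.ReadCloser, error) {
	return io.NopCloser(strings.NewReader(m.body)), nil
}

// readRaw drains a resource into a byte slice and closes the reader.
func readRaw(r rawReader) ([]byte, error) {
	rc, err := r.RawRead()
	if err != nil {
		return nil, err
	}
	defer rc.Close()
	return io.ReadAll(rc)
}

func main() {
	res := memResource{body: `{"@context":"http://schema.org/"}`}
	contents, err := readRaw(res)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(contents))
}
```

Returning a single reader instead of separate read and iterate methods lets the caller choose between loading everything at once and streaming with io.Copy or a bufio.Scanner.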

> Is such a raw read function present in any other frictionless data package libraries?

Yes. As an example, you can find the Python implementation here (raw_read) and here (raw_iter).

> Is there a "core" set of API functions that have to be present in datapackage implementations? Like a set of testable tasks for a library, something like what the JSON-LD test suite [2] does for that group?

The core set of features for data package implementations is divided into basic and extended. The Go implementation is new, so it has the basics covered. What is expected for new implementations is outlined here, detailed here and tracked here. There is a global test suite here for basic and here for extended features. As you can see, Golang is not part of the global test suite yet.

> Is there a governance process that defines such API requirements for an implementation?

I used the spec and the implementation readme as the main sources of information about process and API requirements. I also used the Python and JS implementations as reference implementations.

cc/ @roll, who could add more info to the thread.

danielfireman commented 6 years ago

Version 0.2 released with Resource.RawRead support. Enjoy!