Open Stephen-Gates opened 6 years ago
That sounds like a sensible convention. The original implementation only looked at what was defined in the spec so this was skipped.
Note that READMEs will get currently imported if they are listed as resources in the descriptors just like any other file.
This will be quite a lot trickier than it would appear. We are basically passing the uploaded zipfile to the datapackage
library, that handles all the extracting etc. We get a descriptor object back that does not include the README file (or any other file included in the zip). So anything we implement will need to be on top of that.
I don't want to re-extract the files just to check if there is a README (and we probably can't), so the only thing I can think of is guessing the temp folder where datapackage
extracted the zip file here (using a resource path and the name of zip file), and check the contents to see if there is a README file. If it is, create an additional resource with _create_and_upload_local_resource()
.
We probably want to check that there isn't an actual resource with the README to avoid duplication.
1 day
To anticipate the importing of the README as a CKAN-resource I am looking for a markdown_view. Has any work been done of that?
We use https://github.com/markdown-it/markdown-it in Data Curator to allow the input and preview of markdown.
Links to the CommonMark reference and community implementations at at https://github.com/commonmark/CommonMark
Thanks, I will look into it.
First I would like the imported readme.md to be readable in CKAN. Maybe it is doable to create a fourth text_view for resources with format=md using the helper-function: h.render_markdown?
Is that a good idea?
thanks @amercader Yep not a pretty solution, but I can't find another way of doing this neatly either - I think perhaps adding README as resource is not the way to go as my understanding is (and happy to be corrected here) that a README isn't considered part of the resources for a datapackage as it doesn't hold data - and can't be validated and inferred like other resources - , and working out the temp folder (from existing resources) seems the only way for now. Although datapackage has 'base_path' property, which might make this simpler, but it is deprecated (but maybe it is just default_base_path that is meant to be deprecated?). Asked @roll about this.
Hi @amercader In the end I've disassociated the sense of resources (just in my head) between frictionless datapackages and ckan. Have a PR for this with uploading zip, where README.md is a local resource (as you have advised, and then download ensures it doesn't exist in associated package.json). Just waiting for approval on our side. Also had a question, just in regards to issue of handling downloads:
For the upcoming PR I have considered that as the datapackage.json would not reference README.md, containing that README.md in the download zip accounts for the README.md file. But as datapackage.json was being created as file (as would never/rarely be large) and then downloaded, so too for README.md (the case would usually/always be README.md is only a small file by comparison), so it would be OK to add this to, a simple zip, which would have a set list of filenames - that's the only way I can consider restricting this atm, unless it should also have some explicit size check (apart from python's zip library constraint for size) (atm just README.md) that could be downloaded in this way (ie: without using queue or considering the delay in creating the zip and downloading. The intention is that the 'simple' zip would never contain potentially very large data resources such as CSV files in the zip. That way we can complete this functionality for including a README.md, without tackling the much larger requirement (which I don't think I can find room for), in issue 52 What are your thoughts?
A data package can include a README.md. When I upload a datapackage.zip file I expect the README.md inside to be added as a resource in CKAN.
Currently the README.md is not uploaded so provenance information is lost.