KSP-CKAN / CKAN

The Comprehensive Kerbal Archive Network
https://forum.kerbalspaceprogram.com/index.php?/topic/197082-*

introspect module contents for valuable metadata #377

Closed btbonval closed 9 years ago

btbonval commented 9 years ago

Problem

Right now, metadata must either be provided by hand following the CKAN specification, or determined automagically by inspecting the KerbalStuff API (and, to a lesser extent, the GitHub API).

Extent of current solution

To review the metadata which can be extracted using Github (https://github.com/KSP-CKAN/CKAN/blob/master/Spec.md#kref):

KerbalStuff provides additional metadata that the authors type into KerbalStuff explicitly:

Modules which support KSP-AVC .version files can extract a little more metadata:

That raises a very simple question: why not support a file format for authors to supply metadata directly inside the portable module? Similar to .version, but including additional relevant metadata. Call the file ksp-mod-metadata.txt for discussion. Here I am suggesting a file format that is meant to go inside modules; it could possibly take on a life outside of CKAN. This would ideally not be JSON, because that language is too rigid for regular humans.

Open up the hood and look inside

When given a new module, netkan could introspect the contents of the zip file and look for hints from ksp-mod-metadata.txt, or it could look for license.txt (very common in the root of a directory) to compare against known licenses, and so on.
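As a rough sketch of that kind of introspection (in Python for brevity, not netkan's actual code): the function below scans a zip for hint files by name. The name ksp-mod-metadata.txt is only the working name from this discussion, and the helper find_hint_files is hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumed names: ksp-mod-metadata.txt is the working name used in this
# discussion; license.txt is a common convention, not a requirement.
HINT_FILES = {"ksp-mod-metadata.txt", "license.txt"}


def find_hint_files(zip_source):
    """Return {lowercase basename: path inside the archive} for hint files."""
    found = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            base = PurePosixPath(name).name.lower()
            if base not in HINT_FILES:
                continue
            # Prefer the shallowest copy if a name appears more than once.
            if base not in found or name.count("/") < found[base].count("/"):
                found[base] = name
    return found


if __name__ == "__main__":
    # Build a small in-memory zip so the sketch runs without a real mod.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("GameData/MyMod/License.txt", "MIT")
        archive.writestr("GameData/MyMod/Plugins/MyMod.dll", b"")
    print(find_hint_files(buffer))
```

Comparing a license.txt that is found against known license texts would be a separate step and is not shown here.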

Github

For github, looking inside the release can almost certainly be done via the API without downloading/extracting a zipfile. Here is a rough draft of that process.

A list of released versions can be found with this call: https://developer.github.com/v3/repos/releases/#list-releases-for-a-repository

Netkan already makes use of the Release API call: https://github.com/KSP-CKAN/CKAN/blob/db8bfcd52f0d5f4f2a0e4843c7a09796fa5c9676/CKAN/NetKAN/Github/GithubAPI.cs#L48

Once the tag-name is extracted (not presently done in https://github.com/KSP-CKAN/CKAN/blob/72bdd02da421cdfb9a1e0913c3ab8f9e199445ff/CKAN/NetKAN/Github/GithubRelease.cs), the tag-name can be built into a git ref as refs/tags/{tag-name}. This would be used to browse the files and contents directly on Github via the API.

To list files in the directory, it looks like tree is the right way to go. However, I don't see any way to specify a git ref. https://developer.github.com/v3/git/trees/#get-a-tree

Whether or not you can list the files, the URLs for file content can be found for a given ref parameter of refs/tags/{tag-name}. There is a special hook for getting the contents of the readme; otherwise the filename must be specified (such as license.txt or ksp-mod-metadata.txt). https://developer.github.com/v3/repos/contents/#get-contents

The file contents aren't initially returned. The contents would be read by downloading the link given in either [_links][self] or [_links][html], which are returned in the get-contents JSON response.
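To make the rough draft concrete, here is a minimal Python sketch that only builds the URLs involved and makes no network calls. The helper names are hypothetical, and passing the fully qualified refs/tags/{tag-name} form as the ref parameter follows the proposal above; it would need to be verified against the API (a bare tag name may be what is expected).

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://api.github.com"


def tag_ref(tag_name):
    """Build a git ref from a release's tag name, as proposed above."""
    return f"refs/tags/{tag_name}"


def releases_url(owner, repo):
    """URL for the 'list releases for a repository' call."""
    return f"{API_ROOT}/repos/{quote(owner)}/{quote(repo)}/releases"


def contents_url(owner, repo, path, tag_name):
    """URL for the 'get contents' call for one file at a tagged release."""
    # Assumption: the ref parameter accepts the refs/tags/... form.
    query = urlencode({"ref": tag_ref(tag_name)})
    return (
        f"{API_ROOT}/repos/{quote(owner)}/{quote(repo)}"
        f"/contents/{quote(path)}?{query}"
    )


if __name__ == "__main__":
    # "v1.0" is a made-up tag name for illustration.
    print(releases_url("KSP-CKAN", "CKAN"))
    print(contents_url("KSP-CKAN", "CKAN", "license.txt", "v1.0"))
```

The flow would be: fetch the releases list, read each release's tag name, then request the contents URL for each file of interest.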

btbonval commented 9 years ago

npm has a nice portable file type (ignoring that it is strict JSON) which allows the author to include a bunch of other metadata for npm package indices. It also allows the author to list dependencies inside the npm package, which we could do, or we could continue to keep that information separate.

package.json is meant to live inside the distributable package, not on some external website or repository. Package indices parse the file in the packages that they web crawl or maintain locally, and then they build their own metadata as they see fit. https://www.npmjs.org/doc/files/package.json.html

yaml might be a little more user friendly as it is less machine strict. Quotes are a bit more willy nilly (a human friendly concept), there is no restriction on the presence or absence of trailing commas, and it is less rigid than json in other ways too. http://yaml.org/spec/1.2/spec.html

Wizarth commented 9 years ago

The only issue I want to raise with this idea is: I think the project has picked up a lot of traction quickly because it doesn't impose additional requirements on the mod authors. With a suggestion like this, it puts said burden on authors to support "yet another" metadata format.

While it would be optional/recommended, someone new coming into the project could very easily see it and think it's mandatory.

On Sun, Nov 16, 2014 at 3:59 AM, Bryan Bonvallet notifications@github.com wrote:

npm has a nice portable file type (ignoring that it is strict JSON) which allows others to list dependencies inside the npm package, but includes a bunch of other metadata for npm package indices. https://www.npmjs.org/doc/files/package.json.html

yaml might be a little more user friendly as it is less machine strict. Quotes are a bit more willy nilly (a human friendly concept), there is no restriction on the presence or absence of trailing commas, and other less rigid features than json. http://yaml.org/spec/1.2/spec.html


btbonval commented 9 years ago

@Wizarth I agree with those points, although my perspective is slightly different.

Most of these mods are open source. Folks like us can do all the work we're already doing and submit a PR. Instead of a PR to netkan, it's a PR to the original project. The author would opt in, but wouldn't necessarily need to do the up-front work themselves.

My suggestion primarily covers the sort of information a mod author might submit to KerbalStuff but in a portable format that travels with the mod's release and source code.

The dependencies and other stuff would be a bonus, of course, but I think a first round buy-in would focus on just the project metadata that's easy-peasy, like project name ;)

pjf commented 9 years ago

And thus we come full circle!

From the very first commit:

  • The meta-data file should be included in the distribution whenever possible.

The earliest netkan distributions simply watched for new releases, downloaded them, and inspected the embedded metadata.

Indeed, the current spec still permits this:

  • The meta-data file may be included in the distribution, to facilitate easier indexing. CKAN files may be placed anywhere inside a distribution.

That's right. If you put a .ckan file anywhere in your distro, we'll use it¹.

We tried this for a while, and it was awful. Authors didn't want to do even more work to make their mods compliant with Yet Another Metadata Format, and when they did, we discovered that this enormously increased the rate of errors and the overhead of maintenance. For authors to change their metadata, they'd have to make a new release, and without strong author-tools to write and test the metadata for them, it was invariably filled with bugs. We already see this with AVC; of all the mods I've downloaded that have .version files, more than half of them aren't even valid JSON documents, let alone compliant with the AVC spec.
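For illustration, the most basic check an author-tool could run looks something like the Python sketch below. It only tests whether the text parses as a JSON object; it does not check compliance with the AVC spec, and the function name check_version_file is hypothetical.

```python
import json


def check_version_file(text):
    """Return (ok, message) for the text of a .version file.

    Only checks that the text is a JSON object, nothing AVC-specific.
    """
    try:
        document = json.loads(text)
    except json.JSONDecodeError as error:
        where = f"line {error.lineno}, column {error.colno}"
        return False, f"{where}: {error.msg}"
    if not isinstance(document, dict):
        return False, "top-level value is not a JSON object"
    return True, "ok"


if __name__ == "__main__":
    # A trailing comma is an easy mistake that strict JSON rejects.
    # The "NAME" key here is just sample content.
    print(check_version_file('{"NAME": "MyMod",}'))
    print(check_version_file('{"NAME": "MyMod"}'))
```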

> My suggestion primarily covers the sort of information a mod author might submit to KerbalStuff but in a portable format that travels with the mod's release and source code.

For this, I would suggest the AVC format or an extension thereof, or using a variant of the NetKAN or CKAN metadata format, which we already quietly support.

> For github, looking inside the release can almost certainly be done via the API without downloading/extracting a zipfile.

Sorry. I really wish this was the case. But the assets that an author attaches to a release are often very different to what's in the source tree. They contain compiled binaries, rendered assets, and all manner of other things. I've seen cases where mod authors use github for releases, but not source control.


¹ Technically the NetKAN will use it. The earliest ckan clients would only use embedded metadata, but that's now been completely disabled, not least because it prevents multiple documents from referring to the same underlying assets, which is very common in split packages. CKAN.NetKAN.MainClass.ExtractCkanInfo is where the extraction occurs.

pjf commented 9 years ago

Just a heads-up in the interests of making sure we don't have stale tickets floating around, I'll be closing this in a day or two unless there are further discussions.

btbonval commented 9 years ago

Well for my own part, it turns out that a hidden feature already exists which suffices for this ticket (ckan files are introspected as part of a module). Although I still think JSON is very unfriendly to anyone but hardened programmers with its rigid syntax, that would be a very different issue.

I'll close this myself since I opened it. ;)