conda-forge / conda-forge.github.io

The conda-forge website.
https://conda-forge.org
BSD 3-Clause "New" or "Revised" License

More "about" metadata #119

Open pelson opened 8 years ago

pelson commented 8 years ago

conda-build now allows more metadata in the about section. We should start making use of it where appropriate.

ref: https://github.com/conda/conda-build/pull/831

In particular we now have:

'home', 'dev_url', 'doc_url', 'license_url', # these are URLs
'license', 'summary', 'description', 'license_family', # text
'license_file', 'readme'
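
A minimal sketch of how a recipe's about section could be checked against the fields listed above. The field names come from the list; the grouping of license_file and readme as file paths, the helper name missing_about_fields, and the example recipe are assumptions for illustration, not part of conda-build.

```python
# Field names copied from the list above. The URL/text grouping follows the
# inline comments there; treating 'license_file' and 'readme' as file paths
# is an assumption.
URL_FIELDS = ("home", "dev_url", "doc_url", "license_url")
TEXT_FIELDS = ("license", "summary", "description", "license_family")
FILE_FIELDS = ("license_file", "readme")


def missing_about_fields(about):
    """Return the known 'about' keys that are absent or empty, in list order."""
    known = URL_FIELDS + TEXT_FIELDS + FILE_FIELDS
    return [key for key in known if not about.get(key)]


# Hypothetical 'about' section, as it might look after parsing a meta.yaml.
example_about = {
    "home": "https://example.org/mypkg",
    "license": "BSD 3-Clause",
    "summary": "A one-line summary of mypkg.",
}
print(missing_about_fields(example_about))
```

A report like this only lists what is absent; whether any of these fields should be required is the open question in this issue.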

jakirkham commented 8 years ago

Should we? Some of these feel redundant if not unnecessary.

For instance, I have seen license and license_family set to BSD 3-Clause and BSD. What more information do we get from this? Similarly, what is the point of having both summary and description? What does one add that is more valuable than the other? It seems we should just pick one and stick with it. Also, home is often a GitHub repo. Is a dev_url helpful there? The repo page shows the README too. Sort of unrelated, but what does readme do? I see no docs on it (as with many of these) and am a bit wary of forcing people to include it when we don't know what it is or how it behaves (let alone whether it will stay). These are only a few examples, but there are certainly more.

In short, many of these feel a bit noisy to me. Given the behavior of many of them is indeterminate, I don't think we should start relying on them. Not to mention, I don't know that we should be taking a hard stance on including these.

The only exception IMHO is license_file. In many cases, we are legally obligated to include the license and its behavior is clearly documented. That probably should be added to existing feedstocks.

ocefpaf commented 8 years ago

I agree with @jakirkham here. Not sure if those add enough value to justify using them. Most of that information is either redundant or easily found using the metadata we already have. For example, if dev_url is not already home, that info is somehow in the download URL (if not, then we are dealing with a very peculiar package :wink:).

The only exception IMHO is license_file. In many cases, we are legally obligated to include the license and its behavior is clearly documented. That probably should be added to existing feedstocks.

:+1:

pelson commented 8 years ago

The only exception IMHO is license_file. In many cases, we are legally obligated to include the license and its behavior is clearly documented. That probably should be added to existing feedstocks.

Completely agreed. We should start doing that more rigorously.

For example if dev_url is not home already that info is somehow in the download URL

Actually, it was dev_url which I wanted the most. There was a feedstock which started adding this metadata to the README.md and I said that it should be templated if defined in the recipe (that was before much of this metadata was available).

ocefpaf commented 8 years ago

Completely agreed. We should start doing that more rigorously.

Add it to the lint? If so that is something I can contribute :wink:

Actually, it was dev_url which I wanted the most. There was a feedstock which started adding this metadata to the README.md and I said that it should be templated if defined in the recipe (that was before much of this metadata was available).

I have mixed feelings about this. Not sure it is worth the effort, and most of the time it is redundant, but then again sometimes it is not...

pelson commented 8 years ago

Add it to the lint? If so that is something I can contribute :wink:

Sure. It is only needed in some cases though, right? Only specific licenses require that we distribute the license with the binaries.
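
A rough sketch of what such a conditional lint could look like. The function name lint_license_file and the set of license families are hypothetical; which licenses actually require shipping the license text is an assumption made only for illustration here, not conda-forge policy or conda-smithy's implementation.

```python
# Illustrative only: the families listed here are an assumption, chosen to
# show the shape of the check, not a statement of policy.
FAMILIES_REQUIRING_LICENSE_FILE = {"BSD", "MIT"}


def lint_license_file(about):
    """Return lint messages for a missing 'license_file', if one is expected."""
    family = about.get("license_family", "")
    needs_file = family in FAMILIES_REQUIRING_LICENSE_FILE
    if needs_file and not about.get("license_file"):
        return [f"license_family {family!r} set but no license_file given"]
    return []


print(lint_license_file({"license_family": "BSD"}))
print(lint_license_file({"license_family": "BSD", "license_file": "LICENSE.txt"}))
```

Keying the check on license_family keeps the rule table small, at the cost of depending on a field that is itself optional.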

ocefpaf commented 8 years ago

Only specific licenses require that we distribute the license with the binaries.

I will check if I can resurrect https://github.com/conda-forge/conda-smithy/pull/93 and add this functionality to it.

jakirkham commented 8 years ago

@ccordoba12 mentioned there might be some value in this extra metadata for GUI package managers. I'll let him explain, but figured I'd note his comment where he said as much.

ccordoba12 commented 8 years ago

Ok, so my input on some of these fields:

  1. license_family: This was required by our clients to be able to filter packages more broadly.
  2. description: Used by conda-manager (the graphical conda package manager used by Anaconda Navigator).

I don't know about the others (some of them can also be used by conda-manager).

Pinging @msarahan and @goanpeca about it.

goanpeca commented 8 years ago

Yes, the idea was to use them in a UI package manager, where the extra information helps users get quick access. Yes, there is the argument that people should be able to find everything on the homepage, and yes, they are not mandatory, but if they happen to exist, as in the following example:

'home': www.spyder-ide.org
'dev_url': github.com/spyder-ide/spyder
'doc_url': https://pythonhosted.org/spyder/

Then they are very useful.

Summary is supposed to be a one liner, whereas description is intended to be a longer and more descriptive account of what the package does.

I am not sure about the usefulness of license_url.
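
The distinction between summary and description, and the redundancy concern raised earlier for dev_url and home, could be sketched as a small helper. The function name about_hints and the exact rules are assumptions for illustration only; the example values are the Spyder URLs quoted above.

```python
def about_hints(about):
    """Return non-fatal hints about redundant or misused 'about' fields."""
    hints = []

    # summary is meant to be a one-liner; longer text belongs in description.
    summary = about.get("summary", "").strip()
    if "\n" in summary:
        hints.append("summary should be a single line; move detail to description")

    description = about.get("description", "").strip()
    if description and description == summary:
        hints.append("description repeats summary")

    # A dev_url or doc_url identical to home adds no information.
    home = about.get("home")
    for key in ("dev_url", "doc_url"):
        if about.get(key) and about[key] == home:
            hints.append(f"{key} duplicates home")
    return hints


spyder_about = {
    "home": "www.spyder-ide.org",
    "dev_url": "github.com/spyder-ide/spyder",
    "doc_url": "https://pythonhosted.org/spyder/",
}
print(about_hints(spyder_about))
```

For the Spyder example the three URLs differ, so no hint is produced, which matches the case where the extra fields are useful.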

pelson commented 8 years ago

I had a desire to maintain some of the information similar to that mentioned by @goanpeca. The comment came from https://github.com/conda-forge/qutip-feedstock/pull/4#discussion_r55568494.

jakirkham commented 8 years ago

FWIW I have no problem with people choosing to use this metadata. Also, if your organization has constraints applied to contributors from your organization (e.g. within Continuum) about including this metadata, I have no opinion on that as long as the maintenance burden falls squarely on your organization.

That being said, I do not want to spend my cycles or other people's cycles trying to add this throughout conda-forge. Nor do I want to spend anyone's (contributors' or reviewers') time enforcing these constraints on recipe additions. This simply seems to be lower priority compared to any other problem we happen to be dealing with. Not to mention, I feel there is still a lack of clarity in many cases about what this metadata is meant to provide and how it is not redundant. Finally, I'm not convinced that this information won't simply go stale, whether because of lack of interest, time, duplication of effort, or other reasons, which puts its value into question.

If you wish to try to address these criticisms, I think the first piece would be to make sure that all of these fields are documented in conda-docs. The second piece would be to allow this information to be drawn easily from other files within the source code. There are probably more ways to lower the barrier to entry to using these fields. I think that is the right way to get people to start using these fields, rather than forcing them to do so outright.

jakirkham commented 8 years ago

cc @almarklein

ocefpaf commented 8 years ago

@msarahan and @goanpeca what are the default channel policies on this?

@ccordoba12 once mentioned that this extra metadata is used in the navigator, so it is good to have them, but are you guys enforcing or leaving it as optional?

I am particularly worried about the effort it takes for common users to find and add things like the license file. IMO that should be optional as adding the license file is mostly an upstream problem.

msarahan commented 8 years ago

I don't know what the policy is on missing license files internally. With the other metadata, yes, we require that on internal packages, and anaconda-verify checks it.

ocefpaf commented 8 years ago

anaconda-verify

Sounds like we have some low-hanging fruit to improve the linter then :smile:

goanpeca commented 8 years ago

Same here, I don't think there is a policy to enforce including the LICENSE, because some projects simply don't include it and we would need to wait for a new version UPSTREAM. I would say license_file should be completely optional if it is not located at the base of the repo.

ocefpaf commented 8 years ago

I would say license_file should be completely optional if it is not located on the base of the repo.

Thanks @goanpeca, that makes perfect sense. However, what about when it is located in the repo/source? Python packages might have the license in the MANIFEST and copy it to the install directory. In the Linux world, most of the time, the LICENSE file is copied to the data directory ($PREFIX/share). (That is what I meant by upstream problem BTW.)

Just to make myself clear, even though I am OK and I do recommend specifying the license file in the meta.yaml I believe that is, sometimes, redundant.

Also, this metadata entry requires extra effort from people submitting the recipes (browsing the source code/repo, finding the file, adding it to the meta.yaml, etc.). That makes it a good candidate for optional metadata in all cases. (Maybe except in those cases where the license clearly states that it must be shipped with the code and the code itself does not do that. I would argue that upstream is broken, but we can fix that with the metadata entry.)
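
Part of that effort could be automated. The following is a minimal sketch, assuming the license sits at the base of the extracted source tree under a common name; the patterns and the function name find_license_at_base are hypothetical, and real projects use many other spellings and locations.

```python
import tempfile
from pathlib import Path

# Hypothetical name patterns; only the top level of the tree is searched.
LICENSE_PATTERNS = ("LICENSE*", "LICENCE*", "COPYING*")


def find_license_at_base(src_dir):
    """Return the name of the first license-like file at the top of src_dir, or None."""
    base = Path(src_dir)
    for pattern in LICENSE_PATTERNS:
        matches = sorted(p for p in base.glob(pattern) if p.is_file())
        if matches:
            return matches[0].name
    return None


# Usage: a throwaway source tree with a license at its base.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "LICENSE.txt").write_text("license text")
    (Path(tmp) / "setup.py").write_text("")
    found = find_license_at_base(tmp)
    print(found)
```

When nothing is found at the base, the helper returns None, which corresponds to the case where license_file would stay optional.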

jankatins commented 8 years ago

Also this metadata entry requires an extra effort on people submitting the recipes (browsing the source code/repo, finding the file, adding it to the meta.yaml, etc).

One could argue that a higher entry barrier and more attention to details from the beginning is a good thing in the long run...

Debian has had mass bug filings for "trivialities" and I think it is better off for it in the long run.

ocefpaf commented 8 years ago

One could argue that a higher entry barrier and more attention to details from the beginning is a good thing in the long run...

That is a fair and important point!

Debian has had mass bug filings for "trivialities" and I think it is better off for it in the long run.

However, and here is my main concern, conda-forge is not a Linux community! See how small and technical they are. Do we want to be more like PyPI or more like Debian?

PS: I am a packager for OpenSUSE and there are many rules there I would never apply here because that would make conda-forge a niche package distributor instead of a robust, multi-platform, and community-accessible one :wink:

jakirkham commented 8 years ago

Same here, I don't think there is a policy to enforce including the LICENSE, because some projects simply don't include it and we would need to wait for a new version UPSTREAM. I would say license_file should be completely optional if it is not located at the base of the repo.

Personally I would be ok with not including the license if it is not in the PyPI package.*

I would only add that IMHO we should be raising these problems upstream and/or offering a fix. To me this is no different from any other problem we encounter with their software (for instance, a build issue that requires a patch). We should be striving to make the software that we package better than we found it. After all, we packaged it because it was worth having in the first place. This simplifies our work over the long run and improves the end user experience.

* - Though you could argue we can always copy and paste this into the recipe (as we have sometimes done) so as to be compliant. Maybe we should do some disambiguation from other licenses ( https://github.com/conda-forge/conda-smithy/pull/230 ).

jankatins commented 8 years ago

However, and here is my main concern, conda-forge is not a Linux community! See how small and technical they are. Do we want to be more like PyPI or more like Debian?

I think "accessible to a newcomer" and "technically sound" should be two different dimensions (which may influence each other).

PS: I am a packager for OpenSUSE and there are many rules there I would never apply here because that would make conda-forge a niche package distributor instead of a robust, multi-platform, and community-accessible one :wink:

In my opinion the "rules" make for a "robust multi-platform" distribution. E.g. it is my belief that debian is a good technical platform because of the debian policy, with its encoded knowledge and learnings, and the strict enforcement of these documents.

So to make up a different "bad case scenario": if there is only a limited set of loose rules, the packages will not play well with each other, resulting in many incompatibilities, resulting in users running away screaming.

To give an example where IMO "more rules" make for a better technical platform: I still think the current "policy" of "pinning" native libs will not scale. Currently the knowledge of what works together with what is codified in a script (which will grow horribly if all the libs from a normal unix distribution get added), most of the pins rely on semver versioning rules (which do not hold for all of the libs), and it will blow up each time a real incompatibility is encountered (because that needs a rebuild of all dependent packages). The solution is IMO adding rules that specify the naming of packages (=versioned), the splitting of packages (multiple packages for headers vs libs), how to specify dependencies in the dependent packages, and so on. This can all get technical solutions (like debian and probably every other linux distribution already has), but basically these are "only" technical implementations of a set of rules which work.

Unfortunately the linter right now is binary: it either passes or fails. lintian, the linter for debian packages, has a more fine-grained result: it can tell you both the "badness" of something and how sure the linter is that this is really a problem. So if the linter could tell one "this recipe would benefit from a description" but not fail the recipe (or only once add a wishlist bug to the repo), then this is IMO fine.
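
A minimal sketch of what a non-binary lint result could look like, loosely modelled on the severity and certainty idea described above. All names here (Severity, Certainty, Finding, lint_about, lint_passes) are hypothetical, and treating a missing license as a certain error is an assumption for illustration; this is not how conda-smithy's linter is implemented.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    WISHLIST = 1  # nice to have, never fails the recipe
    WARNING = 2
    ERROR = 3


class Certainty(IntEnum):
    POSSIBLE = 1  # heuristic match, may be a false positive
    CERTAIN = 2


@dataclass(frozen=True)
class Finding:
    message: str
    severity: Severity
    certainty: Certainty


def lint_about(about):
    """Collect graded findings for an 'about' section instead of a pass/fail flag."""
    findings = []
    if not about.get("license"):
        findings.append(
            Finding("about/license is missing", Severity.ERROR, Certainty.CERTAIN)
        )
    if not about.get("description"):
        findings.append(
            Finding(
                "this recipe would benefit from a description",
                Severity.WISHLIST,
                Certainty.CERTAIN,
            )
        )
    return findings


def lint_passes(findings):
    """Fail only on errors the linter is certain about."""
    return not any(
        f.severity == Severity.ERROR and f.certainty == Certainty.CERTAIN
        for f in findings
    )


findings = lint_about({"license": "BSD 3-Clause"})
for f in findings:
    print(f.severity.name, f.certainty.name, f.message)
print(lint_passes(findings))
```

Separating the collection of findings from the pass/fail decision lets a missing description be reported without blocking the recipe.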

jakirkham commented 8 years ago

Thanks for the point particularly on degree of badness, @janschulz. Have placed that in issue ( https://github.com/conda-forge/conda-smithy/issues/317 ) so as to better track it.