NixOS / nixpkgs


Feature request: new meta attr for source code repository #293838

Open lolbinarycat opened 3 months ago

lolbinarycat commented 3 months ago

Problem

most packages use meta.homepage to link to the source code repository, but for projects with an actual homepage (such as luajit), finding the source code often requires clicking around various links to try to find the official repo.

for packages that use fetchFromGitHub or similar, you can usually piece the url back together with some effort, but for projects that use a source tarball (again, like luajit), you're out of luck.

Proposal

a meta.repository field that points to an http-browsable source tree.

for packages without a separate homepage, you could just set the meta.repository field instead.
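
for example, a minimal sketch of what this could look like (meta.repository here is the proposed field, not an existing attribute, and the hash is a placeholder):

  stdenv.mkDerivation rec {
    pname = "luajit";
    version = "2.1.0-beta3";
    src = fetchurl {
      # a plain source tarball: the repo url cannot be mechanically recovered from this
      url = "https://luajit.org/download/LuaJIT-${version}.tar.gz";
      hash = "sha256-..."; # placeholder
    };
    meta = {
      homepage = "https://luajit.org/"; # user-facing homepage
      repository = "https://github.com/LuaJIT/LuaJIT"; # proposed: http-browsable source tree
    };
  }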


SuperSandro2000 commented 3 months ago

There is meta.downloadPage, which is documented at https://github.com/NixOS/nixpkgs/blob/5bfab70cdf63ad75f1d7d0facc59fb9f49668811/doc/stdenv/meta.chapter.md#downloadpage-var-meta-downloadpage

lolbinarycat commented 3 months ago

a download page and browsable source tree are entirely different things. a package that has a dedicated homepage is also very likely to have a dedicated page for downloading binaries.

AndersonTorres commented 3 months ago

most packages use meta.homepage to link to the source code repository, but for projects with an actual homepage (such as luajit), finding the source code often requires clicking around various links to try to find the official repo.

So, what? This is a decision by the upstream maintainers not to provide an easy link to their VCS repos. Ask them for an easier-to-find link.

Indeed there are many times we find the opposite: the software has only the repository as its homepage. E.g. I keep bqn-mode on GitHub and never set a dedicated homepage for it.

lolbinarycat commented 3 months ago

This is a decision by the upstream maintainers not to provide an easy link to their VCS repos. Ask them for an easier-to-find link.

same logic can be applied to say that meta.downloadPage and meta.changelog shouldn't exist.

project homepages are usually designed for users, not developers; in my experience they rarely put the git repo front and center.

Indeed there are many times we find the opposite: the software has only the repository as its homepage. E.g. I keep bqn-mode in github and never set a dedicated homepage for it.

if you mean packages that don't have a dedicated homepage, and use the git forge instead: yes, i am well aware of that. i mention it in the original post.

if you mean packages that have a dedicated homepage, but set the homepage field to the git forge anyway: as far as i can tell this is incorrect metadata. the homepage should point to the homepage; doing otherwise is just confusing.

AndersonTorres commented 3 months ago

same logic can be applied to say that meta.downloadPage and meta.changelog shouldn't exist.

meta.downloadPage is useful when the download page is wildly different from the homepage. That's useful for automated scripts.

meta.changelog has useful info for package maintainers.

A pointer to an eye-candy webpage showing the files unpacked from the VCS? This is just eye-candy.

And the VCS itself is useful for those who want to contribute or hack on the project themselves. For a package maintainer, this is just downloadPage, albeit not an adequate name.

project homepages are usually designed for users, not developers

Yep. Convince them we programmers are more important than those mere users...

Artturin commented 3 months ago

meta.repositories used to exist but was removed in https://github.com/nixos/nixpkgs/commit/33cce15e42e4086ea26b5fc026a2de3ca2e07f29

Aleksanaa commented 3 months ago

I've mentioned the same problem here: https://discourse.nixos.org/t/problems-regarding-meta-homepage-and-link-to-repository/39821. So thanks for raising an issue.

Aleksanaa commented 3 months ago

meta.downloadPage is useful when the download page is wildly different from the homepage. That's useful for automated scripts.

The real download link is always obvious: it's in the src field. This excludes a small number of packages with a download link hidden behind a paywall, but in that case downloadPage is still useless for automated scripts or package maintainers. In fact, only a small proportion of packages actually set downloadPage.

We often do this instead:

nix-repl> :l <nixpkgs>

nix-repl> pkgs.mailspring.src.url 
"https://github.com/Foundry376/Mailspring/releases/download/1.13.3/mailspring-1.13.3-amd64.deb"

nix-repl> pkgs.go-musicfox.src.url 
"https://github.com/go-musicfox/go-musicfox/archive/v4.3.1.tar.gz"

nix-repl> pkgs.transmission.src.url
"https://github.com/transmission/transmission.git"

I believe the automated scripts do the same, because this way they don't even have to parse the Nix code themselves. Package maintainers can read the code, and figuring out how to bump source versions shouldn't be a problem.

A pointer to an eye-candy webpage showing the files unpacked from the VCS? This is just eye-candy.

And the VCS itself is useful for those who want to contribute or hack on the project themselves. For a package maintainer, this is just downloadPage, albeit not an adequate name.

In fact, a link to the repository is often much more useful than the downloadPage. We can roughly observe whether the project is in a good development state (I will not define this term here; it depends on the user's personal needs). On third-party code hosting platforms, we can usually find the issue tracker in a fixed location (rather than a random place), so we can get a general sense of the quality of the program, submit issues, or find workarounds for known problems.

This also applies to package maintainers (including the maintainer who created the package and random people who want to bump it), since some dependencies are often hidden in the cmake or meson build files, some files may need to be installed manually, and some code needs to be patched to adapt to NixOS. However, unlike with downloadPage, sometimes the src we use is not related to the repository (such as a mirror, an upstream deb package, or simply a download link that doesn't point into the repo).

meta.changelog has useful info for package maintainers.

For maintainers, code modifications between version tags are far more important than the changelog, because they directly reflect changes in dependencies and other metadata.

Convince them we programmers are more important than those mere users...

Users and developers are often not separate in the Linux community, especially in a distribution that requires writing code in an unfamiliar language to configure the system. It makes no sense to discuss the needs of the two in isolation.

Indeed there are many times we find the opposite: the software has only the repository as its homepage.

Yes, this raises another question: what exactly is a homepage? In my opinion, a repository is a repository, while a dedicated introduction page is a homepage. Although I have seen many repositories with very detailed READMEs that may indeed serve as homepages, the repository itself still brings too much distraction. I would prefer that in tools like search.nixos.org, when users click on a link named "homepage" or "repository", they have clear expectations of where they will be directed. This is also a UX issue, though this may be my personal preference, and other people may have different views.

AndersonTorres commented 3 months ago

The real download link is always obvious: it's in the src field.

This is a bit more complicated. The src field does not necessarily tell us what the newest release is. There are some wackos that use the same link and don't even provide a versioned tarball.

The best example: RIES!

http://www.mrob.com/pub/ries/src/ries.c.txt

I believe the automated scripts do the same, because this way they don't even have to parse the Nix code themselves.

Many updater scripts are generated by the derivation itself. It's not the shell script that parses Nix; it's Nix code that generates the script.

On third-party code hosting platforms, we can usually find the issue tracker in a fixed location (rather than a random place), so we can get a general sense of the quality of the program, submit issues, or find workarounds for known problems.

Again, eye-candy for people interested in hacking/contributing to the code.

meta.changelog has useful info for package maintainers.

For maintainers, code modifications between version tags are far more important than the changelog, because they directly reflect changes in dependencies and other metadata.

Are you seriously saying that a whole diff -Naur between two releases of Linux kernel, with nothing else, is way more important than the changelog the kernel developers wrote?

Worse, are you suggesting programmers (including but not limited to Linux kernel programmers) are so disorganized that the changelogs they write have zero relevance, or even negative reliability, when compared to a machine-generated diff -Naur?

Users and developers are often not separate in the Linux community

  1. Often not is not the same as never.
  2. Usually programmers are smarter than users - smart enough to figure it out themselves by looking for keywords such as “star”, “contribute”, “submit an issue”, or “development”, or by finding the GitHub icon in the sidebar.
    1. This is especially true about the users of a distribution that requires writing code in an unfamiliar language to configure the system (Bash?).
  3. It does not change the argument: the upstream webmasters should provide an easier-to-find link for their VCSes to begin with, not the package managers.
Aleksanaa commented 3 months ago

On third-party code hosting platforms, we can usually find the issue tracker in a fixed location (rather than a random place), so we can get a general sense of the quality of the program, submit issues, or find workarounds for known problems.

Again, eye-candy for people interested in hacking/contributing to the code.

I don't see how what I'm saying has anything to do with hacking/contributing to the code; on the contrary, it's what your so-called "normal users" do on a daily basis.

For maintainers, code modifications between version tags are far more important than the changelog, because they directly reflect changes in dependencies and other metadata.

Are you seriously saying that a whole diff -Naur between two releases of Linux kernel, with nothing else, is way more important than the changelog the kernel developers wrote?

No; for some software, the author may have a very well-written changelog, including changes to the build method; but for other projects, the diff of the build script and the addition or deletion of special data files already provide sufficient information for package maintainers. Changes to specific functional code often do not result in packaging changes such as adding or removing dependencies or modifying the build process, unless a specific error is thrown during the build.

Worse, are you suggesting programmers (including but not limited to Linux kernel programmers) are so disorganized that the changelogs they write have zero relevance, or even negative reliability, when compared to a machine-generated diff -Naur?

To some extent, yes. Nearly all developers write in great detail about functional changes for their users, but there aren't that many developers who document build-process or dependency changes for package maintainers.

Users and developers are often not separate in the Linux community

  1. Often not is not the same as never.

I'm not trying to please everyone. There are certainly a number of users who will not benefit from it. But this doesn't render it useless.

  2. Usually programmers are smarter than users - smart enough to figure it out themselves by looking for keywords such as “star”, “contribute”, “submit an issue”, or “development”, or by finding the GitHub icon in the sidebar.

But that doesn't make the process any less unpleasant. If we can save them time and energy, then why not?

  3. It does not change the argument: the upstream webmasters should provide an easier-to-find link for their VCSes to begin with, not the package managers.

Yes, logically they should. But what actually happens? We don't always create tools to solve an idealized problem; we also need to solve these problems as they exist in reality, especially when we happen to have the ability to do so (I mean, as shown in the PR above, some common fetchers already contain this information).

AndersonTorres commented 3 months ago

Let me be the devil's advocate here.

What is your goal here in providing meta.repository? Is it just an eye-candy view of the VCS, or the repository itself?

Would you prefer a webpage ready to be opened in a web browser, like this GitWeb interface for GCC or this mirror of GNU Emacs?

Or the URL for the repo ready to be consumed by Git, like this from GCC - git://gcc.gnu.org/git/gcc.git - or this from Emacs - git.savannah.gnu.org/git/emacs.git?

meta.repository only makes sense if it can be used as input for a git command (or the corresponding for other VCSes).

lolbinarycat commented 3 months ago

@Artturin thanks, i did not know that. i would hope that automatically setting the field from src metadata would make it more useful. additionally, having a single value should make it easier for sites like search.nixos.org to use and present it.
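
roughly, the defaulting could look like this (a sketch only - the real check-meta.nix wiring and error message would differ):

  # hypothetical excerpt, check-meta.nix style: prefer an explicit
  # meta.repository, otherwise fall back to the fetcher-provided
  # src.meta.homepage if it exists
  repository =
    attrs.meta.repository or (
      if attrs ? src && attrs.src ? meta && attrs.src.meta ? homepage
      then attrs.src.meta.homepage
      else throw "package does not set meta.repository"
    );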

lolbinarycat commented 3 months ago

A pointer to an eye-candy webpage showing the files unpacked from the VCS? This is just eye-candy.

  1. issue trackers exist outside of the source tree. they are not viewable via a git clone, but nearly every git forge links them from the repo page.
  2. cloning the entirety of a large project when all you want is a quick glance at a single file is a waste of time, bandwidth, and disk space.

Convince them we programmers are more important than those mere users...

making drastic changes to the culture of every open source project is a lot harder than adding a few lines of code.

lolbinarycat commented 3 months ago

Yes, this raises another question: what exactly is a homepage? In my opinion, a repository is a repository, while a dedicated introduction page is a homepage. Although I have seen many repositories with very detailed READMEs that may indeed serve as homepages, the repository itself still brings too much distraction. I would prefer that in tools like search.nixos.org, when users click on a link named "homepage" or "repository", they have clear expectations of where they will be directed. This is also a UX issue, though this may be my personal preference, and other people may have different views.

i already brought up a possible solution to this in the issue i submitted to nixos search: if meta.homepage and meta.repository are set to the same value, only show the repository link.

AndersonTorres commented 3 months ago

A pointer to an eye-candy webpage showing the files unpacked from the VCS? This is just eye-candy.

  1. issue trackers exist outside of the source tree. they are not viewable via a git clone, but nearly every git forge links them from the repo page.

Therefore you do not want the source code repository; you want a very specific kind of eye-candy. You are naming it in a misleading way - and arguing for it in a misleading way too.

The "issue tracker" of Linux kernel is a freaking mailing list. Indeed many projects still use them - hell, SourceHut was created a decade ago and they use mailing lists!

Is a mailing list an acceptable value for the meta.eyecandySite you call "repository"? (Don't worry, many of them have HTML-rendered, HTTP-reachable backups...)

  2. cloning the entirety of a large project when all you want is a quick glance at a single file is a waste of time, bandwidth, and disk space.

fetchgit does this and no one complained.

Also, git has allowed fetching a single file since at least 1.8.

making drastic changes to the culture of every open source project is a lot harder than adding a few lines of code.

Like making all of them use GitHub exclusively?

lolbinarycat commented 3 months ago

Therefore you do not want the source code repository; you want a very specific kind of eye-candy. You are naming it in a misleading way - and arguing for it in a misleading way too.

issue trackers exist outside of the source tree, therefore i don't want the source code repository? i know that's being overly literal, but i don't know what you're actually trying to say.

the issue tracker being available via the repository webpage is not an official feature, it is simply a nice situational benefit for users.

The "issue tracker" of Linux kernel is a freaking mailing list. Indeed many projects still use them - hell, SourceHut was created a decade ago and they use mailing lists!

ok, so the repository page of those projects simply won't link to the issue tracker. i don't think that's a huge deal.

Is a mailing list an acceptable value for the meta.eyecandySite you call "repository"? (Don't worry, many of them have HTML-rendered, HTTP-reachable backups...)

no, only an http-browsable source tree is an acceptable value, as the documentation states

fetchgit does this and no one complained.

  1. fetchgit puts its files in a garbage-collected, deduplicated, cached file store.
  2. people do complain about nix build speeds being slow. it's the main downside of nix.

Also, git allows capturing a single file since at least 1.8.

additional mental load of:

  1. remembering/looking up the command to do that
  2. reconstructing the git url from the fetcher, or entering a nix repl to evaluate src.gitRepoUrl
  3. deciding which files to download
  4. deciding where to put those files
  5. remembering to remove those files when you're done

as opposed to just typing !nixpkgs PACKAGE_NAME and clicking a few links.

Like making all of them use GitHub exclusively?

what?

this proposal works equally well with sites like gitea, gitlab, and SourceHut. it can even work with custom solutions like git.kernel.org.

95% of software projects have some form of http-browsable source tree, and those that don't can simply not set the field and be no worse off than if the field didn't exist.

rhendric commented 3 months ago

project homepages are usually designed for users, not developers, in my experience they rarely put the git repo front and center.

Then maybe instead of ‘http-browsable source tree’ being the defining characteristic of this new field, we could say that it's for homepages targeting contributors, and meta.homepage is for homepages targeting users? Then there's less confusion over when to use this (for humans who want to interact with contributor-centric resources for the package) versus when to use src (for automated processes to get the current source) or src.gitRepoUrl (for getting an entire Git repository) or meta.downloadPage (for manually downloading the current source or binaries).
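
Concretely, the division of labor might look like this for a hypothetical package (all URLs illustrative):

  meta = {
    homepage = "https://get-project.com"; # for users
    downloadPage = "https://get-project.com/download"; # manual source/binary downloads
    changelog = "https://get-project.com/news"; # release notes
    repository = "https://github.com/project-team/project"; # proposed: for contributors
  };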

lolbinarycat commented 3 months ago

Then maybe instead of ‘http-browsable source tree’ being the defining characteristic of this new field, we could say that it's for homepages targeting contributors, and meta.homepage is for homepages targeting users?

personally i think "http-browsable source tree" is much more descriptive than "homepage targeting contributors", as the latter requires making a subjective judgement about the page's intended target audience (and most pages usually target both)

maybe "a webpage where the package's source code can be viewed" would be easier to understand.

rhendric commented 3 months ago

(and most pages usually target both)

I don't know, if your archetypal example is get-project.com for meta.homepage and github.com/project-team/project for meta.repository, I think it's pretty clear that one page is user-facing and one page is contributor-facing.

And if a project only exposes a Bugzilla instance but not an HTTP-browsable VCS, I'd still consider that to be a contributor-facing home.

lolbinarycat commented 3 months ago

I don't know, if your archetypal example is get-project.com for meta.homepage and github.com/project-team/project for meta.repository, I think it's pretty clear that one page is user-facing and one page is contributor-facing.

And if a project only exposes a Bugzilla instance but not an HTTP-browsable VCS, I'd still consider that to be a contributor-facing home

i think in that case (if it was frequent enough), we would want a separate issueTracker or bugReport field, instead of overloading the meaning of meta.repository.

my goal with this is to make metadata easier to understand, as currently homepage is somewhat overloaded, and often has an unexpected value (eg. the homepage for gforth is set to its git repo, instead of gforth.org)

when you click the url labeled "homepage", you should go to the project's homepage, and when you click the url labeled "repository" you should go to the project's repository.

rhendric commented 3 months ago

Then I guess I don't know why you want this field either, given that src.gitRepoUrl will take you to the project's repository.

lolbinarycat commented 3 months ago

src.gitRepoUrl is specific to git, undocumented, only set by certain fetchers, and not designed to be overridden by package maintainers.
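
for illustration (transmission's src happens to come from a git fetcher, as shown in the earlier repl session, so it presumably has one; most packages won't):

nix-repl> pkgs.transmission.src.gitRepoUrl
"https://github.com/transmission/transmission.git"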

AndersonTorres commented 3 months ago

Therefore you do not want the source code repository; you want a very specific kind of eye-candy. You are naming it in a misleading way - and arguing for it in a misleading way too.

issue trackers exist outside of the source tree, therefore i don't want the source code repository? i know that's being overly literal, but i don't know what you're actually trying to say.

Your intention is to allow only eye-candy forges or forge-like sites, and to call them by a generic and misleading name, "repository".

ok, so the repository page of those projects simply won't link to the issue tracker. i don't think that's a huge deal.

Repositories that are not meta.repositories.

I have not seen such an ironic stance since the C keyword const not meaning constant.

no, only an http-browsable source tree is an acceptable value, as the documentation states

Why such discrimination (against ugly, non-eye-candied repos), I ask again?

1. fetchgit puts its files in a garbage-collected, deduplicated, cached file store.

Was this an argument for or against using fetchgit?

2. people do complain about nix build speeds being slow.  it's the main downside of nix.

In the worst case, you can fall back to the raw git command.

additional mental load of:

1. remembering/looking up the command to do that

2. reconstructing the git url from the fetcher, or entering a nix repl to evaluate `src.gitRepoUrl`

3. deciding which files to download

4. deciding where to put those files

5. remembering to remove those files when you're done

There is a thing called a shell script. It can automate a truckload of boring and forgettable tasks.

Like making all of them use GitHub exclusively?

what?

this proposal works equally well with sites like gitea, gitlab, and SourceHut. it can even work with custom solutions like git.kernel.org.

You are not targeting general-purpose repositories, but a very specific style of site. In other words, you are targeting forges.

95% of software projects have some form of http-browsable source tree, and those that don't can simply not set the field and be no worse off than if the field didn't exist.

  1. Then your argument about those source trees being hard to find is weaker than mine about multi-valued backup mirrors.
  2. Arguments that rely on percentages and minorities are very funny sometimes.

and when you click the url labeled "repository" you should go to the project's repository.

"But only if its project's repository is an HTTP-browsable eyecandied site" - FTFY.

lolbinarycat commented 3 months ago

Your intention is to allow only eye-candy forges or forge-like sites, and to call them by a generic and misleading name, "repository".

don't tell me what my intention is.

git.kernel.org would certainly qualify for this value, and it doesn't have any of the features you would associate with a typical forge.

You are not targeting general-purpose repositories, but a very specific style of site. In other words, you are targeting forges.

once again: stop telling me what i am trying to do.

"But only if its project's repository is an HTTP-browsable eyecandied site" - FTFY.

i don't care about whether a webpage looks good or not. you simply decided that i did and keep asserting that i do every two sentences.

it's 2024, and the world runs on http. i'm not really happy about it, but i'm not going to decrease the utility of my metadata fields out of spite.

AndersonTorres commented 3 months ago

Your intention is to allow only eye-candy forges or forge-like sites, and to call them by a generic and misleading name, "repository".

don't tell me what my intention is.

I don't need to - you did it already:

issue trackers exist outside of the source tree. they are not viewable via a git clone, but nearly every git forge links them from the repo page.

Further, there's the whole "the search engine runs on HTTP, therefore it can't include non-HTTP links" argument, plus the examples about the multiple purposes of forges as meta.repositories...

git.kernel.org would certainly qualify for this value, and it doesn't have any of the features you would associate with a typical forge.

Then nothing hinders a Gemini link.

once again: stop telling me what i am trying to do.

Are you targeting general-purpose repositories?

i don't care about whether a webpage looks good or not. you simply decided that i did and keep asserting that i do every two sentences.

Then a Gemini link is acceptable, correct?

it's 2024, and the world runs on http. i'm not really happy about it, but i'm not going to decrease the utility of my metadata fields out of spite.

Allowing Gemini or IPFS links decreases the utility of your metadata fields? How?

piegamesde commented 3 months ago

@AndersonTorres if you are not capable of phrasing your criticism of the proposal in a constructive manner, please consider leaving the discussion. Especially, calling a feature you don't like "eye-candy" in a derogatory way is not okay. Same for the PR thread.

While the use cases for the feature have been detailed, I have never encountered any repository whose primary forge web site was not reachable by HTTP. Focusing on this is whataboutism and is derailing the conversation. It increasingly feels to me like you are filibustering a change you don't like, instead of constructively working towards a solution that fits the needs of those who ask for it while also taking your concerns into account.

AndersonTorres commented 3 months ago

Especially, calling a feature you don't like "eye-candy" in a derogatory way is not okay.

I am not arguing against eye-candy all-in-one forges. I am arguing against refusing other, less typical sites - sites that are being refused for no reason besides "less than any%" or "they don't look cool on that search engine".

And such reasons in a codebase with things like fetchcvs and fetchpijul are less than convincing.

I have never encountered any repository whose primary forge web site was not reachable by HTTP

If that "less than any%" argument is not a reasonable motivation for removing fetchers like fetchpijul and fetchsvn from Nixpkgs codebase, then this is not a reasonable motivation to not allow non-HTTP-reachable forges as possible values for meta.repository.

Focusing on this is whataboutism and derailing the conversation.

This is not whataboutism when the whatabout is happening right now.

Other technologies, older and newer, for reaching source code repositories still exist. Even those that are considered "deprecated", "obsoleted" (typically in a derogatory way) still work fine despite not being popular or hyped.

Further, even the ubiquitous HTTP is somewhat obsoleted by HTTPS nowadays - the web browsers I use scream when an HTTP(-no-S) link is clicked. (Indeed, I could bet a chocolate bar that you have never encountered any repository whose primary forge web site was not reachable by HTTPS.)

instead of constructively working towards a solution that fits the needs of those who ask for it

A solution that allows non-HTTP-reachable links along with HTTP-reachable ones certainly fits the needs of those who ask for HTTP-reachable links.

Why not allow anything besides HTTP-reachable links? This arbitrary restriction benefits no one.

What reasons were given for such discrimination?

while also taking your concerns into account.

Since you bring this up, summarizing my concerns:

  1. Not banishing otherwise valid values with questionable reasons like "it does not look good enough on this particular search engine".
  2. Making that meta attribute future-proof.
  3. Not being based on questionable heuristics like inspecting opaque fields.
  4. Questionable heuristics that introduced bugs in otherwise functioning code already in the Nixpkgs codebase, yay!

lolbinarycat commented 3 months ago

Then nothing hinders a Gemini link.

honestly, fine. allow gemini, or ipfs, or whatever. it shouldn't be too difficult for users of this field to proxy that over http, i guess.

can we at least say "https links are preferred when available"? but it has to be a browsable tree, no linking to tarballs, that can go in downloadPage if necessary.

piegamesde commented 3 months ago

How about "web links"?

lolbinarycat commented 3 months ago

Not banishing otherwise valid values with questionable reasons like "it does not look good enough on this particular search engine"

i never said this. i just wanted it to be easily viewable in a browser. popular web browsers do not support "gemini" links at the time of writing.

Making that meta attribute future-proof

going back later and saying "actually this can be ipfs" is a trivial change only affecting documentation.

the bug i'm worried about is code assuming it will always be http while the documentation does not specify that.

Based on questionable heuristics like inspecting opaque fields

src.meta.homepage is not opaque.

any derivation can have a meta.homepage attribute. src is a derivation.

it is, however, somewhat unintuitive. do you want me to document it better?

Questionable heuristics that introduced bugs in otherwise functioning code already in the Nixpkgs codebase, yay!

that is a misrepresentation of what happened. the only thing that happened is i evaluated code that does not evaluate.

you should not pass stdenv.mkDerivation a src attribute that does not evaluate.

additionally, the actual package was not broken; the only thing that was broken was a transient value only reachable by ofborg-eval recursing for derivations.

AndersonTorres commented 3 months ago

can we at least say "https links are preferred when available"? but it has to be a browsable tree, no linking to tarballs, that can go in downloadPage if necessary.

Yes.

the bug i'm worried about is code assuming it will always be http while the documentation does not specify that.

For what it matters to the Nix evaluator, this is just (a list of?) strings. The search engine or user application decides what to do with them later.

any derivation can have a meta.homepage attribute. src is a derivation.

Hum... I believe this is not necessarily true. It's not unusual to point to local files as src in Nix expressions. It happens all the time with parameterized files (just rg substituteAll). And meta makes no sense for a local file. Further, being pedantic, derivations as defined by the Nix language manual have only three required attrs: name, system and builder.

Nonetheless, it is better to not "try to be smart".

It is more reliable to let the package writers populate the meta.repository field with typical meta.repository = src.meta.homepage snippets instead of introducing complexity at the core of Nixpkgs and being hit by a clash of planets years later (like a transient value only reachable by ofborg-eval recursing for derivations).

lolbinarycat commented 3 months ago

any derivation can have a meta.homepage attribute. src is a derivation.

Hum... I believe this is not necessarily true. It's not unusual to point to local files as src in Nix expressions. It happens all the time with parameterized files (just rg substituteAll). And meta makes no sense for a local file. Further, being pedantic, derivations as defined by the Nix language manual have only three required attrs: name, system and builder.

yes, but if you look at the code i wrote, you'll notice it checks whether src.meta.homepage exists before adding it as a fallback. the derivation could be missing src entirely and it would still work fine.

not everything will have a src.meta.homepage field, but my point is, if src.meta.homepage exists, it has a well-defined meaning.

Nonetheless, it is better to not "try to be smart".

citation needed

also, making a value default to another value is extremely common throughout all of programming.

It is more reliable to let the package writers populate the meta.repository field with typical meta.repository = src.meta.homepage snippets instead of introducing complexity at the core of Nixpkgs and being hit by a clash of planets years later (like a transient value only reachable by ofborg-eval recursing for derivations).

any change has the potential to cause issues down the line. but you know what we do? we fix those issues.

i put a lot of work into those two lines of code and considered a lot of factors, down to stuff like making sure the error message you got when typing meta.repository on a package with the field unset actually points to the right piece of code.

also, the stability of metadata fields has never been an important priority in nixpkgs, which i can tell because i've found a large number of obviously invalid urls just sitting around.

AndersonTorres commented 3 months ago

citation needed

Example: that person hit by a transient value only reachable by ofborg-eval recursing for derivations.

also, making a value default to another value is extremely common throughout all of programming.

Explicit is better than implicit.

- Zen of Python

any change has the potential to cause issues down the line. but you know what we do? we fix those issues.

Avoiding issues by taking preventative measures is usually something we do too. Fixing the issues before they appear - it's a kind of magic!

i put a lot of work into those two lines of code and considered a lot of factors, down to stuff like making sure the error message you got when typing meta.repository on a package with the field unset actually points to the right piece of code.

Hit by a transient value only reachable by ofborg-eval recursing for derivations.

also, the stability of metadata fields has never been an important priority in nixpkgs, which i can tell because i've found a large number of obviously invalid urls just sitting around.

Is this an argument for progressive degradation and uselessness of meta (including pinging maintainers that didn't touch their forks for years) or for tool-assisted sprints fixing this issue?

Aleksanaa commented 3 months ago

So what is a "transient value only reachable by ofborg-eval recursing for derivations"?

When I saw this sentence, I realized that my English level may not have reached the level of daily communication.

AndersonTorres commented 3 months ago

src.gitRepoUrl is specific to git, undocumented, only set by certain fetchers, and not designed to be overridden by package maintainers.

Ehr, the same can be said about src.meta.homepage.

Indeed all fetchers are underdocumented, and the chapter about fetchers says nothing about meta attributes from src.

lolbinarycat commented 3 months ago

Indeed all fetchers are underdocumented, and the chapter about fetchers says nothing about meta attributes from src

well maybe they should

Example: that person hit by a transient value only reachable by ofborg-eval recursing for derivations.

you seem to think that error is much more severe than it is.

the only thing it caused was a CI failure. it did not cause any package to actually stop working.

Is this an argument for progressive degradation and uselessness of meta (including pinging maintainers that didn't touch their forks for years) or for tool-assisted sprints fixing this issue?

i'm sorry, how does adding more data to meta make it useless?

i would argue that having a meta field that isn't set by any package would be much more useless.

not having the field at all is the most useless of all.

or would you rather i changed 10 thousand lines instead of 10? because that's what would be required if the field has no default.

including pinging maintainers that didn't touch their forks for years

how is that relevant here?

rhendric commented 3 months ago

or would you rather i changed 10 thousand lines instead of 10? because that's what would be required if the field has no default.

A third alternative is to place the defaulting logic in consumers instead of the producer.

lolbinarycat commented 3 months ago

@rhendric it's certainly an option, but i'm not sure it's a good one.

instead of having nixpkgs code rely on undocumented nixpkgs behavior (which is quite common, not every obscure utility function has all of its features publicly documented), you have external code outside the nixpkgs repo depending on undocumented behavior.

by handling it within nixpkgs, if we ever want to change the behavior of src.meta.homepage, all we have to do is update check-meta.nix, instead of telling everyone who uses meta.repository to change how their code works.

AndersonTorres commented 3 months ago

Indeed all fetchers are underdocumented, and the chapter about fetchers says nothing about meta attributes from src

well maybe they should

Then underdocumentation is not exactly a problem for using this or that attribute.

On the other hand, this is not the crux of the matter: src should have a meta.repositories field.

you seem to think that error is much more severe than it is.

You asked for an example of an error, not for an example of a catastrophe.

Nonetheless, this error pointed to something more promising.

Is this an argument for progressive degradation and uselessness of meta (including pinging maintainers that didn't touch their forks for years) or for tool-assisted sprints fixing this issue?

i'm sorry, how does adding more data to meta make it useless?

Let's get back to the paragraph:

also, the stability of metadata fields has never been an important priority in nixpkgs, which i can tell because i've found a large number of obviously invalid urls just sitting around.

This is an argument for what, exactly? To me, it points to the need to improve a bad situation. But the impression you transmitted was something like "don't worry, there is so much mess here that my code will not look so relevant".

or would you rather i changed 10 thousand lines instead of 10? because that's what would be required if the field has no default.

The default of an empty list was suggested, wasn't it? Indeed it was precisely what would happen in cases when src has no meta.homepage - with the small problem that, as you yourself said, a homepage is not the same thing as a repository.

It is a bit ironic to rely on a homepage that sometimes-but-not-always is a repository in order to set the value of a repository field that not-always-but-sometimes is a homepage.

Further, if I am correct in my suppositions, it would require less than a hundred lines of diff. And this is being conservative: there are 40 directories called pkgs/build-support/fetch*. If each one has only 3 files, and each file requires one line of code, it would be 120 changes.

how is that relevant here?

I was talking about (an example of) metadata maintenance.

lolbinarycat commented 3 months ago

On the other hand, this is not the crux of the matter: src should have a meta.repositories field.

for what purpose?

you seem to have a problem with the existing behavior of fetchers, which i think is out of the scope of this issue.

The default of an empty list was suggested, wasn't it?

none of the other meta attributes default to null or an empty list, so i would like to keep things consistent.

Indeed it was precisely what would happen in cases when src has no meta.homepage

no, it is not. if no explicit value is given, and src.meta.homepage is not set, then meta.repository will not be set either. trying to access it in this case will raise an error (this behavior is in line with other metadata fields).
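
i.e. (an illustrative session; somePackage is hypothetical):

nix-repl> pkgs.somePackage.meta.repository
error: attribute 'repository' missing

nix-repl> pkgs.somePackage.meta.repository or null
null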

Further, if I am correct in my suppositions, it would require less than a hundred lines of diff. And this is being conservative: there are 40 directories called pkgs/build-support/fetch*. If each one has only 3 files, and each file requires one line of code, it would be 120 changes

hold on, i thought you wanted each package to explicitly set its own meta.repository?

in any case, 120 is a lot more than 15, and i like to keep my PRs small to minimize the possibility of conflicts, and to make them easy to review.

AndersonTorres commented 3 months ago

On the other hand, this is not the crux of the matter: src should have a meta.repositories field.

for what purpose?

Two purposes you proposed yourself: the link labeled "homepage" should take you to the project's homepage, and the link labeled "repository" should take you to the project's repository.

Therefore, src.meta.homepage should not be conflated with src.meta.repository.

Consistency, that's it.

you seem to have a problem with the existing behavior of fetchers, which i think is out of the scope of this issue.

Since you are seeking consistency, this is not so out of scope after all.

Certainly it will require more coordination and many PRs, but this will not be so hard. After all, meta does not trigger a rebuild.

none of the other meta attributes default to null or an empty list, so i would like to keep things consistent.

  1. They are more or less well-defined and easy to obtain. Rarely does a person have doubts about descriptions, homepages, licenses and the like. The same can't be said about meta.repositories. E.g. live555 explicitly has no public VCS; they don't even release older tarballs. For such a case, live555.meta.repositories = []; is not only perfectly acceptable, it's the only acceptable value.
  2. Technically meta.maintainers can be empty, since we do not impose lifetime maintenance on packages.

no, it is not. if no explicit value is given, and src.meta.homepage is not set, then meta.repository will not be set either. trying to access it in this case will raise an error (this behavior is in line with other metadata fields).

What is the value of an unset field?

Remember, you also opened a feature request on the Nixpkgs Search Engine. And parsing a field that can raise an error is a pain in the butt, even when using a memory-paranoid language like Rust.

hold on, i thought you wanted each package to explicitly set its own meta.repository?

Initially this was the only acceptable stance, since src.meta.homepage is a bad heuristic.

Nonetheless, for some (many) fetchers meta.repositories makes perfect sense and is easy to generate. Taking the most ubiquitous, fetchFromGitHub, it is just ${domain}/${owner}/${repo}. Further, it is reasonable to inherit repository from the source - since meta.repositories == src.meta.repositories by "definition".

As I have said above, src should have a meta.repositories field.

in any case, 120 is a lot more than 15

If fetchFromGitHub can return a proper meta.repositories (and it can), this return can be used to auto-set 14k meta.repositories. And it will be just one line of code - ${domain}/${owner}/${repo}.

This is a huge save. A huge and reliable save.

I believe showing the code will be more convincing. Let me try.
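
Something along these lines (a sketch only - the argument names are illustrative and the real fetcher has many more options):

  # sketch: fetchFromGitHub auto-populating the proposed field
  { owner, repo, rev, domain ? "github.com", ... }@args:
  fetchzip ({
    url = "https://${domain}/${owner}/${repo}/archive/${rev}.tar.gz";
    # the one line in question:
    meta.repository = "https://${domain}/${owner}/${repo}";
  } // removeAttrs args [ "owner" "repo" "rev" "domain" ])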

lolbinarycat commented 3 months ago

Remember, you also opened a feature request on the Nixpkgs search engine: https://github.com/NixOS/nixos-search/issues/741. And parsing a field that can raise an error is a pain in the butt, even when using a memory-paranoid language like Rust.

but they already have infrastructure for handling that, as other meta attributes are frequently unset.

and remember, indexing an unset field is only an error if there is no default value

if anything, being inconsistent would probably require more code paths.

Initially this was the only acceptable stance, since src.meta.homepage is a bad heuristic.

why is it a bad heuristic?

This is a huge save. A huge and reliable save.

so, using src.meta.homepage is bad, but if we add a new field that does the exact same thing, that's reliable?

AndersonTorres commented 3 months ago

and remember, indexing an unset field is only an error if there is no default value

"Unset" is not the same as "set as empty by default".

why is it a bad heuristic?

How do I start...

  1. Such a polemic PR, with the potential to touch substantial and sensitive parts of Nix, should be proposed via an RFC.

  2. According to you yourself and @Aleksanaa, fields like downloadPage and homepage are less than suitable for the purpose of storing links to repositories. Why should we expect the same will not happen with an undocumented feature? Your own words argue against you here.

  3. Being undocumented, such a src.meta feature can easily be discarded - all the more so given that meta changes don't trigger a rebuild.

    Indeed, why not delete such instances of src.meta right now?

  4. We know from Artturin that meta fields can be deleted without much hassle and no one will care.

    After all, no one cared when the meta.repositories field was removed in that long-gone past (miss you, WOP), whereas a single guy ignited a holy war because Nixpkgs nuked a SLAPPed piece of software.

    Further, arguments for removing changelog and downloadPage were given here, in a conversation for adding a new, arguably more useful field.

    On the other hand, adding a single meta.categories field required a whole year.

Building on such sliding, quicksandy ground is asking for trouble.

so, using src.meta.homepage is bad, but if we add a new field that does the exact same thing, that's reliable?

Correct!

Because that new field does precisely what it is intended to do: point to a repository.

Also, because it can be documented in an unambiguous way. No one will need to update the documentation of meta.homepage to say something like "sometimes it points to the upstream home page, but in some corner, undocumented cases it points to a source code repository".

nyabinary commented 3 months ago

Since this got reverted in #300247 and there still seems to be interest in this feature coming to fruition, I'm reopening this issue for now.

rhendric commented 3 months ago

I was mildly skeptical before that computing attributes in meta from other parts of the derivation is a good idea, and now I'm more so. Why are we doubling down on this design choice and telling contributors (in a comment they have to find by following the stack trace!) to add redundant data to their packages if this causes more problems? We could instead let this defaulting logic live in a non-central, non-critical location, namely the one place that OP wants to consume this, and let meta be a dumb source of raw data that can't possibly cause evaluation problems again.

lolbinarycat commented 2 months ago

@rhendric a lot of meta attributes are already computed from other attributes, for the record.

there's several problems with putting this logic outside nixpkgs:

  1. no longer testable via CI
  2. will need to be re-implemented by everyone that wants this data
  3. makes it much more work if anyone wants to change the logic in the future
  4. one of the main ways to access this data is via repl, so "just implement it at the other end" means telling every user to handle it themselves
  5. putting the logic outside nixpkgs doesn't fix the eval errors, it just ignores them (and most consumers, like nixos-search, will have no way of handling them, since there is no way to catch an abort)

unfortunately the only viable options i see are:

  1. no defaulting logic (would require 1000s of PRs to make the field useful)
  2. put the defaulting logic in nixpkgs (requires some effort to not break eval)

maybe it would be better to instruct package maintainers to simply make their src eval on all platforms (eg. by adding or { })
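
for example (a sketch; urls and hashes are hypothetical placeholders):

  # per-platform sources; the `or { }` fallback means merely evaluating src
  # (e.g. for meta defaults) never aborts on unsupported platforms
  src = {
    x86_64-linux = fetchurl { url = urls.x86_64-linux; hash = hashes.x86_64-linux; };
    aarch64-linux = fetchurl { url = urls.aarch64-linux; hash = hashes.aarch64-linux; };
  }.${stdenv.hostPlatform.system} or { };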

AndersonTorres commented 2 months ago

since there is no way to catch an abort

Aborts can be caught outside Nix.

would require 1000s of PRs to make the field useful

treewide + shell scripts, like the by-name migration

makes it much more work if anyone wants to change the logic in the future

the current logic is already complicated and based on leaps of faith.

maybe it would be better to instruct package maintainers to simply make their src eval on all platforms

violating a series of suppositions like "don't try to build this outside that platform"? Nah.

lolbinarycat commented 2 months ago

Aborts can be catched outside Nix.

with great difficulty, yes. instead of just doing nix eval --json nixpkgs#somepackage.meta to extract every metadata field at once, you need a separate nix eval command for every metadata field, increasing the number of required processes tenfold.

treewide + shell scripts, like the by-name migration

link? i can't find any evidence of any sort of automatic by-name migration, and there are still a lot of packages using the category hierarchy.

that seems like it would encounter a ton of problems, from constant merge conflicts to unusual syntactic constructions. this would almost certainly end up being more work than the runtime approach, which has already had most of the required work completed.

even if we did somehow manage that, requiring new packages to specify the field (meaning reviewers need to inform new contributors about it) would still be a lot of work in the future.

AndersonTorres commented 2 months ago

link?

Are you serious? Were you born yesterday?

because https://github.com/NixOS/rfcs/pull/140 is happening right now.

i can't find any evidence of any sort of automatic by-name migration

well, I can: #258650

You're welcome.

and there are still a lot of packages using the category hierarchy.

because a recommendation was issued not to convert packages using the older convention (except in special circumstances), since - surprise, surprise - they will be mass-migrated (#211832).

inb4 "link?":

https://github.com/NixOS/nixpkgs/blob/master/pkgs/by-name/README.md

rhendric commented 2 months ago

@rhendric a lot of meta attributes are already computed from other attributes, for the record.

Yes, but:

  1. no longer testable via CI

More like no longer needs to be tested via CI, because for most packages there'll be nothing to test.

  2. will need to be re-implemented by everyone that wants this data

As soon as ‘everyone’ is more than one nice-to-have link in the search app, I'll entertain this objection.

  3. makes it much more work if anyone wants to change the logic in the future

Fully disagree; I think changing logic in check-meta.nix is much more fraught than changing logic in the search app, because the former is at the root of everything for everyone.

  4. one of the main ways to access this data is via repl, so "just implement it at the other end" means telling every user to handle it themselves

Telling every user who looks this information up in the REPL (and how many of those are there going to be?) to check two attributes instead of one is not much of a cost.

  5. putting the logic outside nixpkgs doesn't fix the eval errors, it just ignores them (and most consumers, like nixos-search, will have no way of handling them, since there is no way to catch an abort)

As already noted, addressing eval errors outside of Nix is easier than inside, and consumers can always filter the packages down to a set they're interested in. You're proposing changing what src attributes are required to evaluate under what conditions just so that you can put logic intended to populate a search result field in Nixpkgs instead of in the search app - this is clearly a case of the tail wagging the dog.