Ecosystem: versioning issues

JJ commented 5 years ago

Right now, when you ask for a version hosted in CPAN, you get that particular version. No problem there. That's not the case in the ecosystem. Whatever is in the repo is downloaded.

There's no way to obtain a particular version.
There's no way to tell if what you download with a version in META6.json is always the same.
There's also no way to tell when a new version is released.

I would propose to make tagging of repo releases mandatory, and work with those tags. It's how it's done, for instance, with rakudo. zef would have to be adapted, and there would be some time before every module in the ecosystem has changed, but it's better to do it now while the ecosystem is still relatively small.

AlexDaniel commented 5 years ago

https://github.com/perl6/problem-solving/issues/45

ugexe commented 5 years ago

This has never been true. One can link to a META6.json with a source-url of e.g. https://github.com/foo/bar.git@j29jr0j2rjasfj or https://github.com/foo/bar/archive/v0.0.0.zip -- there is no need to change anything.
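
For illustration only, here is a minimal Python sketch of a META6.json-style record whose source-url is pinned to a versioned archive instead of a moving branch. The `foo/bar` repository is the placeholder used in the comment above, and the helper name `pinned_meta` is hypothetical, not part of zef or any other tool.

```python
import json

def pinned_meta(name, version, source_url):
    """Return a minimal META6.json-style record pinned to one release."""
    return {"name": name, "version": version, "source-url": source_url}

# Placeholder repository; the archive URL names one specific release,
# which is what is meant to tie the download to that version.
meta = pinned_meta("Foo::Bar", "0.0.0",
                   "https://github.com/foo/bar/archive/v0.0.0.zip")
print(json.dumps(meta, indent=2))
```

The same record with a `.git@<commit>` source-url would pin to a commit instead of an archive.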

JJ commented 5 years ago

That is not documented. In fact, I don't think there's a single instance of it (no wonder, since it's not documented). Searching through perl6-all-modules does not return an instance of this, as far as I can tell (but I haven't done this programmatically, so there might be one).
That is less conventional than using a tag. A commit would be equivalent to using a tag (a tag is just an annotation to a commit), but you would still have to find out the commit number. A zip file (I guess a .tgz would be available too), on the other hand, might not be either complete or corresponding to a particular state of the repository.
That accounts for a single version. No way to specify (unless it's not-documented elsewhere) old versions, or to download them.

ugexe commented 5 years ago

Again none of this is true. zef readme shows the various uri formats zef supports. Using tags is flawed — it forces users to use a specific source control in order to use the ecosystem. Finally one is free to put multiple versions of the same module in the ecosystem — some people already have done this even.

ugexe commented 5 years ago

If I open the ecosystem json and grep for .zip the first result is this: https://github.com/araraloren/Getopt-Kinoko/archive/v0.3.5.zip — A versioned zip file. If they wanted another version shown they could (and probably did at one time) add another entry for the separate version... something @tbrowder used to do years ago.
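
The "grep for .zip" step could be sketched roughly as below. This assumes the ecosystem JSON is a list of META6-style records with a `source-url` field; the first sample URL is the one quoted above, the second record is made up, and `versioned_archives` is a hypothetical helper name.

```python
def versioned_archives(entries):
    """Return entries whose source-url points at an archive file."""
    suffixes = (".zip", ".tar.gz", ".tgz")
    return [e for e in entries if e.get("source-url", "").endswith(suffixes)]

# Sample index: first entry is quoted in the thread, second is invented.
ecosystem = [
    {"name": "Getopt::Kinoko", "version": "0.3.5",
     "source-url": "https://github.com/araraloren/Getopt-Kinoko/archive/v0.3.5.zip"},
    {"name": "Foo::Bar", "version": "1",
     "source-url": "https://github.com/foo/bar.git"},
]
for entry in versioned_archives(ecosystem):
    print(entry["name"], entry["version"], entry["source-url"])
```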

tony-o commented 5 years ago

I'll second @ugexe's tag sentiment; we argued about that for a few weeks when I first started looking at rewriting CUR and when niner wrote the CUR/precomp++ stuff. Versioned archives (same as cpan) seem more sane/flexible.

JJ commented 5 years ago

URI has been recently upgraded to 0.3.0. URI is in the ecosystem, which means that old versions are no longer available. They simply aren't. The only available version is the one that's in META6.json, that is, whatever is in the repository together with a META6.json file that has the latest version on it. That means that version-locked META6.json files, such as this one, simply fail. Let's not argue about the merits or demerits of version-locked dependency listings. Someone might not know what's going to go in the next version, so they lock their dependency. That makes a lot of sense for production systems, for instance. However, with the current module versioning system, you can only do that if modules reside in the CPAN ecosystem. That also means you need to worry about where exactly a module you're using resides before listing it as a dependency. On the other hand, tagging releases implies, as in the original post:

The file set is not going to change (-ish, of course you can delete and re-add a tag, but why would you want to do that?). You can associate commits to tags, and you can also compute a SHA code to avoid that from happening.
The repo hosting is going to make a fileset automatically available under several versions.
Old releases will still be available.
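
The SHA idea in the first point can be sketched as follows. This is a minimal illustration, not something zef or the ecosystem does today: the archive bytes are stand-ins, and `sha256_hex` and `matches_recorded` are hypothetical helper names.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()

def matches_recorded(data: bytes, recorded: str) -> bool:
    """Check a downloaded archive against a digest recorded at release time."""
    return sha256_hex(data) == recorded

# Stand-in bytes for a release archive; a real check would read the file.
original = b"contents of a release archive"
recorded = sha256_hex(original)
print(matches_recorded(original, recorded))
print(matches_recorded(original + b" retagged", recorded))
```

If a tag were deleted and re-added on different contents, the digest of the new archive would no longer match the recorded one.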

This is probably not ideal, but it solves lots of problems, of which the biggest one is probably the difference between the two hosting options. This repo is about problem solving, so this is my proposal for solving this specific problem.

ugexe commented 5 years ago

That doesn't solve anything. The current solution does everything you just claimed it cannot.

URI has been recently upgraded to 0.3.0. URI is in the ecosystem, which means that old versions are no longer available. They simply aren't.

So post the old META6.json from the previous version into the META.list. How many times must I explain this? Instead you keep insisting that a single META6.json can somehow represent multiple versions just because you add a tag field. That's a great solution if you want to ignore all the glaring problems (like, uh, when one version has different META6.json contents than another) and technically superior solutions.

Please -- explain how Getopt::Kinoko (which I even mentioned earlier) in the p6c ecosystem with source-url https://github.com/araraloren/Getopt-Kinoko/archive/v0.3.5.zip does not do everything you claimed is currently impossible?

JJ commented 5 years ago

Hi,

On Sun, 25 Aug 2019 at 17:29, Nick Logan (notifications@github.com) wrote:

That doesn't solve anything. The current solution does everything you just claimed it cannot.

Well, then Travis maybe didn't get wind of that, because, as shown, it didn't find the old release, just because it didn't.

URI has been recently upgraded to 0.3.0. URI is in the ecosystem, which means that old versions are no longer available. They simply aren't.

So post the old META6.json from the previous version into the META.list. How many times must I explain

If you want to download the actual old version, you need to do an additional thing if you want it to download the rest of the files that accompanied that version: identify the commit where you did that. There's an easy way of doing that. It's called tagging. It does not require the author to insert new lines in META.list, and someone to accept that pull request.

this? Instead you keep insisting that a single META6.json can somehow represent multiple versions just because you add a tag field. That's a great solution if you want to ignore all the glaring problems (like, uh, when one version has different META6.json contents than another) and technically superior solutions.

No, I'm not saying that. A tag points to a specific META6.json in a specific commit. It's a shortcut for a commit, and allows anyone to access the file set for that specific commit. Tagging a repo makes available all versions (all commits where you released a new version) for anyone, whether it's used from the ecosystem or not. It's good practice anyway, and that's why tagging is used, for instance, in Rakudo and mostly everywhere in software development.

Even if you don't want to implement it, it's still a good practice we can recommend for people using the ecosystem, or for that matter CPAN.

Besides, I don't see a big problem here. Tagging a repo automatically generates a .tar.gz. Mapping version names to specific tar files, which you can already do in CPAN, would be a matter of adding a line or two probably. And then, additionally, we could hash-sign releases and so on, which we can't do now.

ugexe commented 5 years ago

I honestly have no idea what you are talking about. Like, I keep explaining why what you are saying is a bad idea, and how the current solution solves all the problems you are having. But then you repeat the same things as if I had not explained anything at all. Aggregation/summation of data is something any ecosystem can currently do.

JJ commented 5 years ago

OK. Let's check out this scenario.

A module is in the ecosystem, listed in META.list pointing at the META6.json in master. Let's say it's Foo::Bar:ver<1>
Someone lists Foo::Bar:ver<1> as a dependency.
Foo::Bar is updated to version 2. The META6.json listed in META.list is now at version 2.

Now, question: What happens to people that list version 1 as a dependency?

1. It will fail. zef will not be able to find it.
2. zef will remember the commit that introduced version one and download that.
3. Nothing, because the author will have also listed the META6.json commit in META.list

What's the correct answer? Option 1. Option 2 is not known to happen. Option 3 has only happened in a single case (or maybe two).

What if the commit where version 2 is introduced is tagged as v2? Well, anyone will be able to download the tar file of v1, v2 and any other version. Is it clear now?
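
The scenario can be modelled with a toy lookup. This is not how zef resolves dependencies; it is a simplified sketch in which an index is a list of META6-style records, `resolve` is a hypothetical helper, and all URLs are placeholders.

```python
def resolve(index, name, version):
    """Return the source-url for an exact name/version match, or None."""
    for entry in index:
        if entry["name"] == name and entry["version"] == version:
            return entry["source-url"]
    return None

# Hypothetical index with one entry that follows master (now at version 2).
tracking_master = [
    {"name": "Foo::Bar", "version": "2",
     "source-url": "https://github.com/foo/bar.git"},
]
# Hypothetical index where every release is listed with its own archive.
per_release = [
    {"name": "Foo::Bar", "version": "1",
     "source-url": "https://github.com/foo/bar/archive/v1.zip"},
    {"name": "Foo::Bar", "version": "2",
     "source-url": "https://github.com/foo/bar/archive/v2.zip"},
]
print(resolve(tracking_master, "Foo::Bar", "1"))
print(resolve(per_release, "Foo::Bar", "1"))
```

In the first index version 1 cannot be found; in the second it can, because each release has its own entry.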

ugexe commented 5 years ago

1) As a module author you are not required to list a META6.json pointing at master. DIHWIDT.

2) As a developer it is YOUR responsibility to be aware of your dependencies. If you are ok with using a dependency that lists itself using master then that is your choice.

3) Same as 1 -- DIHWIDT. The module author could list his module in such a way that it is properly versioned; the developer can simply not use a module that is not versioned in a way that gives them peace of mind.

You are conflating an ecosystem representation of a module index with META6.json presumably because an existing ecosystem happens to just use an array of META6.json as its index. But it makes 0 sense for module authors to represent version graphs of prior releases inside an individual META6.json -- any ecosystem can easily infer all this information from INDIVIDUAL META6.json for each version and present whatever index it wants.

For the third time -- Getopt::Kinoko v0.3.5 is a perfect example of how this can all be achieved as-is but without all the shortcomings / hard coupling of the proposed solution.

JJ commented 5 years ago

On Sun, 25 Aug 2019 at 18:20, Nick Logan notifications@github.com wrote:

1) As a module author you are not required to list a META6.json pointing at master. DIHWIDT.

2) As a developer it is YOUR responsibility to be aware of your dependencies. If you are ok with using a dependency that lists itself using master then that is your choice.

3) Same as 1 -- DIHWIDT. The module author could list his module in such a way that it is properly versioned; the developer can simply not use a module that is not versioned in a way that gives them peace of mind.

You are conflating an ecosystem representation of a module index with META6.json presumably because an existing ecosystem happens to just use an array of META6.json as its index. But it makes 0 sense for module authors to represent version graphs of prior releases inside an individual META6.json -- any ecosystem can easily infer all this information from INDIVIDUAL META6.json for each version and present whatever index it wants.

For the third time -- Getopt::Kinoko v0.3.5 is a perfect example of how this can all be achieved as-is but without all the shortcomings / hard coupling of the proposed solution.

Can you please clarify what these shortcomings are?

ugexe commented 5 years ago

Can you please clarify what these shortcomings are?

1) Conflates ecosystem index format and META6.json
2) Conflates a distribution and recommendation manager
3) Doesn't sync with reality -- perl6 has version and api
4) Hard coupling of ecosystem to given backend (git)

Getopt::Kinoko:

1) Uses its META6.json to describe the distribution it represents. It doesn't try to tell you what else a given ecosystem may contain / whitelisted / blacklisted etc.
2) Doesn't attempt to influence recommendation for entities not described in that specific META6.json
3) Allows proper searching of version AND api
4) Uses a technique (using source-url) that is backend agnostic, including allowing git revisions

Having to explicitly add versions to any ecosystem is a feature -- it's not supposed to be too easy to release stuff that isn't supposed to be released. Use the correct tooling and there are no issues. Create, e.g., a bot that does the META.list commit automatically when it detects changes -- don't try to encode this into a single META6.json
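
The core of such a bot might look like the sketch below. It assumes META.list holds one raw META6.json URL per line and that each release is reachable under a tag name in a GitHub raw URL; the `foo/bar` repository, the tag names, and the helper `missing_entries` are all hypothetical.

```python
def missing_entries(meta_list_lines, repo, tags):
    """Return raw META6.json URLs for tags not yet present in META.list."""
    existing = set(line.strip() for line in meta_list_lines)
    wanted = [f"https://raw.githubusercontent.com/{repo}/{tag}/META6.json"
              for tag in tags]
    return [url for url in wanted if url not in existing]

# Hypothetical current state of META.list: only v1 has been listed so far.
meta_list = [
    "https://raw.githubusercontent.com/foo/bar/v1/META6.json",
]
for url in missing_entries(meta_list, "foo/bar", ["v1", "v2"]):
    print(url)
```

A real bot would still need to fetch the tag list and open the pull request; only the comparison step is shown here.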

JJ commented 4 years ago

I'm rereading through this and I think that @ugexe is basically right. We can make old versions available, if we want to, just by tagging the repo and adding a pointer to that version either in CPAN or in the ecosystem. This becomes, then, essentially a documentation problem, and that's my turf. So if @jnthn does not mind I'm going to self-assign it, and propose a PR with the solution in the near future to close this issue.

jnthn commented 4 years ago

tagging the repo and adding a pointer to that version either in CPAN or in the ecosystem

I'm not sure I understand how this connects with CPAN releases, in that one has multiple versions there just by having uploaded a tarball for each one?

So if @jnthn does not mind I'm going to self-assign it, and propose a PR with the solution in the near future to close this issue.

Please do; I'll review the solution.

JJ commented 4 years ago

Closed with a13560b

AlexDaniel commented 4 years ago

To clarify, there was a relatively trivial PR that was meant to tackle this problem, but it got no reviews and was merged (and then reverted later because nobody really reviewed it). Just wanna point out that if something is so simple that it can be fixed with an improvement to the docs, there's no real need to go through the problem-solving process, just do it.

patrickbkr commented 4 years ago

Some more ideas to be even louder about how to properly use the p6c ecosystem:

We can adapt the ecosystem pull request template to include a big warning that one should never add a branch reference to the ecosystem, as branches can change.
We can work through the currently published modules in p6c and try to convince as many authors as possible to upload more sensible meta files to p6c.
We can create a Travis configuration that checks ecosystem PRs for files that reference a branch directly and report that as an error. (Is it possible to detect this? It might actually be possible by checking out the given source URL and applying some git-fu.)
We can adapt modules.raku.org to recognize multiple versions for the same distribution and show the version number in the list and hide all but the latest version. Currently it seems like one entry is shown for every file in the ecosystem. Ddt for example has 5 results.
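
As a starting point for the CI check, a name-based heuristic could look like the sketch below. It only inspects the source-url string and cannot tell whether an arbitrary ref is a branch, so the git-based verification mentioned above would still be needed. The branch-name set and the helper `looks_unpinned` are assumptions made for this example.

```python
import re

# Assumed set of common default branch names; not exhaustive.
BRANCH_NAMES = {"master", "main"}

def looks_unpinned(source_url):
    """Heuristically flag source-urls that follow a branch, not a release."""
    # A bare git URL with no "@<ref>" suffix follows the default branch.
    if source_url.endswith(".git"):
        return True
    # For "repo.git@<ref>" or ".../archive/<ref>.zip", test the ref name.
    match = re.search(r"(?:\.git@|/archive/)([^/]+?)(?:\.zip|\.tar\.gz)?$",
                      source_url)
    return bool(match) and match.group(1) in BRANCH_NAMES

for url in ["https://github.com/foo/bar.git",
            "https://github.com/foo/bar/archive/master.zip",
            "https://github.com/araraloren/Getopt-Kinoko/archive/v0.3.5.zip"]:
    print(looks_unpinned(url), url)
```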

JJ commented 4 years ago

Actually, Ddt is doing it right, because it wants all those versions to be available. We don't even need, I think, to recognize that in any way, except to publish a single page for them in modules.perl6.org. But that's orthogonal to the original point of this, which was basically to clarify what needs to be done to make several versions available at the same time in the ecosystem.

patrickbkr commented 4 years ago

@JJ Publishing a single page is what I wanted to propose. Ddt was just my example of a module that did publish multiple versions.

You are right about changes to modules.raku.org being orthogonal to the problem discussed here. That just slipped through as I braindumped my ideas of how we can improve the situation.

patrickbkr commented 4 years ago

@JJ In PR Raku/ecosystem#512 there is a discussion underway about maybe repurposing p6c as a test-ground ecosystem and promoting CPAN as the single repo people should use. It is currently very much in the brainstorming phase and open to going in other directions. I think it's a closely related discussion to this ticket. Can I invade this ticket and continue that discussion over here?

JJ commented 4 years ago

Of course :-)

patrickbkr commented 4 years ago

There was quite some discussion in PR Raku/ecosystem#512. Starting off with the question of how to make p6c more robust and moving on to the greater question of which ecosystem should serve which purpose. Currently CPAN and p6c coexist. One might be recommended over the other, but they are currently meant to serve the same purpose. Now the idea is to give each of the two ecosystems a specific purpose.

CPAN: The ecosystem for releases. You release a new version of a module? - Do it on CPAN.
p6c: A testing ground to put up untested, pre-release software prior to the real release.

Assuming we decide on the above, the next steps are:

Change the documentation to clearly recommend which ecosystem to use for which purpose.
Change Zef to not install modules from p6c by default anymore, only when asked via a --p6c flag or similar.
Rename p6c to something more telling, e.g. "blead". (Please no bikeshedding at this point in the discussion!)
Improve the user experience of the CPAN ecosystem.
- Prettify the PAUSE website. Make it clear it's also for Raku.
- Write good documentation on how to register a PAUSE account and how to use our tooling (Mi6, Ddt, Assixt) to do CPAN releases.
Fix the p6c ecosystem with respect to dynamic URLs in source-url.
- Create tooling to help in the process of creating p6c releases. Ideas include:
  - CLI tooling similar to what we have for CPAN (requires explicitly calling the tooling for every p6c release)
  - A GitHub hook or repository scanner that automatically creates releases based on some repository hint (e.g. a tag conforming to some format).
- Forbid putting dynamic stuff in source-url.
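
The "tag conforming to some format" idea for a repository scanner might be sketched like this. The tag convention (`v` followed by dot-separated numbers), the `foo/bar` repository, and the helper `releases_from_tags` are assumptions for the sake of the example, not an agreed format.

```python
import re

# Assumed convention: "v" followed by dot-separated numbers, e.g. "v0.3.5".
RELEASE_TAG = re.compile(r"^v(\d+(?:\.\d+)*)$")

def releases_from_tags(repo, tags):
    """Map release-looking tags to (version, archive URL) pairs."""
    releases = []
    for tag in tags:
        match = RELEASE_TAG.match(tag)
        if match:
            url = f"https://github.com/{repo}/archive/{tag}.zip"
            releases.append((match.group(1), url))
    return releases

# Tags that do not follow the convention (here "wip") are ignored.
for version, url in releases_from_tags("foo/bar", ["v0.1.0", "wip", "v0.2.0"]):
    print(version, url)
```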

But before any of the above can start we first need to decide whether we actually want to give our ecosystems separate purposes. I can turn the above into a proposal, but I'd like some feedback on this first. @ugexe, @JJ, @niner, @Altai-man, @nxadm

lizmat commented 4 years ago

Prettify the PAUSE website. Make it clear it's also for Raku.

That, I'm afraid, is not going to happen. Longer term, we have to assume that using PAUSE for uploading Raku modules is not going to be an option anymore.

Write good documentation on how to register a PAUSE account

I think that is actually already well documented. The problem is, there are hardly any PAUSE admins anymore to actually OK the login requests. It basically runs on its own, unattended. And automatic OKing of new logins has been switched off long ago because of spamming and phishing.

patrickbkr commented 4 years ago

@lizmat I recently (one or two days ago) talked to some people on #toolchain on the perl IRC server. They were quite open to doing frontend changes to the PAUSE website and agreed to the pain points I brought up. They said that as long as I only change the HTML there is not much potential for breaking tools using PAUSE and I'm free to change stuff.

I think that is actually already well documented. The problem is, there are hardly any PAUSE admins anymore to actually OK the login requests. It basically runs on its own, unattended. And automatic OKing of new logins has been switched off long ago because of spamming and phishing.

Isn't that quite fatal for the Perl community that relies on PAUSE exclusively? - Frightening!

It's only a single datapoint, but my PAUSE registration request went through rather quickly. Same for my later request to register a second account because of my name change.

I was actually told the same auto approval story. But I'm not convinced we can't do anything about this. Building forms that are hard to misuse isn't that difficult in my experience. If we go down the Google-data-kraken route and use reCaptcha it's basically free actually. So why not give it a try? If the fake account registrations drop to near zero for some time, maybe we can think about automating it again.

Am I too optimistic?

lizmat commented 4 years ago

@patrickbkr More power to you if you can get that together!

nxadm commented 4 years ago

@patrickbkr, I find the CPAN model a very bad fit for a young ecosystem where you need to make it as easy as possible to contribute. A layer of administration is something you can afford if you're a big language with a huge ecosystem like Perl. What's needed is a mantra like "do you have a github/gitlab repo and know how to use git tag? You're done". The Go modules ecosystem is a good example. Go kept this easy-contributing model even after it got big; they just added a write-once caching proxy for better reproducibility (next to the practice of "vendoring" dependencies in your app repo).

More importantly, I think that strategically it is the wrong road to take. The idea of the renaming of Perl 6 into Raku was to make it clear to the world that there are 2 different languages. Running your ecosystem on the best-known Perl infra adds to the confusion we're trying to sort out. Furthermore, most --if not all-- proposals related to the Perl 7 announcement involve CPAN one way or another (some influential people even want to fork it into CPAN7). The Perl toolchain and CPAN people will have enough work in the coming years for a probably very difficult transition. Do you think Raku will be high on their priority list?

ugexe commented 4 years ago

Do you think Raku will be high on their priority list?

Do you have evidence to suggest that they actively refuse improvements / features from a would-be contributor? Or only that they themselves won't do the work for us?

patrickbkr commented 4 years ago

@nxadm The way I currently understand the state of p6c, it's impossible to keep it as simple as it is now. We will have to introduce some mechanism that people will have to trigger one way or the other. The result being that using p6c will not be much easier than using CPAN.

Currently I don't (yet?) perceive much of a "they" and "us" attitude with the Perl toolchain people. Why can't it just be "us"?

Then I don't think they will need to support us much. We are currently only using CPAN as reliable data storage. We have our own modules.raku.org.

I don't want to stomp on your thoughts, though! Can you give an overview of how you imagine our future ecosystem to work?

lizmat commented 4 years ago

Also, what is the state of zeco? https://deathbyperl6.com/zef-ecosystem/

JJ commented 4 years ago

I don't think p6c right now is in worse shape than CPAN. Whatever is submitted to p6c undergoes more testing than what goes into CPAN (which undergoes precisely none). The concept of a bleading-edge ecosystem is interesting, however, and we could try and leverage it by simply adding a plugin to zef that would download anything from GitHub or GitLab (golang-style) and install it. But right now, I would go for making p6c better, with more and periodic tests, rather than just leaving it out of the default store completely. Anyway, if we go for CPAN, we should fully go for it. Right now it's basically used for storage. It does not test, it does not search, it does not display in metacpan. So if we really want to use it, we would need extensive reworking of its backoffice to make it really work for us...

niner commented 4 years ago

On Monday, 13 July 2020 23:15:23 CEST Patrick Böker wrote:

I was actually told the same auto approval story. But I'm not convinced we can't do anything about this. Building forms that are hard to misuse isn't that difficult in my experience. If we go down the Google-data-kraken route and use reCaptcha it's basically free actually. So why not give it a try? If the fake account registrations drop to near zero for some time, maybe we can think about automating it again.

Actually I think PAUSE would benefit from going the same route as rt.perl.org: support GitHub as authentication provider. After all, people will have GitHub accounts already. The RT support was done via auth0 which also supports Facebook and Google authentication as well as its own minimalistic registration.

nxadm commented 4 years ago

Do you think Raku will be high on their priority list?

Do you have evidence to suggest that they actively refuse improvements / features from a would-be contributor? Or only that they themselves won't do the work for us?

No, I hope my comment isn't understood in that direction. I think the CPAN people have been very collaborative and the problems with the CPAN flow affect Perl contributors in the same way (however Perl is less in need of new libraries compared to Raku).

My point is that Raku support is a courtesy from the Perl CPAN people and that Perl 7 will bring a lot of work for these people, who are already overstretched today. In that situation I honestly think that from Perl's point of view Raku is rightfully not a priority.

nxadm commented 4 years ago

@nxadm The way I currently understand the state of p6c, it's impossible to keep it as simple as it is now. We will have to introduce some mechanism that people will have to trigger one way or the other. The result being that using p6c will not be much easier than using CPAN.

I agree that manually editing the META6.json is not the way to go. I think the simplest way to guarantee (some) stability is requiring the use of git tags in combination with semantic versioning. (Yes, I am copying the Go model here.)
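
To make the tags-plus-semantic-versioning idea concrete, here is a small sketch of ordering tags numerically and not as strings. It assumes plain `vMAJOR.MINOR.PATCH` tags and ignores pre-release and build identifiers; `semver_key` is a hypothetical helper.

```python
def semver_key(tag):
    """Turn a tag such as "v1.10.0" into a tuple of ints for comparison."""
    major, minor, patch = tag.lstrip("v").split(".")
    return (int(major), int(minor), int(patch))

# Plain string comparison would rank "v1.9.3" above "v1.10.0".
tags = ["v1.2.0", "v1.10.0", "v1.9.3"]
latest = max(tags, key=semver_key)
print(latest)  # v1.10.0
```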

Currently I don't (yet?) perceive much of a "they" and "us" attitude with the Perl toolchain people. Why can't it just be "us"?

The Perl toolchain people (that I know) are great. As a Perl user, I love that they are there and work hard to keep CPAN sane. Like I wrote elsewhere in this thread, their priority is Perl (and the Perl 5 -> 7 roadmap).

patrickbkr commented 4 years ago

There is now a grant proposal by @tony-o to create a new ecosystem. This new ecosystem is aimed at solving the caveats p6c and CPAN have. Proposal: https://news.perlfoundation.org/post/grant_proprosal_raku_ecosystem Some more explanation: https://gist.github.com/tony-o/07fdf8b3a0f364b182e6034131ac224b

lizmat commented 2 years ago

I think this can also be closed in light of developments and #316. Please re-open if you disagree.

Raku / problem-solving

Ecosystem: versioning issues #72