A truly decentralized approach

asterite commented 9 years ago

Hi @ysbaddaden !

We have an idea for a truly decentralized approach for dependencies. Please read this. What do you think?

ysbaddaden commented 9 years ago

I agree with most points.

I'm not sure about the _releases branch and having a project.yml plus many v0.0.0.yml files. I think the current approach of having a single project.yml in the repository is enough. We can quickly cat any version with git show <refs>:project.yml as soon as we have cloned the repository (even bare repositories work, and shallow clones may work since Git 1.9).

Apart from that, most of the points are implemented, except for conflict resolution (that is hard, and thus avoided for now) and the locked file.

The require change of also searching for foo.cr in foo/src/ would be very welcome, though we may expect foo/src/foo/ too. And please add lib to CRYSTAL_PATH until libs gets deprecated!
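A minimal sketch of the lookup order discussed here, written in Python purely as an illustration. It is not the compiler's actual resolution logic: the function name require_candidates is hypothetical, and it assumes that each search root is tried first for foo.cr and then for foo/src/foo.cr.

```python
from pathlib import PurePosixPath


def require_candidates(name, search_roots):
    """Return the files a `require "name"` could map to, in search order.

    Hypothetical helper: for every root it tries `<root>/<name>.cr` first,
    then `<root>/<name>/src/<name>.cr`.
    """
    candidates = []
    for root in search_roots:
        base = PurePosixPath(root)
        candidates.append(str(base / f"{name}.cr"))
        candidates.append(str(base / name / "src" / f"{name}.cr"))
    return candidates


# Both the old `libs` directory and the new `lib` directory are searched.
print(require_candidates("foo", ["libs", "lib"]))
```

Listing libs before lib is only one possible order; the point is that both directories stay on the search path during the transition.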

The decentralized approach is good. I'm suggesting a registry that would be a cross between Bower (a searchable list that maps a name to a repository URL) and NPM/Rubygems: a nice website listing shards with their metadata, available versions, README, links, etc. It may become useful to Shards, for example by allowing dependencies to be defined as a slick library: version, or by serving the list of versions and their specs so the whole repository doesn't have to be cloned (install would then download using git-archive or GitHub tarballs).

Yet, I'm designing the locked dependencies to be self-sufficient so they never have to hit the registry (i.e. the lock file would expand the library: version and stay decentralized).

ysbaddaden commented 9 years ago

Anyway, the registry would need a complete Web framework (I'm working on that) so it won't come soon.

vyp commented 9 years ago

I'm all in favour of a decentralized approach, but I don't understand why there has to be a separate _releases branch for this. I.e. why not just put the metadata in a file such as project.yml or something? What sort of stuff goes into the _releases branch? At the very least I'd hope there are no binaries there, only text content. I definitely would not want to put non-text files under version control.

Why not just checkout a version number tag and use the project.yml from there? A tag can be applied to any branch if the desired release is not on the default (or master, whatever crystal defaults to) branch.

I also don't like it because it forces users to use git instead of some other vcs. (I personally only use git, and will probably continue to do so forever.) Then again, using (git) tags also forces users to use git, unless other vc systems have the same or a similar concept. But then you'd have to write different logic for each vcs, and even just knowing which vcs to use for a package is another big issue altogether. I don't even think shards supports other vc systems, although I might be wrong.

But I mainly don't like it because it seems to unnecessarily meddle with my git branches (even if it's named _releases which I will probably never use). I don't want some useless branch cluttering up my git, even if it's only one branch. I don't know how git works internally very well, but I wouldn't want this branch to take up more objects than there would be without it. That would suck.

It seems to force a particular git workflow (or does it)? How does it work with some popular git workflows like git flow (which already has a concept of releases)? Does it just branch off the current branch? Does the stuff in this _releases branch, whatever it is, have to be merged back for the next release?

Does any other language do anything like this? Because I haven't seen it/heard of it, but perhaps I just don't know about it. Maybe it's not as big an issue as I think, but I don't understand why it's done.

asterite commented 9 years ago

@ysbaddaden @vyp The _releases branch will only contain a single file, say, dependencies.yml, which will list, for each version of the library, all of its dependencies. For example:

0.1:
  webmock.cr:
    github: manastech/webmock.cr
    version: "~0.0.2"
0.2:
  webmock.cr:
    github: manastech/webmock.
    version: ">=0.3.0"
  timecop.cr:
    github: waterlink/timecop.cr
    version: ">=0.1.0"

In this way to know all the dependencies of a given library you only need to check out that small branch. If you don't have dependencies for each of the versions you can't solve conflicts in a fast way: you'd have to check out all tags to see which matches the current set of restrictions.
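To make the point concrete, here is a rough Python sketch of how a resolver could filter releases using only that one file. Everything in it is an assumption for illustration: the DEPENDENCIES dict stands in for a parsed dependencies.yml (reduced to name and version constraint), the helper names are hypothetical, and the meaning given to `~` (same leading components, last component at least as high) is a guess rather than a defined rule.

```python
# Stand-in for a parsed dependencies.yml from the _releases branch
# (simplified: only the version constraint of each dependency is kept).
DEPENDENCIES = {
    "0.1": {"webmock.cr": "~0.0.2"},
    "0.2": {"webmock.cr": ">=0.3.0", "timecop.cr": ">=0.1.0"},
}


def parse(version):
    """Turn "0.3.1" into (0, 3, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, constraint):
    """Check one version against one constraint string."""
    if constraint.startswith(">="):
        return parse(version) >= parse(constraint[2:])
    if constraint.startswith("~"):
        floor = parse(constraint[1:])
        # Assumed meaning of "~": not below the floor, and identical in
        # every component except the last one.
        return parse(version) >= floor and parse(version)[: len(floor) - 1] == floor[:-1]
    return parse(version) == parse(constraint)


def compatible_releases(dependencies, pinned):
    """Return the releases whose constraints accept every already-pinned version."""
    matches = []
    for release, requirements in dependencies.items():
        if all(
            name not in pinned or satisfies(pinned[name], constraint)
            for name, constraint in requirements.items()
        ):
            matches.append(release)
    return matches


# With webmock.cr already pinned, no tag has to be checked out to filter releases.
print(compatible_releases(DEPENDENCIES, {"webmock.cr": "0.3.1"}))
```

The whole filter runs on one small file, which is the speed-up being argued for: without it, each candidate tag would have to be checked out to read its own dependency list.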

About this:

We can quickly cat any version with git show <refs>:project.yml as soon as we cloned the repository

But that means you have to clone the entire repository, instead of just checking out the version you are interested in. With the approach we propose you only have to check out one small branch and then the version you need.

And @ysbaddaden, the problem with having library: name is that if somebody creates a mysql library and then they abandon it, or maybe it's not the best one out there, then sorry, mysql is taken, you have to use another name. That's why we don't like a global namespace for this. Also because of this you start having strange names. In the beginning I thought mysql2 was a wrapper around... mysql version 2?? Mmm... no, probably mysql is already taken.

@vyp I don't think any other language does this. Julia does something similar, but different: it has a repository where metadata for all existing libraries is listed. If you want to add a library, you send a pull request to that repository. Want to release? Send a pull request to that repository. It works, but somebody has to manage that repository, and of course there's moderation (for example they choose good names before adding something, and they get a chance to discuss it in pull requests). So it's not decentralized, but they don't need infrastructure for the central repository because they just rely on GitHub (smart :-)).

And for Ruby there's rubygems, and bundler downloads a huge file with all version metadata for all existing gems.

Here we also try to rely on github, but without any kind of centralization, so you can release whenever you want, and you can choose any name you want... but while writing this sentence I just realized that if I choose "foo" and somebody else chooses "foo", and our libraries have totally unrelated functionalities, then nobody can use both libraries at the same time. It's a maybe uncommon case, but it can happen. So maybe global names aren't such a bad idea after all.

vyp commented 9 years ago

But that means you have to clone the entire repository, instead of just checking out the version you are interested in. With the approach we propose you only have to check out one small branch and then the version you need.

And the problem with using the --depth switch is that I don't think there's any way to actually know what depth to use, right? So I think this can work. Forgive my ignorance, but this means there is a way to clone a particular git branch right? Without the rest of the repository. Is it just:

git clone -b _releases <url>
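A small Python sketch of the command this question is after. On its own, `git clone -b` still fetches every branch and only changes which one is checked out; adding `--single-branch` restricts the fetch to that branch, and `--depth 1` limits it to the latest commit. The helper name and the example URL are hypothetical, and whether a depth of 1 is enough for the _releases branch is an assumption.

```python
def single_branch_clone_command(url, branch="_releases", destination=None):
    """Build the argv for cloning only one branch, at depth 1.

    --single-branch, --branch and --depth are standard `git clone` options;
    the default branch name is the one proposed in this thread.
    """
    command = ["git", "clone", "--single-branch", "--branch", branch, "--depth", "1", url]
    if destination is not None:
        command.append(destination)
    return command


# Hypothetical repository URL, shown as the command line that would be run.
print(" ".join(single_branch_clone_command("https://example.com/foo.git")))
```

The list can be handed to `subprocess.run(command, check=True)` to actually perform the clone.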

The _releases branch will only contain a single file, say, dependencies.yml, which will list, for each version of the library, all of its dependencies.

I think this is great because it can work easily whilst being entirely disconnected from everything else (all the other branches). You never have to do anything with the branch, no merging or anything, meaning all the things I talked about related to conflicting with git workflows should not be a problem.

but while writing this sentence I just realized that if I choose "foo" and somebody else chooses "foo", and our libraries have totally unrelated functionalities, then nobody can use both libraries at the same time. It's a maybe uncommon case, but it can happen. So maybe global names aren't such a bad idea after all.

Can we use a project.yml file (not in _releases branch, just in the root directory or something of all the normal branches) to handle such conflicts? So say when downloading and checking the dependencies, crystal finds a possible conflict here, it could abort installation/compilation and tell the user to specify what names they want to use in some project.yml file or something? (Provided require has a way to change the name used? Like import <module> as <name> in python.)

asterite commented 9 years ago

@vyp You can change the name in project.yml, but there's no such thing as renaming on require, because a require might include many types, additions to existing classes, etc.

On second thought, maybe conflicting names won't be a problem because it would be strange to, say, use two different mysql packages in a project. And you can always search for existing names in case you want to avoid conflict at all costs. Decentralization might be worth this little inconvenience.

ysbaddaden commented 9 years ago

The _releases branch would merely be there to speed things up, without the need for a central registry. Why not. I'll check shallow clones with a recent version of Git (client and server) and verify how it behaves.

@vyp we should support VCSs other than Git; other resolvers could behave in the same way.

vyp commented 9 years ago

@asterite In that case, I can't really think of a way to solve it. But as you say, I also think decentralization is still worth this. Perhaps it would be useful to mention this somewhere in the docs, and to tell people to search the internet or popular hosting sites like github, bitbucket, gitlab etc. before deciding on a name.

sergei-kucher commented 9 years ago

I just realized that if I choose "foo" and somebody else chooses "foo", and our libraries have totally unrelated functionalities, then nobody can use both libraries at the same time. ... On second thought, maybe conflicting names won't be a problem because it would be strange to, say, use two different mysql packages in a project

However, I think it can be a problem. For example, one mysql package can be just a wrapper around SQL queries, while another package named mysql can be a driver for some ORM. Why not use both?

To resolve such conflicts we can rely on an approach proven by years of GitHub's existence: the user name.

It could work this way:

When there are no name conflicts, everything works as usual, i.e. the package code can be found in lib/package_name/src and required as require "package_name"
When a name conflict occurs, the location of each clashing package's code changes to lib/user_name/package_name/src, so it's not a problem to require them as require "user_name/package_name"
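The two rules above can be sketched in Python as follows. This is only an illustration of the proposal, not anything Shards does; the function name install_paths and the user names are made up.

```python
from collections import Counter


def install_paths(packages):
    """Map each (user_name, package_name) pair to its install directory.

    A package name used by a single owner keeps the short path; a name
    claimed by several owners gets the user name inserted into the path.
    """
    name_counts = Counter(package for _, package in packages)
    paths = {}
    for user, package in packages:
        if name_counts[package] > 1:
            paths[(user, package)] = f"lib/{user}/{package}/src"
        else:
            paths[(user, package)] = f"lib/{package}/src"
    return paths


# Two owners publish "mysql", one owner publishes "webmock".
for key, path in install_paths(
    [("alice", "mysql"), ("bob", "mysql"), ("carol", "webmock")]
).items():
    print(key, "->", path)
```

One consequence worth noting: a package's path changes as soon as a second package with the same name is installed, so existing require "package_name" lines would stop resolving.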

vyp commented 9 years ago

@sergei-kucher Except that, because of decentralization, not everything will be on github. What then?

But that's still a good idea in my opinion, perhaps it can make a case for github.com urls specifically, where it falls back to usernames, because github is so popular. (And perhaps the same for other popular source code hosting sites if they have the concept of usernames.)

jhass commented 9 years ago

The conflict is not only on the filesystem level, but more importantly on the module namespace level. There are no namespaced requires, and by convention both should use module Mysql as their namespace if they're called mysql.

sergei-kucher commented 9 years ago

@jhass Maybe resolving conflicts on the namespace level in a similar way is quite acceptable.

I.e.

The usual namespace is ::PackageName
For clashing packages it is ::UserName::PackageName

jhass commented 9 years ago

How do you handle dependencies then? Say I have foo/orm depending on foo/mysql and bar/migrator depending on bar/mysql. Both just reference Mysql of course and both just require "mysql" of course.
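The problem can be shown with a tiny Python sketch, using the package names from the question. The helper names are hypothetical; the sketch only models the fact that a bare require "mysql" carries no owner information.

```python
def candidates_for_require(name, installed):
    """Return every installed (owner, package) pair that a bare require could mean."""
    return [(owner, package) for owner, package in installed if package == name]


def is_ambiguous(name, installed):
    """A bare require is ambiguous when more than one owner provides that name."""
    return len(candidates_for_require(name, installed)) > 1


# Installed set from the example: two owners, each with their own mysql.
INSTALLED = [("foo", "orm"), ("foo", "mysql"), ("bar", "migrator"), ("bar", "mysql")]

# The require inside foo/orm and the one inside bar/migrator are the same
# string, so the name alone cannot say which mysql each of them means.
print(candidates_for_require("mysql", INSTALLED))
print(is_ambiguous("mysql", INSTALLED))
```

Resolving this would need extra information, such as which package the requiring file belongs to, and even then both packages would still define the same Mysql namespace.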

jhass commented 9 years ago

Also see https://github.com/manastech/crystal/issues/265

ysbaddaden commented 9 years ago

Discussion moved to https://github.com/manastech/crystal/issues/1357

jtarchie commented 9 years ago

Highly against a decentralized system.

For the naming conflicts described above: if multiple things have the same name, even just as files, it could make path loading difficult when a require happens.
Centralization can be built to guarantee the authenticity and signing of a library. How would you guarantee in a distributed system that a packaged dependency had not been tampered with? You would have to enforce a convention for everyone who hosts their package to follow. It would be better to provide the tools so they don't have to worry about it.

For example, what if one dependency is hosted on HTTP and not HTTPS? Some company policies require all downloads to come over HTTPS for security and auditing.

Rubygems went through this problem and, after many years and community feedback, finally settled it. First it was GitHub that hosted all the gems, then Gemcutter, and now the current Rubygems.

An open-ended question: what is stopping us from following another community's work and conventions?

crystal-lang / shards

A truly decentralized approach #16