crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0

Package / gem system? #220

Closed farleyknight closed 9 years ago

farleyknight commented 10 years ago

Haven't worked on anything crystal in a while! Just wondering if you guys have any plans on doing a crystal package / gem system any time soon?

asterite commented 10 years ago

Hi! Yes, we have thought a few times about doing a package/dependency manager. We'd like it to not need a centralized repository and mostly use github, but we don't know how to do that efficiently, without having to checkout the whole repository. Or where to get the metadata for a package's dependencies. Or what format to use for the file (json, yaml, toml?... ini? ... crystal?).

I think this is one of the most important things we need in order to build larger projects, otherwise it would become a pain to develop things in the language.

However, I'm not sure the language is ready to be rolled out. We are still changing core stuff like the IO api, some syntax details, and probably we'll change some semantic details. At this stage I would say it would do more damage than good. For example in Rust almost every project I download doesn't work. Even some samples out there stopped working and it means a lot of effort to go one by one and fix them over and over again, and so would happen with packages. Not that I have anything against Rust, it's just my experience. I also remember the same thing happened to me with Elixir. Again, just empirical evidence of what could happen, nothing more.

Finally, we want to provide a solid base so that code duplication is almost non-existent. We want to have a good http client in the standard library that supports streaming. We want fast json and xml. Now we also have oauth and oauth2 clients (mostly because we are doing some small projects with crystal at work). And all of these integrate very nicely. I don't want the language to come to a point where you have to choose between four different implementations of an oauth client, or an xml parser, etc. Of course we can't provide everything, but what is "very" standard and commonly used should be in the std.

That said, we can use this issue to discuss how to implement an efficient and elegant package/gem system. Centralized or not? What language to use for the file listing the dependencies? Will you be able to invoke a custom command for a package, like in Rust, so that you can easily include a project like kostya's http_parser.cr which has C code with it?

And most importantly: what name to use? :-)

vendethiel commented 10 years ago

> mostly use github, but we don't know how to do that efficiently, without having to checkout the whole repository

component(1) (js) does something like that: it fetches the repo's component.json and downloads each file one by one. However, you need to be logged in to github to avoid getting rate-limited.

jhass commented 10 years ago

> And most importantly: what name to use? :-)

Crate? Only conflict I could find after a quick search is https://crate.io/

> without having to checkout the whole repository

from git-clone(1):

--depth Create a shallow clone with a history truncated to the specified number of revisions.
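
To make the `--depth` suggestion concrete, here is a minimal sketch of how a tool might assemble such a command. The function name, the `example.com` URL, the `vendor/lib` destination and the `v0.3` tag are all made up for illustration; only `git clone`, `--depth` and `--branch` are real git options. The sketch builds the argument list and does not run git.

```python
import shlex


def shallow_clone_command(repo_url, dest, depth=1, ref=None):
    """Build (but do not run) a `git clone` command with truncated history."""
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    cmd = ["git", "clone", "--depth", str(depth)]
    if ref is not None:
        # `--branch` accepts either a branch name or a tag name.
        cmd += ["--branch", ref]
    cmd += [repo_url, dest]
    return cmd


# Hypothetical repository and tag, for illustration only.
cmd = shallow_clone_command("https://example.com/user/lib.git", "vendor/lib", ref="v0.3")
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run` by a real tool; keeping the command construction separate makes it easy to check without network access.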

> Centralized or not?

I'd vote for gem's style: centralized with configurable, multiple repositories. I think it makes packaging easier: if I look at the "github" style package managers (npm, go get etc.), they quickly become complex enough to force me to use them instead of my system package manager. Another point is that it reduces name clashes; I've seen node libraries with the same name doing a similar thing in a completely different way. Including the github username into the package name again makes system packaging harder and breaks with decentralization since you then depend on github as your central repository.

For the fail-safety part I'd like to see mirroring baked into the core design; ideally setting up a mirror should be as simple as adding an rsync line to your crontab. This would allow us to use existing mirroring infrastructure and for example build a tracking site that also does something like round-robin DNS between them.

A point I'd like to add early to the discussion is providing a package signing mechanism. Basing something upon things like https://keybase.io could yield interesting results.

farleyknight commented 10 years ago

I'm personally a decentralized kind of guy. A centralized server means a core set of people have to maintain it, and I don't think Crystal is quite there yet, as @asterite has said.

I'd rather have something simple that we can throw away easily if it doesn't work. Something simple like a git clone into vendor/ as @jhass suggested. I just feel kind of awkward trying to write code that will become a dependency without some concept of package to tell people to download.

That would give me the most motivation to jump back into my C API generator and a more robust SDL interface, as I would have clear instructions on the README on how to use that package in other people's projects.

> Including the github username into the package name again makes system packaging harder and breaks with decentralization since you then depend on github as your central repository.

Git itself is decentralized. People can use bitbucket if they so please. There's no reason why we couldn't use the full URL as the unique package name, with a shorthand name for referencing it in a dependency file. Consider how Gemfile works.
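
A rough sketch of the "full URL as the unique name, shorthand for referencing it" idea follows. The function names and the `example.com` / `example.org` URLs are hypothetical, and deriving the shorthand from the last path segment is just one possible rule. The sketch also shows where two different repositories would collide on the same shorthand.

```python
from urllib.parse import urlparse


def shorthand_name(repo_url):
    """Derive a short local name from the last path segment of a repository URL."""
    path = urlparse(repo_url).path.rstrip("/")
    name = path.rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    if not name:
        raise ValueError(f"cannot derive a name from {repo_url!r}")
    return name


def build_index(repo_urls):
    """Map shorthand -> full URL, refusing shorthands claimed by two different URLs."""
    index = {}
    for url in repo_urls:
        name = shorthand_name(url)
        if name in index and index[name] != url:
            raise ValueError(f"shorthand {name!r} is ambiguous: {index[name]} vs {url}")
        index[name] = url
    return index


# Hypothetical repositories on two different hosts.
index = build_index([
    "https://example.com/alice/sdl.git",
    "https://example.org/bob/c_api_generator",
])
print(index["sdl"])
```

A dependency file would then only need the short name once the URL has been declared, much like a git-sourced gem in a Gemfile.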

farleyknight commented 10 years ago

BTW @asterite if you're down with a minimalist package system like I described, I'd be more than happy to write the code for it. As long as you're comfortable with explaining what files would be changed.

jhass commented 10 years ago

> Git itself is decentralized. People can use bitbucket if they so please. There's no reason why we couldn't use the full URL as the unique package name, with a shorthand name for referencing it in a dependency file. Consider how Gemfile works.

That's rather quoted out of context. My point is that a more centralized approach encourages more unique library names. Since Crystal like Ruby has no caller defined namespaces, this prevents name clashes in case you want to use two libraries with the same name, which happens faster in a fully decentralized approach. It would also reduce confusion about which library you talk about in these cases. For this reason a few git based package systems include things like the Github username into the package name, which is no better than a central repository.

farleyknight commented 10 years ago

> Since Crystal like Ruby has no caller defined namespaces, this prevents name clashes in case you want to use two libraries with the same name, which happens faster in a fully decentralized approach.

I've honestly never had this problem! Maybe it is a problem in practice for others. I dunno.

Inkybro commented 10 years ago

I'd just like to weigh in to say that I agree with @jhass on most of his points, although you made some very fair points as well, @farleyknight, and @asterite.

@asterite, nothing is ever perfect, and there are some STD libs in Ruby that have alternative implementations available as gems. For instance, I know that many people don't much like Ruby's option parser, and opt for some alternative library. I think a selection is a good thing, not a bad one. It might be more confusing to a newcomer, but ultimately will mean that a richer ecosystem results. I still get what you mean, though. I think I may be a more "build it and they will come" kind of guy, as in I don't entirely agree with providing a ton of STD implementations. If someone needs it, they can always build it, right? Obviously, I understand providing the basic core things, HTTP, etc., but if too much time and energy is focused here, the language itself cannot evolve as much, or at least, as quickly. I think focus should be spent on expanding the functionality of Crystal itself, and all but the most common implementations should be left to the community, at least at this point in its development.

I can see exactly why the language is not ready to be rolled out, but that should not hinder progress on some kind of package management system. This would be extremely beneficial for the community around Crystal, in my opinion. Even if it is implemented one way (decentralized) now, and another (centralized) later on. Even if things break down the line (because they inevitably will, one way or the other), that's okay. Ruby has these problems. Many languages do, as @asterite noted in his post.

With all of that said, I can absolutely see how using a decentralized approach might be a better idea for now (it is probably much simpler to implement, much less maintenance required, etc.), but if my opinion had any weight, which it very well may not as this is my first post ever here, I'd say that I'd implement such a system with the eventual goal being to move to something centralized.

I hope this has helped and that I am understanding this discussion correctly.

waj commented 10 years ago

@Inkybro I was about to make a very similar comment. We probably will start with a decentralised approach that would still be useful later (like when you use git sourced gems in a Gemfile).

I'm not sure about the advantages of having a central repository anyway. Maintaining it would require a big effort and it must be really secure and reliable. I also prefer having namespaces (user names?). Otherwise, anyone can reserve an obvious name for a library and maybe later that is not the preferred implementation in the community.

On the other hand for example, I commonly use Puppet modules. The full name for each of these modules is the common "user name / module name". Some users have very good reputation so that's normally taken into account when choosing one module to use.

Still, a centralised directory might be useful to make discovery easier, but this one can have just pointers to the original repositories and some metadata.

Inkybro commented 10 years ago

@waj Good points.

I actually think I may be slightly misunderstanding the whole centralized vs. decentralized deal. I just think that it should be as easy as running a command (like bundle install with Ruby and Bundler). The repositories are automatically fetched and set up.

I like the idea of Puppet modules, I think it makes quite a bit of sense (although not sure if it'd help with name conflicts in the end).

So my end question is, can we really call Ruby's package management "centralized"? I know you can specify more than 1 source, although for any typically released gem you'd simply specify rubygems.org. So while I see how that is centralized to a degree, the option and capability for it to be less so is there.

Can someone help me understand the distinction a little more, or does it sound like I am understanding it correctly?

EDIT: Although I do agree that a centralized repo like RubyGems would need to be very secure, reliable, and would require much maintenance, I also have to say that, who knows, some party/ies may come along at some point in Crystal's development and be more than happy to sacrifice their time and effort to such a task. I certainly think that it should not be ruled out, at the least, but first I think I need to get a better handle on what we mean when we say centralized vs. decentralized.

waj commented 10 years ago

With "centralized" I understand that I must push the library to some central place in order to be available to others. That doesn't necessarily mean there cannot be multiple "centralized" locations.

It's true that one can find some overlapping in the semantics of using rubygems or github. In either website you must create an account and you're free to push everything you want inside your space. I think the key point here is using standard and existing infrastructure. Using github also means one could easily replace it with any other git repository.

Inkybro commented 10 years ago

Well, in any case, the library must be pushed to some central place. Even if that were to be FTP or some alternative, it is still going to be in some central place. All the same, I see the benefits in and have no good argument against using GitHub for this, and I like the fact that any git repo can be used. Plus, GitHub is so pervasive these days, this should be a breeze for most.

Ultimately, in my mind, as long as one can easily install dependencies and get a project "ready to run", then the endpoint (GitHub, etc.) is of absolutely no concern to me, personally. Perhaps supporting a variety of options would be nice, but in the end, GitHub is a very good choice, since it is so widely used and understood among the developer community.

I think I get the idea a little more. Thanks, @waj.

trans commented 10 years ago

The main advantage of a centralized approach is that a package can't just up and disappear if its maintainer decides to discontinue development. Also, a centralized system can have safeguards to ensure packages are clean of viruses and such. Plus it provides a central place for people to find and learn about packages.

On the other hand, a decentralized approach gives more freedom to the developer and requires less work --which is very beneficial to a language like Crystal which is in the early stages yet.

I think a good compromise would be to start with a decentralized approach, then later when Crystal is popular enough to support a centralized system, it can be created in such a way as to piggyback off the previous decentralized system. For example, a git plugin can push a release to the central server whenever one creates a release tag. In fact, the central system might just be a collection of git repositories itself.

Inkybro commented 10 years ago

@trans I think something to that effect seems a good approach, too.

weskinner commented 9 years ago

+1 for the name "geode"

0x1eef commented 9 years ago

Why "geode" ? I don't get it.

weskinner commented 9 years ago

Because they package a bunch of crystals together.

bcardiff commented 9 years ago

I thought it was because they have crystals inside :-). I like it.

There are also things called nodules, which are geodes without the hollow cavity. And nodules and modules are ridiculously similar. Sadly, as a medical term it is not so nice. But it could also work.

asterite commented 9 years ago

@zamith suggested the name "shard", and he even built a website (in Crystal!!) that lists GitHub projects that use Crystal.

I think I like it :smile:

In this case shard would mean "a piece or fragment of a crystal", so the definition fits nicely with "a piece of functionality in crystal". It's also just one syllable, so it's faster to pronounce ("gem" has also one syllable).

Maybe "geode" would be something like rvm or rbenv, if a geode packages a bunch of crystals together (let's hope we never need a tool like that :wink:)

kostya commented 9 years ago

what about gem -> crem

ysbaddaden commented 9 years ago

I like shard too!

jhass commented 9 years ago

+1 to shard

zamith commented 9 years ago

I just wanted to point out that the original idea for shard was from @naps62, so credit where credit is due. :)

But as you mentioned, I also like it a lot.

On a sidenote (and we have discussed this before), what about moving crystalshards to shards.crystal-lang.org? Do you feel it is too soon?

/cc @asterite @jhass @waj

JacobUb commented 9 years ago

@zamith If we're playing this game... https://groups.google.com/forum/?fromgroups#!searchin/crystal-lang/shard/crystal-lang/0wurVGB4JMU/yqeMYPCSrAkJ Unless @naps62 suggested it before 20/10/2014, ofc :)

zamith commented 9 years ago

@Exilor Hahaha. Had no idea. Sorry about that, then.

weskinner commented 9 years ago

@asterite I see what you mean. I like shard as well.

naps62 commented 9 years ago

@Exilor oh damn :( so close

sardaukar commented 9 years ago

+1 to shard

Inkybro commented 9 years ago

Hey, guys. Been a while since I piped up here.

So, I still love the idea of a decentralized approach for dependency management. But I have what I think may be a few good ideas, and eagerly await to hear what everyone else thinks of them.

I was thinking it'd be very cool if we settled on some decentralized backend (e.g. git). Users who want to distribute a "Crystal shard" (or whatever they will be called) can use some included binary, perhaps shard push or some command, to "push" their shard up. In reality, this binary and its associated commands would simply wrap git/git-core.

Alternatively, when a user wants to install a library, he or she could use something like shard pull [git-address]. Eventually, we could extend this to have some Shardfile syntax and what-have-you, and an entire package management system can be built around this architecture, which remains decentralized, yet still offers all the really important functionality.

So,

A. Is this a good idea? If so, why do you think? Do you have anything you'd do differently? If not, similarly, why? -- and what do you propose?

B. Is this idea even really new, so to speak? In other words, is what I mentioned basically the eventual plan, or is the whole thing still just "up in the air"?

C. I heard talk about some piece of current, internal code that handles some very basic dep management. Is the implementation of this code similar to what I'm describing? Where is that code, so I can check it out?

Those are my basic questions and thoughts as of right now. I'm very curious to expand the discussion beyond simple answers. I'd love to hash these ideas out a little further and move this along because I really still think that Crystal'd get along a lot nicer/faster with more support, and I think a good package management implementation is really what is missing to garner some good interest.

Look forward to your responses. Thanks.

LAST MINUTE EDIT: I apologize if I have missed out on any important development or news regarding these matters. I keep as close a watch as I can, but I really have very little time.

fdr commented 9 years ago

Having worked on and seen the guts of some of the centralized dependency management systems (e.g. Ruby's): they're almost always a mess, no matter the size of the programming community, and never attract enough investment of effort to be good. All of them wind up albatrosses.

Modulo lack of version pinning, "go" has the right idea here. Even with some clear defects, the price is right.

trans commented 9 years ago

Just use git tags for releases. It's easy enough to write a tool that can pull down these tags. (I think I have part of one lying around my computer written in Ruby actually.) Since Crystal is compiled we really only need a tool to vendor the dependencies. I don't think there is a reason to "install" them in a central location (though that would be an easy enough option to add too if need be). For binaries, i.e. end user applications, use distro package managers (e.g. .deb) as God intended.
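
As a sketch of the "git tags as releases" idea: given a list of tag names (which a tool could obtain with something like `git ls-remote --tags`), pick the newest one that looks like a version. The `v1.2.3` / `1.2` tag naming pattern is an assumption for illustration, not something the thread settled on.

```python
import re

# Assumed tag convention: optional "v", then MAJOR.MINOR or MAJOR.MINOR.PATCH.
TAG_PATTERN = re.compile(r"^v?(\d+)\.(\d+)(?:\.(\d+))?$")


def parse_tag(tag):
    """Return a (major, minor, patch) tuple, or None if the tag is not a release."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def latest_release(tags):
    """Return the tag with the highest version number, ignoring non-release tags."""
    releases = [(parse_tag(tag), tag) for tag in tags]
    releases = [(version, tag) for version, tag in releases if version is not None]
    if not releases:
        return None
    return max(releases)[1]


print(latest_release(["v0.9.0", "v0.10.0", "nightly", "v0.3"]))  # v0.10.0
```

Comparing numeric tuples rather than strings is what keeps `v0.10.0` ahead of `v0.9.0`.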

bgdncz commented 9 years ago

I think that Crystal could use the exact same approach as Rust's manager: cargo. I've played with Rust and can honestly say Cargo is like a miracle. That means Crystal's dependency manager can have both centralized and decentralized dependencies. That means efficient version matching (a key point for new languages). That also means Toml, which was specifically created for this type of thing and, of course, a ton of additional features like build scripts for C/C++ dependencies etc.

trans commented 9 years ago

Please no Toml. It was not "created for this type of thing". It is actually pretty lame for this type of thing. It's just another superset of INI. I'd much rather see someone take the time to create a "RubyML".

As for everything else about Cargo, I haven't used it, but it sounds good.

bgdncz commented 9 years ago

@trans Trust me, it's just perfect and it's very simple, so writing a parser won't be hard. It's also very easy to read :)

refi64 commented 9 years ago

IMO, one of the best package managers I've seen is Nimble (I also like Felix's Scoop, which has a very similar model, but that one is unmaintained). Basically, it uses Git (and Mercurial) to manage versions with a big JSON file. To add a package, it's simply added to the JSON file. To update the local package list, the Git repo containing the JSON file is updated. Packages all have a Git/Mercurial URL. To get a specific version of a package, Nimble just checks out the tag corresponding to the version (e.g. it'd get the tag v0.3 for version 0.3). The package list is on GitHub. Developers hand-edit the JSON file and send a PR to the package repository. This basically means that you get the benefits of centralized and distributed hosting.

A twist would be that the package list is not a file at all! Instead, the package repo has a directory named packages. When someone submits a new package via the PR, they create a file in the packages folder, <package name>.<some extension>. That way, instead of parsing a huge JSON file, the package manager would just use the directory structure!
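
The directory-as-index twist could look roughly like this. The `.pkg` extension, the one-URL-per-file content and the `example.com` URLs are placeholders, since the comment leaves the extension and file contents open.

```python
import tempfile
from pathlib import Path

EXTENSION = ".pkg"  # placeholder; the proposal does not name an extension


def write_entry(index_dir, name, repo_url):
    """Register a package by creating <name>.pkg containing its repository URL."""
    (Path(index_dir) / f"{name}{EXTENSION}").write_text(repo_url + "\n")


def lookup(index_dir, name):
    """Return the repository URL for a package, or None if it is not listed."""
    entry = Path(index_dir) / f"{name}{EXTENSION}"
    if not entry.is_file():
        return None
    return entry.read_text().strip()


def list_packages(index_dir):
    """List package names using only the directory listing, no parsing needed."""
    return sorted(path.stem for path in Path(index_dir).glob(f"*{EXTENSION}"))


with tempfile.TemporaryDirectory() as packages:
    write_entry(packages, "json", "https://example.com/alice/json.git")
    write_entry(packages, "clik", "https://example.com/bob/clik.git")
    print(list_packages(packages))
    print(lookup(packages, "clik"))
```

Looking up one package touches a single small file, and two PRs adding different packages never edit the same file.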

To me, this is sufficiently simple for developers (just submit a PR to add your package and create tags for the versions) while still being simple for the maintainers (you basically just merge PRs!).

jhass commented 9 years ago

One long-term issue with that approach is the blockchain issue: It'll grow indefinitely and there's no (sane) way to reduce the size of it ever again. And I can only get the whole copy of the index.

refi64 commented 9 years ago

@jhass What do you mean?

jhass commented 9 years ago

Git doesn't support partial checkouts and shallow clones are only useful since 1.9.

refi64 commented 9 years ago

@jhass Don't lots of other package systems make you download the whole package list every time, too? Take Haskell's Cabal.

refi64 commented 9 years ago

@jhass Actually, not every time. Just whenever you run cabal update.

jhass commented 9 years ago

Yes, though that doesn't mean it's a good thing, in fact it's what made for example Bundler so unbearably slow that they added a custom API to rubygems.org for it. And that was using a gzipped YAML representation of the index, not a whole git repository.

refi64 commented 9 years ago

@jhass Oh.

Well, the rest of the idea would still work...

asterite commented 9 years ago

Nimble or Cargo could work fine, I think. We just didn't have time to implement this yet...

One thing I'd really like to have is recursive dependencies and dependencies checks, like bundler. Do Cargo or Nimble do this?

By the way, I think the current Projectfile, although nice, is not very handy for this type of thing because it's a Crystal file, so it has to be compiled. Imagine doing that for every dependency (well, it shouldn't be that slow, but parsing JSON or TOML should be faster). And also, we already have a TOML parser, but I'm not sure I'd use it. I'd prefer JSON (even if it's a bit more verbose). I don't think TOML can be mapped as easily to a data structure as JSON (i.e., like json_mapping).
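
To illustrate the point about JSON mapping directly onto a data structure, here is a small sketch in Python rather than Crystal. The file layout (a top-level `dependencies` object with `git` and `version` keys), the `"*"` default and the `example.com` URLs are invented for the example; nothing in the thread fixes a format.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    name: str
    git: str
    version: str = "*"  # hypothetical default meaning "any version"


def load_dependencies(text):
    """Parse a JSON dependency listing into a list of Dependency records."""
    data = json.loads(text)
    deps = []
    for name, spec in data.get("dependencies", {}).items():
        deps.append(Dependency(name=name, git=spec["git"], version=spec.get("version", "*")))
    return deps


# Hypothetical project file contents.
EXAMPLE = """
{
  "name": "my_app",
  "dependencies": {
    "http_parser": {"git": "https://example.com/kostya/http_parser.git", "version": "0.2"},
    "clik": {"git": "https://example.com/bob/clik.git"}
  }
}
"""

for dep in load_dependencies(EXAMPLE):
    print(dep)
```

No compilation step is involved: the file is parsed once and every field lands in a typed record.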

refi64 commented 9 years ago

@asterite What do you mean by dependency checks? I know Nimble downloads any needed dependencies.

refi64 commented 9 years ago

Also, I can toy with writing a package manager over the week, but I'm not giving any guarantees...

asterite commented 9 years ago

@kirbyfan64 I mean something like this

Of course, if anyone wants to slowly write a package manager we can later integrate it in the compiler if it's good enough. And right now almost anything is better than the current crystal deps :-)

One thing we do like is to have a project's dependencies in that project's directory, without a global directory where all dependencies go. That makes it easy to remove a project's dependencies when you delete a project (just delete the project's dir) and also to browse a project's dependencies source code (it's right in that directory).

trans commented 9 years ago

Why do I need to trust you? I know all about Toml. It's not at all perfect. Hipster is what it is -- all the new kids are doing it. Blah. It's junk.

bararchy commented 9 years ago

I also do like TOML, it's easy, it's readable, and it's hashes :)

Though, I really don't care which parser ends up managing the packages TBH

trans commented 9 years ago

Ryan + Ary gave me an idea. What if a crystal project had a directory in it, call it dep or deps (or whatever), and in this directory you added a file for each dependency. e.g.

dep/
    json.dep
    clik.dep

Then each file contains a single line with a git repo reference and a tag or branch ref constraint. A command line tool can look at these files and check each out right into the dep directory. It then recursively checks the dep directory of each of those and does the same.

There are two ways that recursive behavior can go down, and it depends on what Crystal is capable of handling (or made to handle).

In the first case, if Crystal can only handle one version of a given package, no exceptions, the tool has to look for version conflicts. It might be able to do this by always checking out the repos to the top level dep directory and checking out the most recent compatible version (MRCV). If there is no possible MRCV, then of course a version conflict error arises.

In the other case, perhaps Crystal can handle multiple versions of the same package? If my package uses version 4 of some library, but another dependency of mine uses the same package but version 3, is there any reason that they can't just use their own respective versions and compile without conflict? (Of course it's a good idea for the tool to tell me about the version difference, at least.)
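
The first case (one version per package, pick the most recent compatible version, otherwise report a conflict) can be sketched as below. This is a simplification under stated assumptions: requirements are already collected into a flat list, each constraint is a half-open range `lo <= version < hi`, and versions are tuples. The names `resolve` and `VersionConflict` are invented for the example.

```python
class VersionConflict(Exception):
    """Raised when no single version satisfies every requester."""


def resolve(available, requirements):
    """Pick the most recent compatible version (MRCV) for each package.

    available:    {package: [version tuples]}
    requirements: [(requester, package, lo, hi)] meaning lo <= version < hi
    """
    by_package = {}
    for requester, package, lo, hi in requirements:
        by_package.setdefault(package, []).append((requester, lo, hi))

    chosen = {}
    for package, reqs in by_package.items():
        # A candidate must satisfy every requester's range at once.
        candidates = [
            version
            for version in available.get(package, [])
            if all(lo <= version < hi for _, lo, hi in reqs)
        ]
        if not candidates:
            requesters = ", ".join(requester for requester, _, _ in reqs)
            raise VersionConflict(f"no version of {package} satisfies: {requesters}")
        chosen[package] = max(candidates)
    return chosen


available = {"json": [(3, 0, 0), (3, 2, 0), (4, 0, 0)]}
requirements = [
    ("my_app", "json", (3, 0, 0), (5, 0, 0)),  # my_app accepts 3.x or 4.x
    ("clik", "json", (3, 0, 0), (4, 0, 0)),    # clik only accepts 3.x
]
print(resolve(available, requirements))  # {'json': (3, 2, 0)}
```

A real tool would also have to re-read the dependencies of whichever version it picks, since those can differ between versions; that recursion is left out here.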

One other thing this setup would allow: build and test dependencies, or any other group for that matter could be in a subdirectory. e.g.

dep/
  json.dep
  test/
    cryunit.dep
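
Reading such a layout could be sketched like this. The line format (repository URL, whitespace, then a tag or branch), the `"default"` group name and the `example.com` URLs are assumptions, since the proposal only says each file holds one line with a repo reference and a ref constraint. The sketch also assumes checked-out sources live somewhere other than the scanned directory.

```python
import tempfile
from pathlib import Path


def read_dep_file(path):
    """Parse one .dep file: a single line of '<repo-url> <tag-or-branch>'."""
    parts = Path(path).read_text().split()
    if len(parts) != 2:
        raise ValueError(f"{path}: expected '<repo-url> <ref>'")
    repo, ref = parts
    return repo, ref


def scan_deps(dep_dir):
    """Return {group: {name: (repo, ref)}}; subdirectory names become groups."""
    root = Path(dep_dir)
    groups = {}
    for path in sorted(root.rglob("*.dep")):
        parent = path.relative_to(root).parent
        group = "default" if parent == Path(".") else parent.as_posix()
        groups.setdefault(group, {})[path.stem] = read_dep_file(path)
    return groups


with tempfile.TemporaryDirectory() as dep:
    (Path(dep) / "json.dep").write_text("https://example.com/alice/json.git v0.3\n")
    (Path(dep) / "test").mkdir()
    (Path(dep) / "test" / "cryunit.dep").write_text("https://example.com/bob/cryunit.git master\n")
    print(scan_deps(dep))
```

A build could then fetch only the `default` group, while a test run would add the `test` group on top.
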

ysbaddaden commented 9 years ago

I started designing something this morning. Let's see where it goes.