bower / registry

The Bower registry
https://registry.bower.io/packages
MIT License

Next-gen registry & Bower architecture #73

Closed rayshan closed 8 years ago

rayshan commented 10 years ago

I'd like to help get the next-gen registry out. I reviewed previous discussions, and it seems the team and interested parties still need to agree on a general architecture.

State of the union

Note that even though this looks very complicated, it's more of a reorganization of existing parts. Actual work will be focused on building next-gen registry and API. As you review this please keep simplicity in mind.

Drawing link

Decisions to make

New consistent repo / service modules naming convention:
bower-server-api (combines below)
bower-server-registry (this repo)
bower-server-etl (from stats service)
bower-server-stats (from stats service)
bower-server-user

sheerun commented 10 years ago

I'm pretty sure we should use the npm registry for hosting binaries (just like component.io). Rolling our own hosting is not a feasible option, given the budget and resources bower has. We could host URLs to the source repository (github) and the distribution repository (npm, or other).

I think we should use PostgreSQL too. Redis has no good indexing or clients.

User authorization would be helpful, as well as a mini admin panel listing the repositories that a user manages. I wouldn't use 3rd party solutions unless they are open source.

With user repositories saved, the registry could crawl those repositories periodically.

I'm fine with CoffeeScript or even Ruby for registry server.

rayshan commented 10 years ago

I'm also leaning towards not hosting binaries, they're already online somewhere. 1 less thing to break.

With user repositories saved, the registry could crawl those repositories periodically.

@sheerun what would the use case be? Auto bump bower package version based on git tags? Crawl for bower.json so CLI doesn't have to upload them?
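For the first use case, here is a rough sketch of what "auto bump based on git tags" might boil down to once a crawler has a repository's tag names. The function name and the tag-matching rule (plain x.y.z with an optional leading "v") are assumptions for illustration, not taken from the registry code.

```javascript
// Hypothetical helper: given git tag names, return the highest plain
// semver version (x.y.z, optional leading "v"), or null if none match.
// Pre-release and build metadata are deliberately ignored in this sketch.
function latestVersionFromTags(tags) {
  const versions = [];
  for (const tag of tags) {
    const match = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(tag);
    if (match) {
      // Keep [major, minor, patch] as numbers so 1.10.0 sorts above 1.9.3.
      versions.push(match.slice(1).map(Number));
    }
  }
  if (versions.length === 0) return null;
  versions.sort((a, b) => (a[0] - b[0]) || (a[1] - b[1]) || (a[2] - b[2]));
  return versions[versions.length - 1].join('.');
}

// Usage: non-version tags such as "beta" are skipped.
const latest = latestVersionFromTags(['v1.2.0', '1.10.0', 'v1.9.3', 'beta']);
console.log(latest); // 1.10.0
```

The numeric comparison is the one design choice worth noting: sorting tag names as strings would rank 1.9.3 above 1.10.0.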

I'll try to pull breakdown of repo hosting services used by publishers, should be a useful stat to have.

Hacklone commented 10 years ago

Hey, thanks for contacting me. Here is my opinion on some of the decisions:

sheerun commented 10 years ago

My 2 more cents:

Hacklone commented 10 years ago

In Ruby you can write everything 10000 ways, in JS only 100, so when working on the same project this makes a big difference (tons more best practices are needed, and the style will never be the same with Ruby :) ).

I didn't say that you should use harmony, just that ES6 will be a standard and node harmony implements some nice features (like yield, arrow functions, etc.).

A lot more people read JS than CoffeeScript (why develop an open source project with a "language" that will eventually disappear, since ES6 solves many of the problems CoffeeScript was created for?).

I'm not against Postgres, I just didn't have that bad an experience with node :)

I didn't want to offend anybody, this is just my opinion :)

rayshan commented 10 years ago

Thanks @Hacklone for stopping by and your input. What I meant by ease of replication was if someone would like to run something like private-bower. Do you dump all the registry data in json into a local database? CouchDB seems to have super easy ways of replication.

Unfortunately I don't know Ruby, so that would only work if another contributor wants to lead this. I would like to play with Traceur, I'm just more familiar with CoffeeScript.

Mongoose looks great. There are also nice ORMs for Postgres like bookshelf / sequelize.

I noted you guys' votes, let's wait for additional contributors to chime in.

benschwarz commented 10 years ago

In Ruby you can write everything 10000 ways, in JS only 100, so when working on the same project this makes a big difference (tons more best practices are needed, and the style will never be the same with Ruby :) ).

No.

benschwarz commented 10 years ago

Thanks for getting all these thoughts / ideas down @rayshan!

Focus on RESTful API-oriented architecture

:thumbsup:

User management

Definitely required. Single 3rd party vendor authentication sucks (because single vendor), multi 3rd party authentication sucks (because management / usability is a PITA)

Why not keep Postgres?

I'm not going to argue the cases for different database engines on their 'merits' (frankly, commentary here from non-contributors should not be welcomed). Using a conservative RDBMS is the way to go. Contributing is easier, management is easier.

Why store binaries?

Right now, there are many issues surrounding http proxies, ssh access, and shallow clones on github enterprise. Generally, transport of packages using git isn't working very well. I'd personally prefer to see packages stored as tar/zip/compressed on S3 (or whatever) with a CDN in front.

Is it premature to worry about scaling now?

Yes, I think so.

Do we need to worry about ease of replication?

Yes, running a mirror is definitely desirable. Within (larger) organisations, I'm sure that people would like to be able to replicate their own private registries… or the public registry.

At the risk of sounding naive, I think we could solve this via API and a sync-client.
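As a minimal sketch of what the sync-client half could look like, the function below compares an upstream package list with a local mirror and works out what the mirror has to change. The `{ name, url }` entry shape and the name `planSync` are assumptions for illustration; fetching the upstream list and writing to the mirror are left out.

```javascript
// Hypothetical sync step: compare an upstream package list with a local
// mirror and decide what the mirror has to add, update, or remove.
// Each entry is assumed to look like { name, url }.
function planSync(upstream, local) {
  const localByName = new Map(local.map((pkg) => [pkg.name, pkg]));
  const upstreamNames = new Set(upstream.map((pkg) => pkg.name));

  const toAdd = [];
  const toUpdate = [];
  for (const pkg of upstream) {
    const existing = localByName.get(pkg.name);
    if (!existing) {
      toAdd.push(pkg); // new upstream, unknown to the mirror
    } else if (existing.url !== pkg.url) {
      toUpdate.push(pkg); // same name, repository URL changed
    }
  }
  // Anything the mirror has that upstream no longer lists.
  const toRemove = local.filter((pkg) => !upstreamNames.has(pkg.name));
  return { toAdd, toUpdate, toRemove };
}

// Usage with made-up entries:
const plan = planSync(
  [
    { name: 'jquery', url: 'git://example.com/jquery.git' },
    { name: 'lodash', url: 'git://example.com/lodash-new.git' },
    { name: 'react', url: 'git://example.com/react.git' },
  ],
  [
    { name: 'jquery', url: 'git://example.com/jquery.git' },
    { name: 'lodash', url: 'git://example.com/lodash-old.git' },
    { name: 'retired', url: 'git://example.com/retired.git' },
  ]
);
console.log(plan.toAdd.length, plan.toUpdate.length, plan.toRemove.length); // 1 1 1
```

A private mirror that should keep its own internal packages would presumably skip the `toRemove` step, which is one reason to return a plan rather than apply changes directly.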

Express vs. Hapi? (@svnlto can you make an argument for this?)

Whatever results in the simplest deployment / API. There should also be a focus on the size of the contributing communities.

CoffeeScript?

This has been asked in the past. Other parts of Bower are written in Vanilla, I think that the registry should follow suit.

Personally, my preference is vanilla.

Do we still need caching to serve stats?

Depends on the database impact. We'd have to pull together some numbers to know for sure.

CI

Yes, this needs to happen.

No https (bower/bower.github.io#45)

We shouldn't be using HTTP for Bower at all. Definitely need to get this on the priority list.

Thanks Ray! :bird:

Hacklone commented 10 years ago

As for replication, being able to cache any registry, private or public, was a definite requirement for private-bower. I'm not sure the public registry should do that too, because an organization's needs are quite different from those of other developers.

rayshan commented 10 years ago

Thanks @benschwarz.

Sounds like user management is a definite yes.

For binaries, what we could do is get all the other pieces up and running, then add binaries last. The web admin UI can even accommodate drag / drop with simple registration. Because S3 / CloudFront is only free for a year, we may need to find a CDN willing to sponsor us.

I asked @hacklone for opinion due to his work on private-bower. Perhaps I jumped the protocol here - my fault.

benschwarz commented 10 years ago

I asked @hacklone for opinion due to his work on private-bower. Perhaps I jumped the protocol here - my fault.

No, not at all.

I think it's important to recognise that technologies should be chosen very conservatively, particularly for bower as a project during its current lifecycle. That's where I was coming from. No harm, no foul.

rayshan commented 10 years ago

Thanks @benschwarz. I completely agree with you.

I went to a talk last night where npm's devops person talked about their stack. You were right that they spent quite a bit of time wrestling w/ couchdb, and their stack is a lot less dependent on couchdb now. I feel like if they had a choice they would choose to move off of it.

Stormpath's dev advocate was also there. I spoke to him about the pros / cons of using a 3rd-party auth solution. He also offered to sponsor an enterprise-level package for us.

update: npm infrastructure talk: https://www.youtube.com/watch?v=3ivx2RsZ1yA

sindresorhus commented 10 years ago

:+1: Everything @benschwarz said.

rdegges commented 10 years ago

FYI, I'm a developer @ Stormpath (we're a vendor which does user management). If any of you guys are interested in trying us out / considering us for the bower user management, we're more than happy to 100% sponsor the project for free. Our service is free for most small projects, but as you guys would probably need more than the free stuff, we'd be happy to cover all costs.

We love / use Bower ourselves all the time. It's a big part of our stack.

Furthermore, we have some pretty awesome node integration / express integration that I've been working on, and I'd be happy to pitch in for coding efforts!

Just a thought!

<3333

-Randall

UPDATE: Here's a link to the new express stuff: http://docs.stormpath.com/nodejs/express/ (we're super easy to use / easy to export data out of / and easy to migrate OFF of).

rayshan commented 10 years ago

Thanks guys. Sounds like we're generally aligned on the next steps. I updated the first comment to reflect feedback.

@sindresorhus can you add me to the bower heroku app? Just for understanding env vars, scaling & replicating the db. I'll set up a new dev environment. Won't change anything in production app w/o notifying other owners.

Follow up for @sheerun, I pulled some git hosting stats out of the registry:

Total # of bower registry packages as of 7-30-14 - 17055
github public - 16947 (99.4%)
github enterprise (e.g. github.paypal.com) - 50
bitbucket - 14
gist.github.com - 12
gitorious - 2
beanstalk - 1
code.google.com - 1
gitlab - 1

live query: https://dataclips.heroku.com/byvmrorsycxmclubeuzxtlckegvm
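For anyone who wants to reproduce this kind of breakdown from a list of registered repository URLs, here is a minimal sketch. It is not the dataclip query above; the function names and the URL forms handled (scheme://host/path and git@host:path) are assumptions.

```javascript
// Hypothetical helper: extract the host from a repository URL.
// Handles git://host/path, https://host/path and git@host:path forms.
function hostOf(repoUrl) {
  const match = /^(?:[a-z+]+:\/\/)?(?:[^@\/]+@)?([^\/:]+)/i.exec(repoUrl);
  return match ? match[1].toLowerCase() : 'unknown';
}

// Bucket URLs by host and report each bucket's share of the total,
// rounded to one decimal place and sorted by count, largest first.
function hostBreakdown(repoUrls) {
  const counts = new Map();
  for (const url of repoUrls) {
    const host = hostOf(url);
    counts.set(host, (counts.get(host) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([host, count]) => ({
      host,
      count,
      percent: Math.round((count / repoUrls.length) * 1000) / 10,
    }))
    .sort((a, b) => b.count - a.count);
}

// Usage with made-up URLs:
const breakdown = hostBreakdown([
  'git://github.com/a/a.git',
  'https://github.com/b/b.git',
  'git@github.com:c/c.git',
  'git@bitbucket.org:d/d.git',
]);
console.log(breakdown[0]); // { host: 'github.com', count: 3, percent: 75 }
```

The same rounding applied to the numbers above gives 16947 / 17055 as 99.4%.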

@rdegges thank you sooooo very much for the offer. I love you guys for just being willing to support OSS. As discussed offline, I need to understand the pros/cons of using a 3rd party service better so I can make a case to the entire bower team.

Some pros I see so far:

Some concerns:

What I plan to do is to build a little prototype w/ & w/o just to try it out for myself. Let's chat more about this.

sindresorhus commented 10 years ago

@rayshan done.

struys commented 10 years ago

Let me start off by saying, I love the general idea and I'll try my best to have my team help out with future work on the registry. <3

Light on tests & no CI environment

If we're talking about a rewrite, I'd really like to see a coverage tool like istanbul (https://github.com/gotwarlost/istanbul) used. We can save a lot of time merging pull requests if coverage is part of CI.

Note that even though this looks very complicated, it's more of a reorganization of existing parts. Actual work will be focused on building next-gen registry and API. As you review this please keep simplicity in mind.

My primary concern is making sure the registry is easy to set up internally. We're using bower at Yelp because it's a great tool that didn't take a huge amount of effort to get working within our internal network. As a result, we've also been able to contribute back upstream. I think we want to make sure all of these systems are pluggable. Please make sure it's possible to skip the vendor dependencies.

Why not keep Postgres? (keep)

RDBMS is decently performant and well-known by potential contributors. Instead of saving the whole bower.json, we can just parse it and insert a row in a db table. For simplicity we can use an ORM like bookshelf. I'm not too familiar with Postgres / MongoDB / CouchDB / ... admin so it's up to the team to pick one and I'll figure it out.

At Yelp we primarily use mysql and our standard backup is mysql. Since the registry uses postgres, we've set up a somewhat weird git repo for backups (we also have a standard to backup git). It would be awesome if the registry was DB agnostic. Could we use something like sequelize? (http://sequelizejs.com/)
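To make the "parse it and insert a row" idea concrete without committing to Postgres, MySQL, or a particular ORM, here is a minimal sketch of the parsing half. The column names and the function name are illustrative assumptions, not the registry's actual schema; the resulting plain object could be handed to whichever ORM gets picked.

```javascript
// Hypothetical mapping from a bower.json document to one flat row.
// Column names are illustrative only, not the registry's actual schema.
function bowerJsonToRow(jsonText, repoUrl) {
  const manifest = JSON.parse(jsonText); // throws on malformed JSON
  if (typeof manifest.name !== 'string' || manifest.name === '') {
    throw new Error('bower.json must have a non-empty "name"');
  }
  return {
    name: manifest.name,
    url: repoUrl,
    description: manifest.description || null,
    // Flattened to a comma-separated string so any SQL backend can store it.
    keywords: Array.isArray(manifest.keywords) ? manifest.keywords.join(',') : null,
    license: typeof manifest.license === 'string' ? manifest.license : null,
  };
}

// Usage with a made-up manifest:
const row = bowerJsonToRow(
  JSON.stringify({ name: 'my-widget', keywords: ['ui', 'widget'], license: 'MIT' }),
  'git://example.com/my-widget.git'
);
console.log(row.name, row.keywords, row.description); // my-widget ui,widget null
```

Flattening keywords to a string keeps the row portable across databases; a Postgres-only setup could keep them as an array or JSON column instead.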

CoffeeScript? (no, maybe ES6 + traceur)

I would like to personally ask the team for blessing on this. I'm most productive writing CoffeeScript, but I understand there may be concern with maintenance and attracting future contributors. There are many high profile OSS projects (like Atom) that use CoffeeScript exclusively. If the team feels strongly against this I'll stick w/ vanilla js.

My team would prefer to avoid CoffeeScript. Yet another language with questionable benefit considering the overhead.

rayshan commented 10 years ago

@struys thanks for the input & your team's previous contribution to the registry.

coverage tool like istanbul

Yes will look into it, better coverage is definitely desired, we made a little progress recently but still a long way to go

... all of these systems are pluggable. Please make sure it's possible to skip the vendor dependencies.

I'm glad you mentioned this, it's in line with the core team's input so far, wasn't something I thought a lot about, but it will be

Could we use something like sequelize?

Definitely. I was debating b/t bookshelf & sequelize, did your team have any particular reason to choose sequelize? BTW I do plan to take advantage of Postgres' JSON data type

CoffeeScript

Definitely no, everyone convinced me (and I'm unsure of CoffeeScript's future as well...)

krotscheck commented 9 years ago

Does anyone have objections to some work being done on these items? I have a direct need for downstream mirroring and caching, and a few free weeks I can throw at it.

zenorocha commented 9 years ago

Any news on this @rayshan? I'd love to consume an api.bower.io to fetch packages by a certain keyword, instead of relying on search-server which is usually flaky.

ghost commented 8 years ago

Just a small crazy hint in JS vs Coffee vs Ruby - why not Pharo (ping @pharo-project)? ;-)

sheerun commented 8 years ago

Unfortunately we need to abandon api rewrite as we lack sufficient resources.

Moreover, nowadays it's clear that it's not a good idea to introduce yet another binaries registry when npm's is more than sufficient. I suggest focusing on developing bower as a client of npm's registry, instead of introducing a brand new one. This would also address the bower-npm registry dichotomy we experience.