Open dominicbarnes opened 10 years ago
Setting up a webhook using a publish command sounds awesome. Zero friction.
Github could rate-limit the server if it’s using the API on every push.
One thing Component got right, even accidentally, was that the wiki was really easy to search through. Searching via keywords doesn't really work that well and is the main reason npm sucks for finding things. Using the keywords to group by category would be the best way to do it. This allows people to narrow it down themselves and then search from there. That's basically how the app store works: you need people to discover things that way rather than by searching, because they don't know what to search for.
Might be worth having a fixed set of keywords that group them on the main page, and then all other packages that don’t fall into that can just be searched for or there could just be a giant list.
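The fixed-keyword grouping could be sketched roughly like this (the category names and the package shape here are made up for illustration, not an existing duo API):

```javascript
// Hypothetical sketch of grouping registry entries under a fixed set of
// category keywords; everything that doesn't match falls into "other".
// The category names and the package shape are assumptions for illustration.
var CATEGORIES = ['ui-element', 'utility', 'async', 'dom'];

function groupByCategory(packages) {
  var groups = { other: [] };
  CATEGORIES.forEach(function (category) {
    groups[category] = [];
  });

  packages.forEach(function (pkg) {
    var matched = (pkg.keywords || []).filter(function (keyword) {
      return CATEGORIES.indexOf(keyword) !== -1;
    });
    if (!matched.length) {
      groups.other.push(pkg.repo);
    } else {
      matched.forEach(function (keyword) {
        groups[keyword].push(pkg.repo);
      });
    }
  });

  return groups;
}
```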
Basically, discoverability is really important :)
@anthonyshort I totally agree, I never used component.io, just pulled up the wiki and searched w/ my browser. The categories were indispensable for me too, which is why I want to put emphasis on them.
@dominicbarnes thanks for putting this together! I definitely think this approach makes sense.
The only thing I'd like to pull in is the discussion @ianstormtaylor and i had here: https://github.com/segmentio/khaos/issues/41
I think if we go down this route, the registry should be namespaced to allow future projects to use it. In fact, I don't think this should even be a Duo-specific registry.
One thing Component got right, even accidentally, was that the wiki was really easy to search through.
Agreed, that's definitely what helped get the project off the ground and if you remember Node did that same thing back in the day, when there were only a handful of modules.
I figure that each webhook call would trigger a "scrape" of the repository. (depending on what information comes in the payload of course)
@dominicbarnes would love to get the description and perhaps readme and code scraped. Github search is nice cause it also combs through code / comments. That'd definitely increase the scope of the project.
I'm fine with whatever on the application architecture. We could even go simpler than that and use https://github.com/bigeasy/locket and maybe a pure JS search/indexer. It definitely wouldn't scale, but it might just be enough for now, as long as the registry only handles search.
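For scale, the "pure JS search/indexer" idea could be as small as an in-memory inverted index; a sketch (this is not locket's API, every name here is made up for illustration):

```javascript
// Very rough sketch of the "pure JS search/indexer" idea: an in-memory
// inverted index over whatever text we scrape (name, description, etc).
// This is not locket's API; every name here is made up for illustration.
function createIndex() {
  var index = {}; // token -> set of repo names

  return {
    add: function (repo, text) {
      String(text).toLowerCase().split(/\W+/).forEach(function (token) {
        if (!token) return;
        index[token] = index[token] || {};
        index[token][repo] = true;
      });
    },
    search: function (term) {
      return Object.keys(index[term.toLowerCase()] || {});
    }
  };
}
```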
How would we transition existing repos over to this new registry?
@MatthewMueller I think we can write a script that can take care of everything in component/*. (eg: iterate repos, clone, check for component.json, duo publish)
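That script might look something like this sketch, with the repo listing, manifest check, and publish step injected so the flow is visible (all the function names are made up; a real version would hit the GitHub API for the listing, clone each repo, and shell out to `duo publish`):

```javascript
// Hypothetical shape of the component/* migration script. The repo list,
// the manifest check, and the publish step are injected here so the flow
// is visible; a real version would hit the GitHub API for the listing,
// clone each repo, and shell out to `duo publish`.
function migrate(repos, hasComponentJson, publish) {
  var published = [];
  repos.forEach(function (repo) {
    if (hasComponentJson(repo)) {
      publish(repo);
      published.push(repo);
    }
  });
  return published;
}
```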
I think the amount of work required of devs is pretty minimal, it's arguably even simpler than editing a wiki, so I think we make sure to broadcast that out and let the devs take care of it themselves. Perhaps we can use a bot to traverse everything in the component wiki and open an issue asking them to consider adding to the new duo registry. (which they can close if they choose not to)
Yeah we can even just index everything in the registry automatically for them? Cuz I think we won't need any permissions or anything?
@ianstormtaylor I think we can do a 1-time pass for indexing things. However, those will involve API calls, so we may need to batch them or something. (since we'll have a single server that's making the API calls) This also means we probably won't be automatically updating them, which is where we need repo authors to add webhooks.
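The batching itself is straightforward; a sketch (the chunk size below is an arbitrary guess at what fits under the rate limit, not a measured number):

```javascript
// Sketch of batching the one-time indexing pass so a single server stays
// under GitHub's rate limit: split the repo list into chunks, scrape one
// chunk, wait out the rate-limit window, repeat. The chunk size is an
// arbitrary guess, not a measured number.
function batch(items, size) {
  var batches = [];
  for (var i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Usage idea:
// batch(allRepos, 100).forEach(function (chunk) {
//   // scrape every repo in `chunk`, then sleep until the window resets
// });
```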
Ah true, I forgot about that requirement. Sounds good to me.
Well, in actuality any scrape we do is going to involve API calls (to retrieve the component.json, package.json, etc) since a "push" webhook won't have all of those details in the payload. Perhaps the "scrape" on every "push" isn't a solution that will scale.
Maybe we would need to scrape more sparingly. For example, on the initial publish as well as on "create" (webhook for tag/branch creation) instead of on "push". Depending on what commit information is shown in a "push" event, maybe we can inspect for changes to the manifest and conditionally scrape if we think something has changed there.
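Push payloads do list each commit's added/modified/removed paths, so the conditional-scrape check could be roughly (the function name is made up):

```javascript
// Sketch of the conditional scrape: GitHub "push" payloads list each
// commit's added/modified/removed paths, so we can skip the scrape when
// no commit touched a manifest. The function name is made up.
var MANIFESTS = ['component.json', 'package.json'];

function manifestChanged(payload) {
  return (payload.commits || []).some(function (commit) {
    return ['added', 'modified', 'removed'].some(function (kind) {
      return (commit[kind] || []).some(function (path) {
        return MANIFESTS.indexOf(path) !== -1;
      });
    });
  });
}
```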
Copying my thoughts from the dupe I created here:
While I love that there's no central place for aggregating packages and that everyone just hosts their own, that definitely makes discovery a lot more painful than, say, npm. That being said, I think this problem should be easy to solve, and I hope duo will solve it to be more appealing for everyone. I propose adding three commands:
The first should just look at the git config of the CWD and register the GitHub repo in the duo registry server. The second should unregister it.
The third should query the duo registry to discover matching packages.
Now, as far as the discussion goes here, I do not think there is a need for tracking pushes to GitHub. All the registry needs is a pointer to the GitHub repo; version lookups etc. can be done live using the GitHub API. While it may be a little challenging to do well in the CLI, it should be just fine in the webapp version, since additional details of the search results can be progressively added as they are fetched from the GitHub API. This also eliminates a whole class of out-of-sync issues that may occur if I delete tags or force-push, etc.
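For example, version lookup could stay live by mapping the tags endpoint response (`GET /repos/:owner/:repo/tags` returns objects with a `name` field) to versions at request time; a sketch (the fetch itself is omitted, and the function name is made up):

```javascript
// Sketch of the "query GitHub live" idea: the registry stores only an
// "owner/repo" pointer, and versions are derived at request time from the
// tags endpoint (GET /repos/:owner/:repo/tags returns objects with a
// `name` field). The fetch itself is omitted; the function name is made up.
function tagsToVersions(tags) {
  return tags
    .map(function (tag) {
      return tag.name.replace(/^v/, '');
    })
    .filter(function (name) {
      return /^\d+\.\d+\.\d+/.test(name);
    });
}
```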
I think it would also be nice to generate the description from the readme files in the repo rather than having to specify one during registration.
`duo find`
`duo search`
!!
Since it will support all Bower packages, why not base it off Bower, making it easier for developers to adopt? https://github.com/bower/registry
```
$ bower
Usage:

    bower <command> [<args>] [<options>]

Commands:

    cache           Manage bower cache
    help            Display help information about Bower
    home            Opens a package homepage into your favorite browser
    info            Info of a particular package
    init            Interactively create a bower.json file
    install         Install a package locally
    link            Symlink a package folder
    list            List local packages
    lookup          Look up a package URL by name
    prune           Removes local extraneous packages
    register        Register a package
    search          Search for a package by name
    update          Update a local package
    uninstall       Remove a local package
    version         Bump a package version

Options:

    -f, --force     Makes various commands more forceful
    -j, --json      Output consumable JSON
    -l, --log-level What level of logs to report
    -o, --offline   Do not hit the network
    -q, --quiet     Only output important information
    -s, --silent    Do not output anything, besides errors
    -V, --verbose   Makes output more verbose
    --allow-root    Allows running commands as root
    --version       Output Bower version

See 'bower help <command>' for more information on a specific command.
```
:+1: for duo search
I think @dominicbarnes's idea for having it hooked into GitHub pushes is key for keeping the registry up to date easily though. If we're going to want to provide more than just a URL in the search results we'll want descriptions and readmes and stuff, so it would be nice if they just stayed up to date automatically? Would be curious to know what Bower does for that or what their search even looks like.
+1 to:

```
duo register
duo unregister
duo search
```
Well the fact that you'll have to sync opens opportunity to get out of sync. What I am suggesting to just query github API live to get most up to date info during search. That way you can't get out of sync :)
@Gozala using GitHub Web Hooks is the way to keep in sync, we're not talking about something like `npm publish`. GitHub will trigger updates to the Registry for many different events that developers will take on their repo. Thus, `duo publish` will basically open up a stream of updates, whereas `duo unpublish` will shut it off.
I know how GitHub web hooks work. All I'm trying to say is that I think this introduces complexity, will use more space, and has an opportunity to get out of sync (server is down, or bugs, or whatever). Querying GitHub live is free of all these constraints, although in practice it may end up a little too slow; if it isn't, I think it's a lot simpler option.
That being said I don't have anything against what's already proposed.
Doing it all live would definitely be ideal, but I don't think it'd be possible to get the most up-to-date information from GitHub in a short enough time for things like `$ duo search event`, which would return 10–100s of results? Search should be sub-second to be useful, I think.
I created a super basic duo-search to save me a little time, but it just uses GitHub's search API to find JavaScript repos matching a keyword: https://github.com/johntron/duo-search. I'll happily transfer ownership.
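For reference, a duo-search-style lookup essentially boils down to building a query against GitHub's repository search endpoint; something like this sketch (the `language:` qualifier and endpoint are real GitHub search API syntax, but the function name is made up):

```javascript
// Rough sketch of what a duo-search-style lookup does under the hood:
// build a repository search query for JavaScript repos matching a keyword.
// The endpoint and the `language:` qualifier are real GitHub search API
// syntax; the function name is made up.
function searchUrl(keyword) {
  var query = keyword + ' language:javascript';
  return 'https://api.github.com/search/repositories?q=' +
    encodeURIComponent(query) + '&sort=stars';
}
```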
@dominicbarnes I'd love to help setup the registry - have you already started?
@johntron I never really got anywhere with it, haven't taken the time. I would love to see `duo-search` become a JS API that can be consumed by multiple tools. (eg: a CLI, a web app, etc)
@dominicbarnes done: https://github.com/johntron/duo-search#usage-api
So, I've been thinking about this a lot over the last couple of days. I wanted to start a discussion by sharing my thoughts on this, before I started writing any real code.
I think the best place to start is to have the registry only responsible for searching. I don't think it's necessary to add tarballs or even deal with versioning at all.
With that being said, I think the registry would only need a few bits of data. (the rest lives in the source repository, so just a link should suffice here)
Now, onto the topic of how data is added to the registry. After considering several alternatives, I think the best way to approach this is to use webhooks. We can make this extremely easy for duo users by having `duo publish` use the GitHub API to add the webhook for them. (then once the hook is in, no more work will be needed by the developer) I'm thinking the "push" event is probably more than enough, although we can easily add more.

We will need some sort of custom server ready to handle webhooks from various services. GitHub is obvious, but BitBucket support is in the works too, so we should probably have this server pluggable. (or at least easily extendable) It should probably support a "manual" API, allowing people to manually add their repo, without the continuous stream of webhooks.

I figure that each webhook call would trigger a "scrape" of the repository. (depending on what information comes in the payload of course) The `component.json` would be checked first, falling back to a `package.json`, and lastly to the repository meta itself. (all using the GitHub API I presume)

Lastly, I think that "keywords" can be used to group components in the search interface. I'm thinking duo can have a few special cases that likely match the structure of the wiki now. (eg: "ui-element", "utility", "async", etc) Beyond that level of structure, I think the rest of the fields are just searchable as plaintext.
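The fallback order described above could be sketched as a pure function over the already-fetched sources (the shape and function name are illustrative assumptions):

```javascript
// Sketch of the scrape fallback order: prefer component.json, then
// package.json, then the repository metadata itself. The three sources
// arrive here already fetched (via the GitHub API); the shape and the
// function name are illustrative assumptions.
function pickMetadata(componentJson, packageJson, repoMeta) {
  var manifest = componentJson || packageJson || {};
  return {
    name: manifest.name || repoMeta.name,
    description: manifest.description || repoMeta.description || '',
    keywords: manifest.keywords || []
  };
}
```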
As far as the technical implementation details, there are lots of possibilities. My first thoughts would be Heroku for the app, CouchDB for persistence and Elasticsearch for the actual searching/indexing. But depending on who would like to collaborate, how we would want to deploy, etc we can always work with other tools.
Anyways... sorry about the huge blob of text, as you can tell I've put a lot of thought into this lol
tl;dr