More columns in the npm-datatables.json

eirikb / nipster

Search tool for npm

http://nipstr.com/

126 stars 20 forks

More columns in the npm-datatables.json #18

Closed xmojmr closed 9 years ago

xmojmr commented 10 years ago

Finding nipster was a great surprise for me. I used its original code base and design as the basis for a JavaScript component search tool, now hosted at http://component.xmojmr.cz

The component dataset is smaller (only ~3000 records), but has a richer column set, e.g. the tags column enables searching the dataset with a tag-cloud filter. All repositories are hosted on GitHub
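As a rough illustration of the tag-cloud filter mentioned above, here is a minimal Python sketch. The record layout (`name`, `tags`), the sample records and the function name `filter_by_tags` are assumptions made for the example, not the actual schema of the component dataset.

```python
from typing import Iterable

# Hypothetical records; the real component dataset has a richer column set.
RECORDS = [
    {"name": "example/emitter", "tags": ["events", "emitter"]},
    {"name": "example/dom", "tags": ["dom", "events"]},
    {"name": "example/ajax", "tags": ["http"]},
]


def filter_by_tags(records: Iterable[dict], selected: Iterable[str]) -> list:
    """Return the records that carry every selected tag (case-insensitive)."""
    wanted = {tag.lower() for tag in selected}
    return [
        record
        for record in records
        if wanted <= {tag.lower() for tag in record.get("tags", [])}
    ]


if __name__ == "__main__":
    # Clicking "events" in the tag cloud would narrow the table to these rows.
    for record in filter_by_tags(RECORDS, ["events"]):
        print(record["name"])
```

Selecting several tags narrows the result further, and an empty selection leaves the dataset unfiltered.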

I'd like to try my design against the npm-datatables.json dataset but in its present state there is not enough useful information in it

I also did not find sources of the dataset update engine mentioned in #14

Can you publish the sources or extend the dataset with more columns (as used in the component search tool)?

At this moment a one-time dataset would be enough for me, no updates needed, just to have some data for a proof-of-concept prototype

eirikb commented 10 years ago

That looks great.

I've been thinking of adding other package systems to nipster, for example by having tabs on the top (such as NuGet, gulp, bower, aur, etc.). Perhaps this could be part of that?

About publishing the backend I'm not sure if I'm ready for that.
I probably should do just that, but I feel it needs some cleanup, and I would have to hide things like keys. It's a private repo (not on gh).
Here are the NPM-related parts of it, just to show you: https://gist.github.com/eirikb/04bebf18ea8588ebb0ed

xmojmr commented 10 years ago

Thanks for the gist, I guess the NpmClient uses the npm.exe command line API to crawl the repository.

If you published the whole code it might be interesting to look at, but I would not use it as it is: I'm hosting at a provider running Linux server farms, and Mono is not bundled by default. Right now I can use PHP as the server backend language and I'd like to keep it that way (no vendor lock-in, free to move to another provider). Basically, right now I can easily consume curl-accessible web services. No Windows-specific code, no custom closed-source binaries

The dataset created from component.json files that I'm loading to the web UI comes from an application hosted on heroku cloud service run by owner of project https://github.com/component/crawler.js

Your idea to add other package systems is exactly what I'd like to do as well, since the tasks that the search tool should solve (some d3js visualization of package dependencies being one of the challenges) are the same regardless of the package/language system.

It would be nice to have a set of tools able to operate on any of them

From my perspective it would be useful if nipster (or its cdn service) provided an API similar to that of https://github.com/component/crawler.js#get-json with a dataset format (JSON schema) similar to that of http://component-crawler.herokuapp.com/.json

Right now I don't know what my next (smart) move should be. What would you suggest? What does your nipster road map look like? What is your vision for nipster?

eirikb commented 10 years ago

I have updated the gist for you with a couple of files I was missing. However, I'm not sure if you can use the code directly without me specifying any license or at least giving you the ability to fork in some way.

It saddens me to see you have the impression that I have chosen my technology poorly. However, you misunderstand some aspects. As you point out yourself, Mono renders the language and environment quite open and free, but my code relies heavily on Azure and so it is locked to that specific host.

If you want you could look at the old code which updated the packages: https://gist.github.com/eirikb/1e6b1066a976c6a9c1c5 . It might be in the nipster repo as well, but as I have deleted master I'm not sure.

Your idea for an API is probably a good one, but the point of nipster is that it is static. Making a static API is futile, and what I want to do is to make one datatables-specific json for each package manager. As you can see in the updated gist, the json is built specifically for datatables; if I were to replace datatables I would still build it specifically for the replacement. This is also counterproductive with regard to an API.

My 'vision' for nipster is simple - I will add new package managers as stated before, making a mapping of a datatables-specific json-file for each of them. After all, everything on nipster is static - except the building which is not done on any web server.
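The "one datatables-specific json per package manager" idea could be sketched roughly as below. This is not the code from the gist: the column order, all field names, the two mapper functions and the `aaData` wrapper key (the row-array key that older DataTables versions read by default) are assumptions for illustration only.

```python
import json

# Assumed column order of the table: name, description, author, stars.


def npm_to_row(package: dict) -> list:
    """Map one (hypothetical) npm record to a flat row."""
    return [
        package.get("name", ""),
        package.get("description", ""),
        package.get("author", {}).get("name", ""),
        package.get("stars", 0),
    ]


def bower_to_row(package: dict) -> list:
    """Map one (hypothetical) bower record; its fields are named differently."""
    return [
        package.get("name", ""),
        package.get("summary", ""),
        package.get("owner", ""),
        package.get("stars", 0),
    ]


# One mapper per package manager; adding a manager means adding one entry.
MAPPERS = {"npm": npm_to_row, "bower": bower_to_row}


def build_datatables_json(manager: str, packages: list) -> str:
    """Serialize all packages of one manager as the body of a static JSON file."""
    rows = [MAPPERS[manager](package) for package in packages]
    return json.dumps({"aaData": rows}, separators=(",", ":"))


if __name__ == "__main__":
    sample = [{"name": "demo", "description": "d", "author": {"name": "a"}, "stars": 3}]
    print(build_datatables_json("npm", sample))
```

Because the output is a plain file, it can be built offline and uploaded to any static host; nothing here needs to run on a web server.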

xmojmr commented 10 years ago

Thanks a lot for both gists. Especially the second one is the code I can start with as-is for my prototype and later I can port it to my server-side language (PHP at this moment).

I'm running the component search tool as an almost static page too, but the server also provides the json file and a simple API for updating it (https://github.com/xmojmr/component.io/tree/1.0.1/contents/api/v1). The web server runs the update script automatically through a regular cron task
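A cron-driven update of this kind might look roughly like the following Python sketch. It is not the PHP script from the linked repository; the function names, the injectable `fetch` parameter and the command-line arguments are assumptions made for the example.

```python
import json
import os
import sys
import tempfile
import urllib.request
from typing import Callable


def fetch_url(url: str) -> bytes:
    """Download the raw dataset; any network error propagates to the caller."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def refresh_cache(url: str, cache_path: str,
                  fetch: Callable[[str], bytes] = fetch_url) -> int:
    """Fetch the dataset, check that it parses as JSON, then swap the cache file.

    Returns the number of bytes written. The old cache file is left untouched
    if the download or the JSON check fails.
    """
    payload = fetch(url)
    json.loads(payload)  # raises ValueError on a truncated or broken download
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
        os.replace(tmp_path, cache_path)  # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise
    return len(payload)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical): python refresh.py <dataset-url> <cache-file>
    print(refresh_cache(sys.argv[1], sys.argv[2]))
```

Writing to a temporary file and renaming it means the web server never serves a half-written json file while the cron task is running.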

I chose this design instead of a purely static GitHub-hosted page because I was experiencing CORS and performance problems, and the Jekyll build system available on GitHub-hosted pages also did not meet my needs.

Your choice of Windows Azure over Google App Engine, AppScale, a self-hosted AMP stack, nginx, .NET Nancy and the myriad of other options is totally up to you and is fine as long as it fits your needs and your future plans. Windows Azure and .NET server code are not my 1st choice, as I'd get into an expensive vendor lock-in. But that is just my personal opinion, although it is the result of about half a year of evaluating various options

When I have my prototype and its testing ready I'll get back to you, and perhaps we might draft an API proposal or design a common (package manager independent) structure for the DataTables array (including hidden columns). Right now it is not my goal and I would not have a direct use for it.

Thanks again for your gists, nice, useful, good work :+1:

eirikb commented 10 years ago

To wrap this issue up and bypass any digression - would you like me to make a builder that builds an additional JSON file which would be more to your liking? I can add other builders quite easily, and extend the nipster database with more columns if needed. Such a more generic file without the DataTable-specific parts could be a first step in the direction of the package manager independent layout you envision. Although this is not how I picture the future of nipster, adding such a file would be an easy task, and if the benefit for you is high I see no reason not to make it.

xmojmr commented 10 years ago

@eirikb one conclusion from the discussion about the GitHub API rate limit is that building & running the data pumping part myself (on a LAMP stack server) is possible, but it is the least attractive option.

If it is quite easy to add another builder to nipster, it would be nice if the nipster cdn also contained the data file that my search tool prototype would use.

1. The class diagram of what the web server part of my search tool needs is here: https://raw.githubusercontent.com/xmojmr/npmjs.org/0.0.1/docs/web-server-conteptual-classes.png

   Exact property names and class names don't matter. I don't plan to use package dependencies in the 1st prototype.
2. The dataset downloaded from the nipster cdn would first go into my own cache, get restructured and be served to the web browser application. One part of the dataset used by the web browser application is the jQuery DataTable format, but other things (keywords and their weights, authors and their number of packages, number of open issues, etc.) are also needed. This would be handled by both web server code and web browser code; the raw dataset coming from the nipster cdn (called cloud data services in the diagram) does not need to care about it at all.

   Basic scenario is shown here: https://raw.githubusercontent.com/xmojmr/npmjs.org/0.0.1/docs/simplified-data-flow.png
3. In order to improve performance, I may have to use DataTables server-side processing.

So if it is quite easy to extend the nipster builder and utilize your existing infrastructure - can you add an output format that would contain the data structures described in the class diagram?
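To make the restructuring step from point 2 more concrete, here is a minimal Python sketch of deriving the extra lookups from a raw dataset. The record fields (`name`, `author`, `keywords`, `open_issues`), the sample data and the definition of a keyword's weight as "number of packages listing it" are assumptions for the example; the class diagram linked above is the actual reference.

```python
from collections import Counter

# Hypothetical raw records as they might arrive from the cdn.
RAW = [
    {"name": "alpha", "author": "ann", "keywords": ["cli", "json"], "open_issues": 2},
    {"name": "beta", "author": "ann", "keywords": ["json"], "open_issues": 0},
    {"name": "gamma", "author": "bob", "keywords": [], "open_issues": 5},
]


def keyword_weights(packages: list) -> dict:
    """Assumed weight of a keyword: the number of packages that list it."""
    counts = Counter()
    for package in packages:
        # set() so a keyword repeated within one package counts once
        counts.update(set(package.get("keywords", [])))
    return dict(counts)


def packages_per_author(packages: list) -> dict:
    """Number of packages published by each author."""
    return dict(Counter(p["author"] for p in packages if p.get("author")))


def restructure(packages: list) -> dict:
    """Split the raw dataset into table rows plus the derived lookups."""
    return {
        "rows": [[p["name"], p.get("author", ""), p.get("open_issues", 0)]
                 for p in packages],
        "keywords": keyword_weights(packages),
        "authors": packages_per_author(packages),
    }


if __name__ == "__main__":
    print(restructure(RAW))
```

Only `rows` would be shaped for the DataTable; the `keywords` and `authors` lookups are computed on the consumer side, so the raw file on the cdn can stay free of any presentation concerns.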

xmojmr commented 9 years ago

I'm withdrawing the request due to apparent stalemate