anvaka / npmrank

npm dependencies graph metrics
https://anvaka.github.io/npmrank/online/
MIT License

Ignore "dead" packages #9

Open dylang opened 9 years ago

dylang commented 9 years ago

Dead == packages that haven't been released in some period of time, let's say one year.

This will reduce noise from packages that are likely no longer maintained and help emphasize newer choices over what might have been popular at some point in the past.

For example the general movement from grunt to gulp, colors to chalk, underscore to lodash, etc.
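The "dead" rule proposed above could be sketched roughly as follows. This is only an illustration: the input shape (a mapping from package name to last release date), the function name `drop_dead_packages`, and the sample package names are assumptions, not part of npmrank.

```python
from datetime import datetime, timedelta, timezone


def drop_dead_packages(last_release, now, max_age_days=365):
    """Keep only packages whose latest release is within max_age_days of `now`.

    `last_release` is a hypothetical mapping: package name -> release datetime.
    """
    cutoff = now - timedelta(days=max_age_days)
    return {
        name: released
        for name, released in last_release.items()
        if released >= cutoff
    }


# Made-up sample data, for illustration only.
now = datetime(2016, 1, 1, tzinfo=timezone.utc)
packages = {
    "old-logger": datetime(2012, 6, 1, tzinfo=timezone.utc),
    "new-logger": datetime(2015, 11, 15, tzinfo=timezone.utc),
}

alive = drop_dead_packages(packages, now)
print(sorted(alive))
```

The one-year threshold is exposed as `max_age_days` so that a different cutoff can be tried without changing the rule itself.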

kgryte commented 9 years ago

The fact that a package has not been released in some time does not mean the package is "dead". A well-designed package may be constrained enough to not require changes, or well-tested enough to be pretty much bug-free; e.g., consider Linux utilities like grep. "Freshness" is a poor indicator of "quality".

By culling "dead" packages, you are biasing results toward whatever is "new", even when "new" might be a copy and paste of what was "old".

Re: the provided examples. That a package is no longer "trending" does not mean it should be removed from results, or even that it should be penalized in search results.

Re: ranking. What seems to be suggested is that the results do not rank according to personal preference. If so, just create a separate index with appropriate weighting factors, and openly state that the index biases toward what is "new"/"trending".

Regardless, I question the motivation for this issue, as code quality is not captured by activity, downloads, issues, pull-requests, forks, or stars. I have read a lot of "popular" code which is poor and used more because people tend to use unreliable heuristics for code evaluation rather than actually read the code.

dylang commented 9 years ago

Great points @kgryte, I appreciate your feedback and agree with your points.

Maybe I should have explained my motivation instead of proposing a solution.

I was thinking this would make the ranking more relevant to users looking for the right module to use today, rather than just being a ranking from the beginning of time.

kgryte commented 9 years ago

@dylang I sense that an index tailored toward finding the right module is a pipe dream. Social heuristics are unreliable indicators for project-dependency fit, particularly as these heuristics are subject to herd effects which are often due more to randomness of the network than any objective criterion.

Note: this critique also applies to the current implementation of npmrank. Inferring qualitative metrics from any pagerank-esque solution, which is often affected by vanity metrics, is prone to significant error. At best, only non-normative statements may be made: package X is used as a dependency by n packages. Those statements do not imply normative statements, such as that, if n_X > n_Y, then X is the preferred package.
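The descriptive statement above ("package X is used as a dependency by n packages") amounts to counting dependents. A minimal sketch, assuming a hypothetical mapping from each package to the list of packages it depends on (the data and names below are made up):

```python
from collections import Counter


def count_dependents(dependencies):
    """Return n for each package: how many packages list it as a dependency.

    `dependencies` is a hypothetical mapping: package name -> list of dependency names.
    """
    counts = Counter()
    for deps in dependencies.values():
        # set() so that a duplicated entry is counted once per dependent package.
        for dep in set(deps):
            counts[dep] += 1
    return counts


# Made-up sample graph, for illustration only.
dependencies = {
    "app-one": ["x", "y"],
    "app-two": ["x"],
    "app-three": ["x", "x"],
}

counts = count_dependents(dependencies)
print(counts["x"], counts["y"])
```

The resulting numbers only describe usage; nothing in the count itself says which package is preferable.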

Ultimately, the way to discover the right module is by reading source code...a lot of source code.

As an aside, a utopian search index would take little account of vanity heuristics and, instead, take into account actual usage. For instance, package X is used in m live production environments with a lifetime of tau, has been the root cause of n errors occurring at some rate over time, has performance profile P measured against different platforms, etc.

dylang commented 9 years ago

I appreciate your explanation of why an ideal solution is not feasible, and I agree. I'm not expecting a perfect solution to the problem.

I just want to help my co-workers (and any other developers) who are overwhelmed by the choices.

For example, what should developers use for logging? npmjs.com comes back with 4000 matches. The first match is something I created in 2011 and haven't touched since 2012. I think using usage numbers could make npm's search more useful.

anvaka commented 9 years ago

https://anvaka.github.io/npmrank/online/#tag=logging - gives much better suggestions than plain npmjs search.

One downside of this online tool is that it only looks at keywords when it searches. So if there is a logging library which doesn't have the 'logging' keyword, the tool will miss it.
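The keyword-only limitation described above can be illustrated with a small sketch. The package records, field names, and the description-based fallback are assumptions for illustration; they are not how the online tool is implemented.

```python
# Made-up package metadata, for illustration only.
PACKAGES = [
    {"name": "pkg-a", "keywords": ["logging", "console"], "description": "A logging helper"},
    {"name": "pkg-b", "keywords": [], "description": "Tiny logging library"},
]


def search_by_keyword(packages, tag):
    """Match only on declared keywords, which misses untagged packages."""
    return [p["name"] for p in packages if tag in p["keywords"]]


def search_by_keyword_or_description(packages, tag):
    """One possible fallback: also look for the tag in the description text."""
    return [
        p["name"]
        for p in packages
        if tag in p["keywords"] or tag in p["description"].lower()
    ]


print(search_by_keyword(PACKAGES, "logging"))
print(search_by_keyword_or_description(PACKAGES, "logging"))
```

Here `pkg-b` is found only by the second function, because it declares no keywords at all.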

I see how it could be interesting to explore new packages only. It should clearly mention that results are filtered by date.