Closed holyjak closed 2 years ago
Not a big deal, but one thing I noticed is that searching for "metosin json" doesn't return metosin/jsonista.
I'm not sure if search is set up for non-Clojars sources, but I couldn't find tools.reader using either "tools.reader" or "org.clojure/tools.reader".
Thanks a lot, @KingMob ! It will be fixed in a few minutes. We fetched only the first 10 instead of all 64 artifacts. (Fixed by #314.)
@martinklepsch for the record, the "metosin json" no-results problem has been fixed by #310.
I'm searching for "jackdaw", but the only result that pops up is a version built off a branch, https://cljdoc.org/d/fundingcircle/jackdaw/0.6.7-AlexVPopov_patch_1-SNAPSHOT, instead of the latest release: https://cljdoc.org/d/fundingcircle/jackdaw/0.6.6/doc/readme
I guess the problem is that 0.6.7* > 0.6.6 and the code doesn't really look at the versions to filter out "weird ones". I will look into it, eventually.
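To make the idea of filtering out the "weird" versions concrete, here is a minimal sketch in Python. It is not cljdoc's code: the function names are hypothetical, and it assumes that a SNAPSHOT suffix is the only thing that marks a version as not a regular release.

```python
import re


def is_release(version: str) -> bool:
    """Assumption: anything that does not end in -SNAPSHOT is a regular release."""
    return not version.upper().endswith("-SNAPSHOT")


def numeric_key(version: str) -> tuple:
    """Crude ordering key: only the leading dotted numeric components."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def preferred_version(versions: list[str]) -> str:
    """Pick the highest release; fall back to snapshots only if no release exists."""
    releases = [v for v in versions if is_release(v)]
    return max(releases or versions, key=numeric_key)


print(preferred_version(["0.6.7-AlexVPopov_patch_1-SNAPSHOT", "0.6.6"]))
```

With the two jackdaw versions from the report above, this picks 0.6.6 even though 0.6.7* sorts higher. A real implementation would need a proper Maven version comparator and a decision about what to do with an empty list.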
Firstly, cljdoc is a really cool tool that I am fond of!
I think it should be a goal that when searching for e.g. "lacinia", the "main library" (aka com.walmartlabs.lacinia 0.34.0) shows up first, but what I am seeing is:
com.walmartlabs.lacinia 0.34.0
Do you agree that this should be considered a bug?
Hi @escherize, that would indeed be ideal. But the code can hardly guess what is the "main library". What we want to do is take into account the download count, that should push the more popular artifacts up. I have a branch where I started working on this but struggle to configure it so that it actually improves the results. It is not trivial :-(
A search for "turtle" "clj-client" "com.turtlequeue" does not return https://cljdoc.org/d/com.turtlequeue/clj-client/0.0.7 (I have not yet found a way to see it)
@nha I think the problem is that only specific group IDs (namely org.clojure) on Maven Central are whitelisted to be included in the search. Reasoning being that we don't want to actually search all of Maven Central :) Maybe we can add the turtlequeue stuff...
@martinklepsch right that seems like a quick fix, something like this:
http://search.maven.org/solrsearch/select?q=g:%22org.clojure%22+OR+g:%22com.turtlequeue%22&rows=200
I can submit a PR if you agree with the above.
As soon as we get to 3-4 different groups in Maven Central, we should move them from a hardcoded string to a config file / DB table. For now it seems there will not be many and little churn so hardcoding is OK.
@nha I'm ok with hardcoding for now 👍 Maybe use a function to URI encode instead of just manually doing it, that would at least improve readability.
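For illustration, here is a minimal sketch of building that URL with an encoding function instead of hand-written %22 escapes. It is in Python rather than cljdoc's Clojure, and the function and constant names are made up for the example; only the endpoint, the two group IDs and rows=200 come from the URL above.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL quoted in this thread.
MAVEN_SEARCH_URL = "http://search.maven.org/solrsearch/select"


def maven_central_query_url(group_ids, rows=200):
    """Build a search URL that matches any of the given (hardcoded) group IDs."""
    query = " OR ".join(f'g:"{group_id}"' for group_id in group_ids)
    # safe=":" keeps the colon readable; spaces become "+" and quotes become %22.
    return f"{MAVEN_SEARCH_URL}?{urlencode({'q': query, 'rows': rows}, safe=':')}"


print(maven_central_query_url(["org.clojure", "com.turtlequeue"]))
```

Adding a third group is then a one-element change to the list, which also makes a later move to a config file straightforward.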
Hi - I am still unable to find org.clojure/tools.reader or the ns clojure.tools.reader.edn via the search box. See slack question: https://clojurians.slack.com/archives/C8V0BQ0M6/p1577855041031500
As a follow-up to that, "tools.reader" does find it, but it's the last option in the list (and what look like worse matches are higher in the list).
I know, I am sorry. That should be fixed by #359 but boosting search results so that you get the desired results is sadly quite complicated. I will get back to the PR in the coming weeks and try to finish it.
"com.fulcrologic" does not find anything, though a search for "fulcro" shows com.fulcrologic/fulcro in the list.
@tobias has done a great job of improving Clojars search including boosting results by download counts. We should copy his work - see https://github.com/clojars/clojars-web/issues/719#issuecomment-1019525194 for details
FWIW, the regular releases are prioritized over SNAPSHOTS now. See #551.
@holyjak I can take a crack at bringing over clojars work.
After taking a peek, it seems that clojars ranks by downloads over all time? Cljdoc's currently tracked clojars download stats (which seem to be inactive, by the way) seem to cover the last n days (currently configured to 380). I'm feeling the last year (or so) of downloads might be a more relevant metric than all-time downloads. Perhaps some library was once popular but has since been superseded by another, for example by honeysql v2 or next.jdbc.
Some libs are hosted on maven instead of clojars, notably org.clojure.
There are no publicly available download stats for maven central.
Perhaps we'll weigh org.clojure libs very heavily?
Not implemented yet (#459), so we won't worry about these for now.
So looking a bit deeper into this. Clojars supports lucene search syntax.
Cljdoc currently tries to find the best match without any lucene syntax. This can be a bit tricky/opaque.
I'm thinking going the clojars route makes more sense.
Like clojars, we'd search all fields by default, but if you aren't finding what you are looking for you can get specific.
We'd limit ourselves to specific fields: group-id, artifact-id, and pom description (clojars also offers url, at, and licenses).
One difference between clojars and cljdoc search is that cljdoc presents results as you type. I think the auto-suggest approach works well for cljdoc. I'll experiment with the effect of lucene syntax on auto-suggest.
On the topic of description: I find it a bit confusing to free-text search on something that is not presented to the user. I might experiment with showing the description in suggest results.
As always, am happy to hear feedback/questions/concerns.
I'm thinking going the clojars route makes more sense.
Well Lee, maybe not (hey no one else is responding, so why not? 🙂)
To present results as you type, we need partial match support. So... maybe we won't entirely be taking the clojars search route.
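As a rough illustration of the partial-match requirement, here is a sketch that treats every query term as a prefix of some token in the artifact coordinate. It is plain Python, not the Lucene setup cljdoc uses, and the tokenization rule (split on anything that is not a letter or digit) is an assumption made for the example.

```python
import re


def tokens(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def matches_partial(query, coordinate):
    """True if every query term is a prefix of some token in the coordinate."""
    candidate_tokens = tokens(coordinate)
    return all(
        any(token.startswith(term) for token in candidate_tokens)
        for term in tokens(query)
    )


print(matches_partial("metosin json", "metosin/jsonista"))
print(matches_partial("tools.rea", "org.clojure/tools.reader"))
```

Both calls match, so a half-typed query can already produce suggestions. Exact-term search, as in the clojars approach, would not match either of them until the full word is typed.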
I shall ponder and play.
Thanks a lot for taking over! I had wanted to take up the work again for a long time but never felt I had enough time to really dive in.
I agree with adding extra weight to org.clojure results, that's what I'd do. It is not perfect because some contrib libs are not actively used anymore - e.g. the jdbc one is replaced by next.jdbc - but it is better than nothing :)
If we also search the description, then I would give it less weight than matches on group / artifact id.
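A minimal sketch of that weighting idea, with made-up weights and a naive substring match standing in for real search scoring (none of these numbers or names come from cljdoc):

```python
# Hypothetical weights: id matches count for much more than description matches.
FIELD_WEIGHTS = {"artifact-id": 3.0, "group-id": 2.0, "description": 0.5}


def score(query, doc):
    """Sum the weight of each field once per query term it contains."""
    terms = query.lower().split()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        text = doc.get(field, "").lower()
        total += weight * sum(1 for term in terms if term in text)
    return total


by_id = {"group-id": "metosin", "artifact-id": "jsonista", "description": "fast JSON"}
by_description = {"group-id": "example", "artifact-id": "other", "description": "works with jsonista"}
print(score("jsonista", by_id), score("jsonista", by_description))
```

An artifact whose id matches then outranks one that only mentions the term in its description, while description-only matches still show up further down the list.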
Yes, partial matches complicate stuff...
Thanks @holyjak, I'll do my best!
I think I went a bit overboard with the idea of supporting lucene syntax like clojars does and have since abandoned that notion.
I think this general issue was great for collecting initial feedback. Are you ok with now closing it in favor of creating focused issues like #568?
Improve results produced by the new search introduced by #85.
Report problems and bad/suboptimal results here. (Check the Known Problems below first, please!)
Known Problems