Bioconductor / legacy.support.bioconductor.org

LEGACY!!! Bioconductor's fork of the Biostar Q&A site. This repo was used prior to Oct 2020. See support.bioconductor.org for ACTIVE site maintenance!
http://www.biostars.org/

search page ranking #4

Open mikelove opened 10 years ago

mikelove commented 10 years ago

I can't tell what the ranking here is based on

https://support.bioconductor.org/local/search/page/?q=

it doesn't appear to be reverse chronological, though that would be the best default ranking IMO

dtenenba commented 10 years ago

For me that url does not show any posts.

dtenenba commented 10 years ago

However, if you go to

https://support.bioconductor.org/t/Latest/

You'll see that the default sort is "new answers". Open that dropdown for other possibilities. Does that help?

mikelove commented 10 years ago

oh sorry, i gave a bad example. I meant like this one:

https://support.bioconductor.org/local/search/page/?q=DESeq2

hpages commented 10 years ago

Related to the search, it doesn't seem to be looking at the titles of the threads e.g.

https://support.bioconductor.org/local/search/page/?q=Bioconductor+2.14+is+released

doesn't find the post with subject "Bioconductor 2.14 is released".

dtenenba commented 10 years ago

@ialbert do you know what's going on here? We're using the haystack search if that matters. Thanks.

ialbert commented 10 years ago

Words are interpreted individually and not as a phrase.

Haystack and the search engine that you connect to behind the scenes are quite complex beasts that I am sure could be configured in every which way possible, but we have not had the manpower to do it yet. So the search is ranked by relevance, and what relevance means depends on the engine.
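
To make the difference between per-word and phrase matching concrete, here is a minimal toy sketch. It is not how Haystack or the backing engine is implemented; `tokenize`, `matches_any_term`, and `matches_phrase` are hypothetical names, and the second sample post is made up for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_any_term(query, document):
    """Per-word matching: the document matches if it shares at least one word with the query."""
    return bool(set(tokenize(query)) & set(tokenize(document)))

def matches_phrase(query, document):
    """Phrase matching: the query words must appear consecutively and in order."""
    q, d = tokenize(query), tokenize(document)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

query = "Bioconductor 2.14 is released"
posts = [
    "Bioconductor 2.14 is released",           # the title quoted in this thread
    "Which Bioconductor version is current?",  # made-up post sharing only some words
]
for post in posts:
    print(post, matches_any_term(query, post), matches_phrase(query, post))
```

With per-word matching both posts count as hits, so the ranking then depends entirely on how the engine scores each one. With phrase matching only the first post would match.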

hpages commented 10 years ago

Yes I guess it's pretty clear that words are interpreted individually, given how the individual words get highlighted in the bodies of the results. My concern is that nothing gets highlighted in the titles of the posts, suggesting that the titles are not being searched. It would make sense that the titles are searched before the bodies.

hpages commented 10 years ago

Also searching for "3.1.1"

https://support.bioconductor.org/local/search/page/?q=3.1.1

or for "v3.1.1"

https://support.bioconductor.org/local/search/page/?q=v3.1.1

doesn't find the "R package not available for R v3.1.1" thread:

https://support.bioconductor.org/p/61052/

ialbert commented 10 years ago

First, always look at the right sidebar. If you see the words "Nothing matches yet" in the Similar Posts tab, it almost always means that this post has not been indexed. Posts that have not been indexed will not show up in search results.

The default celery task indexes new posts every 15 minutes, so new posts may not show up right away. But this seems to be an older post, so maybe the index should be refreshed. This can be done from the command line:

./biostar.sh index

this will recreate the search index.
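
As a rough illustration of why an older post can stay out of the index until a full rebuild, here is a sketch of an incremental indexing pass. This assumes the periodic job only picks up posts changed since its previous run, which is a guess and not confirmed Biostar behavior; `posts_to_index` and the sample records are hypothetical.

```python
from datetime import datetime, timedelta

def posts_to_index(posts, last_run):
    """Return the posts modified since the previous indexing run (assumed selection rule)."""
    return [p for p in posts if p["modified"] > last_run]

now = datetime(2014, 9, 17, 15, 0)
last_run = now - timedelta(minutes=15)  # the 15-minute interval mentioned above

# Made-up records: one old post that was somehow missed, one fresh post.
posts = [
    {"id": 1, "modified": now - timedelta(days=30)},
    {"id": 2, "modified": now - timedelta(minutes=5)},
]
selected = [p["id"] for p in posts_to_index(posts, last_run)]
print(selected)
```

Under that assumption, only the fresh post is selected, and the old one would need a full reindex to appear in search results.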

Now there is also a more general answer as to how the search works.

Biostar does not actually perform the search, it passes down the query into a third party engine that runs behind the scenes. There is support for many different search engines.

The way the search is implemented and performed by the engine can usually be customized in so many ways that it is a task of its own. For example, here is Elasticsearch:

http://www.elasticsearch.org/

I never really had time to delve into all the details of word stemming, capitalization, punctuation, etc., so right now we just take their default behavior and run with that. Customizing the search can be done independently of Biostar; it all depends on the schema and custom parameters to the engine.
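
To show how punctuation handling alone can change what a query like "3.1.1" matches, here is a small sketch with two made-up analyzers. Neither is claimed to be the default of any real engine, and nothing here establishes the cause of the missing result discussed in this thread; `split_on_punctuation` and `split_on_whitespace` are hypothetical names.

```python
import re

def split_on_punctuation(text):
    """Analyzer A (assumed): lowercase, then break on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())

def split_on_whitespace(text):
    """Analyzer B (assumed): lowercase, then break on whitespace only, keeping dots inside tokens."""
    return text.lower().split()

title = "R package not available for R v3.1.1"
for analyzer in (split_on_punctuation, split_on_whitespace):
    # Show the tokens produced for the title and for the query "3.1.1".
    print(analyzer.__name__, analyzer(title), analyzer("3.1.1"))
```

Analyzer A turns "v3.1.1" into the tokens "v3", "1", "1", so a query for "3.1.1" only overlaps on "1". Analyzer B keeps "v3.1.1" whole, so the token "3.1.1" never equals it. Either way an exact hit is not guaranteed, which is why the schema and analyzer settings matter.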

ialbert commented 10 years ago

also I do agree that good search is essential and should be a priority, we'll make it one

ialbert commented 10 years ago

another detail I forgot to mention. Titles are searched for posts that have titles (top level posts) but the title is not treated in any special way. Basically when there is a title it is treated as if it were the first line of the post.

Now it won't highlight the title in the link since that comes from a different source.

For example see:

https://support.bioconductor.org/local/search/page/?q=diffbind

The first hits show situations where the title is actually searched and shown as the first line of the post.
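
The "title as first line" behavior can be sketched roughly as follows. This is a simplified illustration, not the actual Biostar indexing code; `build_indexed_text` and the sample posts are hypothetical.

```python
def build_indexed_text(title, body):
    """Prepend the title, when present, as the first line of the text to index."""
    return f"{title}\n{body}" if title else body

# Made-up examples: a top-level post has a title, an answer does not.
question = build_indexed_text("DiffBind error", "I get an error when I run my analysis.")
answer = build_indexed_text("", "Try updating the package.")

print("diffbind" in question.lower())  # the title text is searchable via the indexed text
print("diffbind" in answer.lower())    # an untitled answer only matches on its body
```

Because the title is just part of the indexed text in this scheme, it gets no extra weight, and any highlighting applies to the indexed text and not to the title shown in the result link.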

hpages commented 10 years ago

I see. Thanks for explaining. All that seems a little bit weird and counter-intuitive to me though. I wonder if there is any technical reason why the result of a search couldn't just be displayed like the list of posts I get when I click on a user name. Like here:

https://support.bioconductor.org/u/2360/

After all that list is also the result of a search ("search all the posts from that user"). Having the nb of votes/answers/views, plus the bottom line with tags and stuff like "written 10 hours ago by Janet Young • 680 • updated 10 hours ago by Martin Morgan ♦♦ 14k" is really great. Having the search terms highlighted in the title and body plus the ability to sort by relevance or reverse chronological would be really neat. Thanks!

ialbert commented 10 years ago

Search is a relatively new component. We used to rely on Google Domain search, but that will start inserting ads on more popular sites. The new search engine is a feature that went live with Biostar 2.0 about six months ago, so we are still at an early stage of understanding how best to make use of it.

Having search results formatted the same way as the contribution posts is a good idea that I hadn't considered. Perhaps I ended up being too focused on creating a different representation of each post because the engines are indeed very powerful and allow for all kinds of weighting, highlighting and parsing schemes. But consistency would be better.

There is no limitation that would prevent it from looking the same. There are perhaps some minor complexities in how paging and rendering would work, since the index is not a drop-in replacement for the database table (although it could probably be made to emulate the database more closely).
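
One possible shape for this, sketched under stated assumptions: take the ids and scores returned by the index, look up the full records in the database, then sort and page them like any other post list. The function `page_of_posts`, the hit format, and the sample data are all hypothetical and do not reflect the actual Biostar or Haystack code.

```python
def page_of_posts(hits, posts_by_id, page, per_page=2, by_date=False):
    """Turn search hits into one page of full post records.

    hits: list of (post_id, score) pairs, as an index might return them.
    posts_by_id: dict standing in for the database table.
    """
    # Look up the full record for each hit, skipping ids no longer in the database.
    rows = [(posts_by_id[pid], score) for pid, score in hits if pid in posts_by_id]
    if by_date:
        rows.sort(key=lambda r: r[0]["date"], reverse=True)  # reverse chronological
    else:
        rows.sort(key=lambda r: r[1], reverse=True)  # most relevant first
    start = (page - 1) * per_page
    return [post for post, _ in rows[start:start + per_page]]

# Made-up data for illustration only.
posts_by_id = {
    1: {"id": 1, "date": "2014-09-01", "votes": 3},
    2: {"id": 2, "date": "2014-08-15", "votes": 0},
    3: {"id": 3, "date": "2014-09-10", "votes": 1},
}
hits = [(1, 0.4), (2, 0.9), (3, 0.7)]

by_relevance = [p["id"] for p in page_of_posts(hits, posts_by_id, page=1)]
by_newest = [p["id"] for p in page_of_posts(hits, posts_by_id, page=1, by_date=True)]
print(by_relevance, by_newest)
```

Since each returned record is a full post, the same template that renders a user's post list (votes, answers, views, tags) could in principle render search results, with the sort order chosen by the user.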

Improving the search will be our next focus area.

hpages commented 10 years ago

I see. Thanks for the extra details and for providing some background. But most importantly, thanks for the Biostar software!

mikelove commented 10 years ago

yes, let me echo: many thanks to Istvan for open-sourcing the Biostar software, and to Dan and Marc as well. As a developer, this new support site is really useful for communicating with users.