biocaddie / prototype_issues

Used to report and track bioCADDIE prototype issues

PDB Ranking Issue #45

Open saeid-p opened 8 years ago

saeid-p commented 8 years ago

I tried a few queries, and I was not terribly impressed. The PDB is listed as a repository that is indexed by BioCADDIE. When "xylose isomerase" was entered, nothing showed up even though there are more than 100 entries in the PDB for this molecule.

saeid-p commented 8 years ago

Answered by @RuilingLiu

Hi Greg, Thank you so much for your feedback!

"When xylose isomerase was entered, nothing showed up even though there are more than 100 entries in the PDB for this molecule."

When I entered xylose isomerase, the search returned 125 PDB entries. Do you mean that these entries were not what you expected? Please refer to this page: http://datamed.biocaddie.org/search.php?query=xylose%20isomerase&searchtype=data&offset=1&repository=0002

ianfore commented 8 years ago

If you include the quotes in the text you put in the search box then you get no results. Is that what Greg did? I've asked him. If you just type xylose isomerase you get 208 results.

ianfore commented 8 years ago

Email sent to

Hi Greg,

The site does now seem to be up. http://datamed.biocaddie.org

When I run the query with "xylose isomerase" including the double quotes it returns nothing. Does that match your experience?

If I leave the quotes out I get 208 results. Whether or not that gives you useful results only you can answer, but I would be very interested to know.

This particular issue is now at this link on GitHub. https://github.com/biocaddie/prototype_issues/issues/45 We’d like to continue dialog on issues there if that works for you.

Thanks – I’ll keep in touch on this and other issues. Ian

naturalbeau commented 8 years ago

The search engine can now handle a query surrounded by double quotes as a phrase. Searching for "skin cancer" with the double quotes returns 9094 results. Searching for skin cancer without the double quotes makes the search engine treat it as two separate words, and it returns 14906 results.
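To make the difference concrete, here is a minimal sketch of phrase matching versus word matching. It is not the DataMed implementation: the function names and sample titles are made up for illustration, and it assumes that unquoted words must all appear somewhere in the record (AND semantics), which this thread does not confirm.

```python
import re

def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches(query, document):
    """Return True if the document satisfies the query.

    A query wrapped in double quotes is treated as a phrase: its words must
    appear consecutively and in order. Otherwise every word only has to appear
    somewhere in the document (assumed AND semantics, not confirmed here).
    """
    doc = tokenize(document)
    query = query.strip()
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        phrase = tokenize(query[1:-1])
        n = len(phrase)
        return n > 0 and any(doc[i:i + n] == phrase for i in range(len(doc) - n + 1))
    return all(word in doc for word in tokenize(query))

# Hypothetical record titles, for illustration only.
docs = [
    "Skin cancer expression profiles",
    "Cancer of the skin: imaging data",
    "Breast cancer cohort",
]
phrase_hits = [d for d in docs if matches('"skin cancer"', d)]
word_hits = [d for d in docs if matches("skin cancer", d)]
```

Under these assumptions the quoted query matches only the first title, while the unquoted query also matches the second, which is why the unquoted count is the larger one.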

jgrethe commented 8 years ago

Confirmed that this is working correctly now: http://datamed.biocaddie.org/search.php?query=%22Xylose%20Isomerase%22&searchtype=data&offset=1&repository=0002

Returns 189 PDB results
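For anyone who wants to reproduce the quoted and unquoted searches, the sketch below builds both URLs. The parameter names (query, searchtype, offset, repository) and the value 0002 are copied from the links in this thread; the helper name search_url is hypothetical, and nothing here assumes the site still responds.

```python
from urllib.parse import quote, urlencode

BASE = "http://datamed.biocaddie.org/search.php"

def search_url(query, repository="0002", phrase=False):
    """Build a search URL; wrap the query in double quotes for a phrase search."""
    if phrase:
        query = f'"{query}"'
    params = {
        "query": query,
        "searchtype": "data",
        "offset": 1,
        "repository": repository,
    }
    # quote_via=quote encodes spaces as %20 (not +) and double quotes as %22,
    # matching the form of the links posted in this issue.
    return BASE + "?" + urlencode(params, quote_via=quote)

phrase_url = search_url("Xylose Isomerase", phrase=True)
plain_url = search_url("xylose isomerase")
```

The only difference between the two URLs is the pair of %22 sequences around the query text.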

ianfore commented 8 years ago

It is still not clear that we have consensus on how this should work. "Correctly" is highly subjective. Is there something that "most" users would prefer?

jgrethe commented 8 years ago

Hi Ian, What I meant by "correctly" is that a search with text in double quotes is now handled and returns the expected results. However, I agree that the other discussion around search strategies (#1) needs to continue.

tjohnson250 commented 8 years ago

The normal expectation, in my experience, is that anything in quotes must match exactly to trigger a hit, so "serotonin gpcr" AND Cancer would look for datasets containing the exact string plus any term from the query expansion of Cancer. This is how Google works. PubMed, however, seems to ignore the quotes. It expands the above query with (or without) quotes into:

("serotonin"[MeSH Terms] OR "serotonin"[All Fields]) AND gpcr[All Fields]

That is not what I would expect. Perhaps they have data showing that this is better for users?
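A rough sketch of the two interpretations, to make the contrast concrete. This is neither Google's nor PubMed's actual parser: parse_query is a hypothetical helper that treats only an uppercase AND as an operator and does no query expansion.

```python
import re

def parse_query(query):
    """Split a query into exact phrases (quoted) and loose terms (unquoted)."""
    phrases = re.findall(r'"([^"]+)"', query)
    # Remove the quoted parts, then keep the remaining words as loose terms.
    rest = re.sub(r'"[^"]*"', " ", query)
    terms = [t for t in rest.split() if t != "AND"]
    return phrases, terms

query = '"serotonin gpcr" AND Cancer'

# Quotes respected: one exact phrase plus one term open to expansion.
strict = parse_query(query)

# Quotes ignored: every word becomes an independent term.
loose = parse_query(query.replace('"', ""))
```

Here strict holds one phrase ("serotonin gpcr") and one term (Cancer), while loose holds no phrases and three separate terms, which is closer to the shape of the expanded PubMed query above.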