BimberLab / DiscvrLabKeyModules

A collection of public LabKey modules developed by the Bimber Lab

Possible search problems? #258

Closed bbimber closed 5 months ago

bbimber commented 6 months ago

@hextraza: I did some experiments with searches. I believe we have some issues with either how data are indexed, or how we construct the queries.

I previously updated DISCVRseq's index tool to write a summary of what it thinks it is indexing. This file is a summary of the actual Lucene index behind mGAP's VCF:

https://prime-seq.ohsu.edu/_webdav/Internal/ColonyData/284/%40files/sequenceOutputPipeline/SequenceOutput_2023-11-18_09-44-43/lucene.stats.txt

That file has one row per indexed field, listing the min/max values for numeric fields and the unique values for string fields. As an example, it reports that Polyphen2_HDIV_pred was indexed with values of "P, B, D, |".

Nonetheless, queries for rows where this field equals B, or where it is not empty, both return zero rows:

https://mgap.ohsu.edu/jbrowse/mGAP/variantSearch.view?session=94F02F58-7675-103C-ACA7-0CA4B661AFC1&trackId=94F02F6F-7675-103C-ACA7-0CA4B661AFC1&target=variantSearch&searchString=Polyphen2_HDIV_pred%252Cequals%252CB&page=0&pageSize=50

https://mgap.ohsu.edu/jbrowse/mGAP/variantSearch.view?session=94F02F58-7675-103C-ACA7-0CA4B661AFC1&trackId=94F02F6F-7675-103C-ACA7-0CA4B661AFC1&target=variantSearch&searchString=Polyphen2_HDIV_pred%252Cis%2520not%2520empty%252C&page=0&pageSize=50
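For reference when reading the two URLs above: the searchString parameter appears to be percent-encoded twice (`%252C` decodes to `%2C`, then to a comma). Below is a minimal Python sketch for decoding it, assuming the value is a comma-separated field, operator, value triple, which is inferred from these URLs rather than from the module's code. The function name `decode_search_string` is hypothetical, and the URLs are shortened to the relevant parameters.

```python
from urllib.parse import urlsplit, parse_qs, unquote


def decode_search_string(url: str) -> list[str]:
    """Return the parts of the searchString filter in a variantSearch URL.

    Assumption: the value is "field,operator,value", percent-encoded twice.
    """
    # parse_qs performs the first round of percent-decoding (%252C -> %2C).
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    once_decoded = query["searchString"][0]
    # A second unquote undoes the inner encoding (%2C -> ",", %20 -> " ").
    return unquote(once_decoded).split(",")


equals_url = (
    "variantSearch.view?searchString=Polyphen2_HDIV_pred%252Cequals%252CB"
    "&page=0&pageSize=50"
)
not_empty_url = (
    "variantSearch.view?searchString="
    "Polyphen2_HDIV_pred%252Cis%2520not%2520empty%252C&page=0&pageSize=50"
)

print(decode_search_string(equals_url))
print(decode_search_string(not_empty_url))
```

Decoded this way, the first query filters Polyphen2_HDIV_pred with the operator "equals" and value "B"; the second uses the operator "is not empty" with an empty value.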

bbimber commented 5 months ago

This seems to be OK with the new index