@hextraza: I did some experiments with searches. I believe we have some issues with either how data are indexed, or how we construct the queries.
I previously updated DISCVRseq's index tool to write a summary of what it thinks it is indexing. This file is a summary of the actual Lucene index behind mGAP's VCF:
https://prime-seq.ohsu.edu/_webdav/Internal/ColonyData/284/%40files/sequenceOutputPipeline/SequenceOutput_2023-11-18_09-44-43/lucene.stats.txt
That file has a row for each indexed field, with the min/max values for numeric fields and the unique values for string fields. For example, it reports that Polyphen2_HDIV_pred was indexed with the values "P, B, D, |".
Nonetheless, queries for rows where this field equals B, or where it is not empty, both return zero rows:
https://mgap.ohsu.edu/jbrowse/mGAP/variantSearch.view?session=94F02F58-7675-103C-ACA7-0CA4B661AFC1&trackId=94F02F6F-7675-103C-ACA7-0CA4B661AFC1&target=variantSearch&searchString=Polyphen2_HDIV_pred%252Cequals%252CB&page=0&pageSize=50
https://mgap.ohsu.edu/jbrowse/mGAP/variantSearch.view?session=94F02F58-7675-103C-ACA7-0CA4B661AFC1&trackId=94F02F6F-7675-103C-ACA7-0CA4B661AFC1&target=variantSearch&searchString=Polyphen2_HDIV_pred%252Cis%2520not%2520empty%252C&page=0&pageSize=50
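To rule out a problem on the URL side, here is a minimal sketch that decodes the `searchString` parameter from the two links above. It assumes the value is percent-encoded twice (`%252C` decodes to `%2C`, then to `,`) and that a single filter is laid out as `field,operator,value`. Both are inferred from the URLs, not confirmed against the search code, and `decode_search_string` is a hypothetical helper name.

```python
from urllib.parse import urlparse, parse_qs, unquote


def decode_search_string(url):
    """Return the (field, operator, value) triple carried by a variantSearch URL.

    Assumption: one filter per searchString, laid out as field,operator,value.
    """
    # parse_qs removes the first layer of percent-encoding (%252C -> %2C).
    raw = parse_qs(urlparse(url).query)["searchString"][0]
    # The value appears to be encoded twice, so decode once more (%2C -> ",").
    decoded = unquote(raw)
    # Split on the first two commas only, in case the value itself contains one.
    field, operator, value = decoded.split(",", 2)
    return field, operator, value


# Query strings copied from the two links above; the session and trackId
# parameters are omitted here for brevity.
base = "https://mgap.ohsu.edu/jbrowse/mGAP/variantSearch.view?target=variantSearch"
equals_url = base + "&searchString=Polyphen2_HDIV_pred%252Cequals%252CB&page=0&pageSize=50"
not_empty_url = base + "&searchString=Polyphen2_HDIV_pred%252Cis%2520not%2520empty%252C&page=0&pageSize=50"

print(decode_search_string(equals_url))
print(decode_search_string(not_empty_url))
```

If the decoded triples match what we intended to ask for, the URL encoding is probably not the culprit, and the next places to look would be how the query is built from these parts or how the field was indexed.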