The DISCVRseq tool will index a field called variableSamples. This should be treated as an array of the names of the samples that are variable at that position (though I don't know how Lucene technically treats it). This is a really important user-facing search type. For these examples, let's assume we have these two rows:
On the client, the user needs to be able to search "find sites where sample1 has a variant". This should return row 1, but not row 2. We cannot do a naive string match, since the string from row 2 contains "sample10", which in turn contains "sample1". We need integration testing on this behavior.
Similarly, we need an operator for "not variable", and it needs to respect the same exact-name matching (a search on "sample1" must not be affected by "sample10").
We need a way to supply a list of sample names, with an operator "variable in all of".
We need a way to supply a list of sample names, with an operator "variable in any of".
We need a way to supply a list of sample names, with an operator "not variable in any of".
We need a way to supply a list of sample names, with an operator "not variable in one of" (not sure if we really need this).
All of these need integration test cases.
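The intended operator semantics could be pinned down with a small in-memory reference model, which integration tests might use as an oracle against the real Lucene results. This is only a sketch. It assumes variableSamples is a comma-separated list of names, and it uses the two example rows from this issue. All function names are hypothetical. The readings of "not variable in any of" (variable in none of the listed samples) and "not variable in one of" (at least one listed sample is not variable) are assumptions, not settled definitions.

```typescript
// Hypothetical reference model; none of these names come from DISCVRseq.
interface Row {
  id: string;
  variableSamples: string; // assumed: comma-separated list of sample names
}

// The two example rows from this issue.
const rows: Row[] = [
  { id: "row1", variableSamples: "sample1,sample2,sample30" },
  { id: "row2", variableSamples: "sample2,sample3,sample10" },
];

// Split the raw value into exact sample names.
function sampleNames(row: Row): string[] {
  return row.variableSamples
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
}

// Exact-name match: "sample1" does not match "sample10".
function isVariable(row: Row, sample: string): boolean {
  return sampleNames(row).indexOf(sample) !== -1;
}

function notVariable(row: Row, sample: string): boolean {
  return !isVariable(row, sample);
}

function variableInAllOf(row: Row, samples: string[]): boolean {
  return samples.every((s) => isVariable(row, s));
}

function variableInAnyOf(row: Row, samples: string[]): boolean {
  return samples.some((s) => isVariable(row, s));
}

// Assumed meaning: variable in none of the listed samples.
function notVariableInAnyOf(row: Row, samples: string[]): boolean {
  return !variableInAnyOf(row, samples);
}

// Assumed meaning: at least one listed sample is not variable.
function notVariableInOneOf(row: Row, samples: string[]): boolean {
  return samples.some((s) => !isVariable(row, s));
}

// Ids of the example rows that satisfy a predicate.
function matchingIds(predicate: (row: Row) => boolean): string[] {
  return rows.filter(predicate).map((row) => row.id);
}

// Naive substring matching wrongly returns both rows for "sample1".
const naiveIds = rows
  .filter((row) => row.variableSamples.indexOf("sample1") !== -1)
  .map((row) => row.id);
console.log(naiveIds.join(",")); // row1,row2

// Exact-name matching returns only row 1.
console.log(matchingIds((row) => isVariable(row, "sample1")).join(",")); // row1
```

Each integration test case can then assert that the row ids returned by the server for a given operator and sample list equal what the matching predicate here returns.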
The client-side code needs a special field type for sample.
That field type has the unique operators described above.
It needs to construct the right kind of Lucene query string.
It needs validation of the user input into the field.
For the time being, the sample field can be free-text entry. Eventually the server could supply/validate sample names, but let's punt for now.
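As a starting point for discussion, here is a minimal sketch of how the sample field type might validate free-text input and build a query string. It rests on several assumptions that are not confirmed anywhere in this issue: the server parses queries with Lucene's classic query parser syntax, each sample name is indexed as its own exact (untokenized) term in variableSamples, and sample names contain no whitespace. The names buildSampleQuery, parseSampleNames, escapeLucene, and SampleOperator are hypothetical. Negative operators are prefixed with `*:*` because a Lucene boolean query made up only of prohibited clauses matches nothing.

```typescript
// Hypothetical sketch; assumes classic Lucene query parser syntax and that
// each sample name is indexed as an exact term in the variableSamples field.
const SAMPLE_FIELD = "variableSamples";

type SampleOperator =
  | "variable"
  | "not variable"
  | "variable in all of"
  | "variable in any of"
  | "not variable in any of"
  | "not variable in one of";

// Backslash-escape characters that are special to the classic query parser.
function escapeLucene(value: string): string {
  return value.replace(/[+\-!(){}\[\]^"~*?:\\\/&|]/g, "\\$&");
}

// Validate free-text input: comma-separated, non-empty, no whitespace in names.
function parseSampleNames(input: string): string[] {
  const names = input
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
  if (names.length === 0) {
    throw new Error("At least one sample name is required");
  }
  names.forEach((name) => {
    if (/\s/.test(name)) {
      throw new Error("Sample names may not contain whitespace: " + name);
    }
  });
  return names;
}

// Prefix every term with "+" (required) or "-" (prohibited).
function prefixed(terms: string[], prefix: string): string {
  return terms.map((term) => prefix + term).join(" ");
}

const builders: Record<SampleOperator, (terms: string[]) => string> = {
  "variable": (terms) => terms[0],
  "not variable": (terms) => "*:* -" + terms[0],
  "variable in all of": (terms) => prefixed(terms, "+"),
  "variable in any of": (terms) => "(" + terms.join(" OR ") + ")",
  // Assumed meaning: variable in none of the listed samples.
  "not variable in any of": (terms) => "*:* " + prefixed(terms, "-"),
  // Assumed meaning: at least one listed sample is not variable.
  "not variable in one of": (terms) => "*:* -(" + prefixed(terms, "+") + ")",
};

function buildSampleQuery(op: SampleOperator, input: string): string {
  const terms = parseSampleNames(input).map(
    (name) => SAMPLE_FIELD + ":" + escapeLucene(name)
  );
  const singleValued = op === "variable" || op === "not variable";
  if (singleValued && terms.length !== 1) {
    throw new Error('Operator "' + op + '" takes exactly one sample name');
  }
  return builders[op](terms);
}

console.log(buildSampleQuery("variable", "sample1"));
// variableSamples:sample1
console.log(buildSampleQuery("not variable in any of", "sample1, sample30"));
// *:* -variableSamples:sample1 -variableSamples:sample30
```

Because the generated clauses are plain term queries with no wildcards, `variableSamples:sample1` would not match a row whose only similar term is sample10, provided the index really stores one term per sample name. That is exactly the behavior the integration tests should confirm.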
Row1: variableSamples=sample1,sample2,sample30
Row2: variableSamples=sample2,sample3,sample10
Example queries are:
Reach goal: