Open hepcat72 opened 1 month ago
this is definitely complicated but I think I follow, and the hack to get it working is smart.
How much can this be avoided if we create the dedicated mzXMLs Advanced Search? Like maybe we tolerate the hack on the assumption that most users will use the dedicated mzXMLs Advanced Search to access them?
this is definitely complicated but I think I follow, and the hack to get it working is smart.
How much can this be avoided if we create the dedicated mzXMLs Advanced Search? Like maybe we tolerate the hack on the assumption that most users will use the dedicated mzXMLs Advanced Search to access them?
Let me clarify... that this issue is not here to solve the mzXML
search/sort problem. I just use that as an example of why I should make the advanced search support "annotations", e.g. the ability to query the number of related records (and to be able to sort without changing the number of rows). So this is a low-priority issue about underlying generic functionality that applies to all search formats and fields. It would also allow us to include columns that are annotations as a base feature, instead of that being just an aspect of the template that renders the tables.
That aside, I think that the hack is good because we don't have any other instances where we would want (for example) to see the placeholder records.
And incidentally, the "mzXMLs" "format"/search would not be affected by this issue.
FEATURE REQUEST
Inspiration
The changes I made to gather all the related mzXML files together (when dealing with placeholder records) makes the advanced search behave in unexpected ways. I was concerned about this... AND FIXED IT, but it took all day and it is somewhat of a hack. Let me describe the problem (which is the result of a lack of support for many-to-many searches).
MZ Data Filename
has a value
", I get32637
results.96285
results, weirdly.MZ Data Filename
does not have a value
", I get102165
results (and they all appear to have files - because the placeholder it's finding doesn't have a file, but is not displayed)104707
results.198450
resultsThis all stems from the fact that to get the neighboring
MSRunSample
records of a placeholder, I am traversing the same table twice... And when I display theMSRunSample.ms_data_file.filename
, I just skip the null one from the placeholder.My testing on my sandbox DB is more discrete, and is easier to understand... It has 408 peak groups and 2 mzXMLs per group and no placeholders. Sorting doubles the rows to 816, so I'm pretty sure that I understand what's going on. Django requires sorting to be preceded by the fields involved being made "distinct", so however many files are involved, there will be that many rows, and since the relationship is 1:M, you get "more rows". I can "solve" that by just removing the ability to sort the results by that column, as sorting by other fields doesn't cause this.
So the problem is then just searching for "
MZ Data Filename
does not have a value
". You get a result row for every peak group that links to a placeholderMSRunSample
record, and since we have placeholders linked to the same sample as ones with mzxml files, you get all peakgroups with no mzxml plus all with a placeholder next to ones with mzxmls.So essentially "
does not have a value
" doesn't work right for the mz data files. A work-around is to search for "MZ Data Filename
does not end with
.mzXML
". Then you get72070
records (which is correct:72070
+32637
=104707
).The problem is that when there are multiple mzXML files, there will be a placeholder, so searching for peakgroups linked to an MSRunSample record that has no mzXML, you will always get those that have a placeholder...
I "fixed" this by editing the
DataRepo.formats.dataformat_group_query.constructAdvancedQueryHelper
method:So that now:
MZ Data Filename
has a value
", I get32637
results.MZ Data Filename
does not have a value
", I get72070
results (and they all haveNone
as their MZ file value, as desired)104707
results.If I just "showed" the
None
value in the list ofmzXML
files in each row, the original code would actually make sense. But what we really want here areMSRunSample
records that only have a placeholder and no neighbors, but there's no way to create that query.Description
What we need here is a way to query for a number of M:M related records using annotations. For fields that are M:M related, as is the
MSRunSample.ms_data_file.filename
, we can accomplish this by looking for "MZ Data Filename
does not have a value
and
number of MSRunSamples
=
1
". Or, even better, would be something like "number of MSRunSamples
whose MZ Data Filename
has a value
=
0
".Neither of these suggestions seems ideal in terms of design, but I think it conveys the idea. This needs an elegant understandable design.
Alternatives
None
Dependencies
None
Comment
None
ISSUE OWNER SECTION
Assumptions
Limitations
Affected Components
Requirements
DESIGN
Interface Change description
None provided
Code Change Description
None provided
Tests