hepcat72 opened this issue 2 months ago
Hey @mneinast, I would like to get your take on this. The issue description hasn't yet been edited for clarity, but it is based on the discussion Lance and I had in the comment linked in the issue description...
We're wondering about the ramifications of how we handle the following situation. Since we allow (and encourage) users to submit study data before it has been fully compiled, it is technically^ possible that data could be loaded before we know about any multiple representations. On a subsequent load, multiple representations are detected and the researcher is prompted to select the best representation using the conflicts sheet. Thus, a previously loaded PeakGroup may have to be deleted (if the user selects the peak group from the new file) so that the selected representation can be loaded.
This is currently handled in PR #1225 as follows:

- When multiple representations exist and the "not selected" PeakGroup already exists in the database:
  - The existing PeakGroup is deleted and its replacement is loaded.
  - A ReplacingPeakGroupRepresentation warning informs the user that the previously loaded PeakGroup will be removed. If they don't want to delete the existing peak group, they can edit their selection in the conflicts sheet.

Lance has expressed some concern over this overall mechanism, and I agree that it should be thought through a bit more, so I would like to explore the following:
Lance proposed an alternative mechanism for alerting the user that existing peak groups may be deleted, and I kind of like the idea: indicate in the conflicts sheet, in some way, that a peak group already exists. We had different ideas on how to do that, but I'd be interested in hearing what you might come up with before hearing the strategies we discussed.
^ How likely is it that a peak annotation file could contain data on the same samples that were previously loaded? A user could of course sit on existing data (from the same MSRun) and load a complementary positive/negative scan at any time.
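The PR #1225 handling described above can be sketched as a small planning step that compares the researcher's selections against what is already loaded. This is a minimal illustration under stated assumptions, not the PR's code: the (sample, compound) key, the `plan_load` function, and the dict inputs are all hypothetical, and `ReplacingPeakGroupRepresentation` (the warning named in this thread) is modeled here as a plain dataclass rather than whatever form the real warning takes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical key identifying one peak group: (sample name, compound name).
PeakGroupKey = Tuple[str, str]


@dataclass(frozen=True)
class ReplacingPeakGroupRepresentation:
    """Hypothetical warning record, named after the warning in PR #1225."""

    key: PeakGroupKey
    existing_file: str
    selected_file: str

    def message(self) -> str:
        sample, compound = self.key
        return (
            f"PeakGroup {compound} for sample {sample}, previously loaded from "
            f"{self.existing_file}, will be deleted and replaced by the "
            f"representation in {self.selected_file}. Edit your selection in "
            f"the conflicts sheet to keep the existing peak group."
        )


def plan_load(
    selections: Dict[PeakGroupKey, str],
    existing: Dict[PeakGroupKey, str],
) -> Tuple[List[PeakGroupKey], List[ReplacingPeakGroupRepresentation]]:
    """Decide which existing peak groups to delete and which warnings to emit.

    selections: the file the researcher chose for each multiply-represented peak group.
    existing: the file each already-loaded peak group came from.
    """
    to_delete: List[PeakGroupKey] = []
    load_warnings: List[ReplacingPeakGroupRepresentation] = []
    for key, selected_file in selections.items():
        existing_file = existing.get(key)
        # Nothing loaded yet, or the loaded one is the selected one: no conflict.
        if existing_file is None or existing_file == selected_file:
            continue
        # The "not selected" representation is already in the database:
        # mark it for deletion so the selected one can be loaded, and warn the user.
        to_delete.append(key)
        load_warnings.append(
            ReplacingPeakGroupRepresentation(key, existing_file, selected_file)
        )
    return to_delete, load_warnings


# Hypothetical example: lactate was loaded from pos.xlsx, but neg.xlsx is now selected.
selections = {("s1", "lactate"): "neg.xlsx", ("s1", "glucose"): "pos.xlsx"}
existing = {("s1", "lactate"): "pos.xlsx", ("s1", "glucose"): "pos.xlsx"}
to_delete, load_warnings = plan_load(selections, existing)
for warning in load_warnings:
    print(warning.message())
```

Computing the deletions and warnings together, before touching the database, means the user can be shown exactly which existing peak groups a given selection would remove.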
Perhaps your concern is about timing, i.e., you're suggesting that when the user selects the file from which a peak group should be derived, it should somehow be noted that a representation of the PeakGroup already exists in the database?
I like the idea of providing that context during the selection. We have access to that data. We could add a column indicating that a PeakGroup from one of the files the user is prompted to choose between already exists.
The one question I have is, why does that matter? I'm not saying it doesn't. I'm just saying, how could that information affect the user's selection and why might it?
_Originally posted by @hepcat72 in https://github.com/Princeton-LSI-ResearchComputing/tracebase/pull/1222#discussion_r1777584331_
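The column suggested in the quoted comment could be sketched as below: one conflicts-sheet row per multiply-represented peak group, with an extra column naming the file an existing PeakGroup was loaded from. The column headers, the (sample, compound) key, and the `conflict_rows`/`to_csv` helpers are hypothetical; the real conflicts sheet may well have a different format and columns, and CSV is used here only to keep the example self-contained.

```python
import csv
import io
from typing import Dict, List, Tuple

PeakGroupKey = Tuple[str, str]  # hypothetical: (sample name, compound name)

# Hypothetical column headers; the real conflicts sheet's headers may differ.
COLUMNS = ["Sample", "Compound", "Candidate Files", "Already Loaded From", "Selected File"]


def conflict_rows(
    candidates: Dict[PeakGroupKey, List[str]],
    existing: Dict[PeakGroupKey, str],
) -> List[Dict[str, str]]:
    """One row per multiply-represented peak group, flagging any pre-existing one."""
    rows = []
    for (sample, compound), files in sorted(candidates.items()):
        rows.append(
            {
                "Sample": sample,
                "Compound": compound,
                "Candidate Files": "; ".join(files),
                # Empty when nothing for this sample/compound is in the database yet.
                "Already Loaded From": existing.get((sample, compound), ""),
                # Left blank for the researcher to fill in.
                "Selected File": "",
            }
        )
    return rows


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize the rows with a header line, using the csv module's defaults."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical example: only lactate already has a loaded peak group.
candidates = {
    ("s1", "lactate"): ["pos.xlsx", "neg.xlsx"],
    ("s1", "glucose"): ["pos.xlsx", "neg.xlsx"],
}
existing = {("s1", "lactate"): "pos.xlsx"}
rows = conflict_rows(candidates, existing)
print(to_csv(rows))
```

The column is purely informational here; the selection is still left blank for the researcher, so whether pre-existence should influence that choice remains the open question raised in the comment.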