Closed lparsons closed 1 month ago
So here are the PeakGroups that have multiple representations on dev.
First though, these are the multiple compound representations per sample AND sequence. There are 180. (There are 201 when you exclude sequence from the unique constraint - scroll below for that.) Note, I never "committed" the run of the code from #1167 on dev. I only ever ran it in dry-run mode. Note also that I list the results here by compound, sample, and sequence and filter for only those whose peakgroups number greater than 1:
In [22]: for dct in PeakGroup.objects.values("name", "msrun_sample__sample__name", "msrun_sample__msrun_sequence__id").annotate(pgs_per_sample_seq=Count("name")).filter(pgs_per_sample_seq__gt=1):
    ...:     print(f"{dct['name']}\t{dct['msrun_sample__sample__name']}\tMSRunSequence {dct['msrun_sample__msrun_sequence__id']}\t{dct['pgs_per_sample_seq']}")
    ...: print(f"Number of peak groups with multiple representations (in a sample / sequence): {PeakGroup.objects.values('name', 'msrun_sample__sample__name', 'msrun_sample__msrun_sequence__id').annotate(pgs_per_sample_seq=Count('name')).filter(pgs_per_sample_seq__gt=1).count()}")
3-Ureidopropionic acid exp048a_01_1080 MSRunSequence 46 2
creatine exp048a_05_0240 MSRunSequence 46 2
thymidine exp048a_03_0240 MSRunSequence 46 2
thymidine exp048a_02_0000 MSRunSequence 46 2
thymidine exp048a_03_0000 MSRunSequence 46 2
cytidine exp048a_04_0240 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_02_0060 MSRunSequence 46 2
cytidine exp048a_05_0240 MSRunSequence 46 2
cytidine exp048a_01_0060 MSRunSequence 46 2
cytidine exp048a_04_0060 MSRunSequence 46 2
cytidine exp048a_03_0000 MSRunSequence 46 2
creatine exp048a_03_0240 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_01_1440 MSRunSequence 46 2
cytidine exp048a_02_0020 MSRunSequence 46 2
cytidine exp048a_03_0060 MSRunSequence 46 2
creatine exp048a_04_0240 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_04_0000 MSRunSequence 46 2
creatine exp048a_05_0020 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_02_0000 MSRunSequence 46 2
cytidine exp048a_02_0240 MSRunSequence 46 2
creatine exp048a_04_0060 MSRunSequence 46 2
creatine exp048a_01_0000 MSRunSequence 46 2
cytidine exp048a_05_0060 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_02_0240 MSRunSequence 46 2
thymidine exp048a_05_0240 MSRunSequence 46 2
cytidine exp048a_02_0060 MSRunSequence 46 2
cytidine exp048a_01_1440 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_05_0060 MSRunSequence 46 2
creatine exp048a_02_0060 MSRunSequence 46 2
thymidine exp048a_01_1440 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_04_0020 MSRunSequence 46 2
thymidine exp048a_01_0060 MSRunSequence 46 2
cytidine exp048a_03_0020 MSRunSequence 46 2
thymidine exp048a_03_0060 MSRunSequence 46 2
creatine exp048a_04_0020 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_04_0240 MSRunSequence 46 2
thymidine exp048a_04_0000 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_05_0000 MSRunSequence 46 2
creatine exp048a_04_0000 MSRunSequence 46 2
thymidine exp048a_01_0000 MSRunSequence 46 2
creatine exp048a_01_1440 MSRunSequence 46 2
thymidine exp048a_02_0240 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_01_0240 MSRunSequence 46 2
creatine exp048a_05_0060 MSRunSequence 46 2
cytidine exp048a_04_0000 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_01_0000 MSRunSequence 46 2
thymidine exp048a_05_0060 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_03_0240 MSRunSequence 46 2
thymidine exp048a_01_0240 MSRunSequence 46 2
thymidine exp048a_04_0060 MSRunSequence 46 2
thymidine exp048a_03_0020 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_04_0060 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_01_0020 MSRunSequence 46 2
creatine exp048a_01_0240 MSRunSequence 46 2
creatine exp048a_03_0020 MSRunSequence 46 2
thymidine exp048a_05_0000 MSRunSequence 46 2
creatine exp048a_01_1080 MSRunSequence 46 2
thymidine exp048a_04_0240 MSRunSequence 46 2
creatine exp048a_02_0000 MSRunSequence 46 2
cytidine exp048a_01_0240 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_03_0020 MSRunSequence 46 2
thymidine exp048a_01_0020 MSRunSequence 46 2
creatine exp048a_03_0000 MSRunSequence 46 2
cytidine exp048a_03_0240 MSRunSequence 46 2
thymidine exp048a_01_1080 MSRunSequence 46 2
cytidine exp048a_04_0020 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_02_0020 MSRunSequence 46 2
thymidine exp048a_02_0020 MSRunSequence 46 2
cytidine exp048a_01_1080 MSRunSequence 46 2
cytidine exp048a_02_0000 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_01_0060 MSRunSequence 46 2
creatine exp048a_05_0000 MSRunSequence 46 2
creatine exp048a_01_0060 MSRunSequence 46 2
creatine exp048a_01_0020 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_03_0060 MSRunSequence 46 2
creatine exp048a_02_0020 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_03_0000 MSRunSequence 46 2
creatine exp048a_02_0240 MSRunSequence 46 2
cytidine exp048a_01_0000 MSRunSequence 46 2
cytidine exp048a_01_0020 MSRunSequence 46 2
creatine exp048a_03_0060 MSRunSequence 46 2
cytidine exp048a_05_0020 MSRunSequence 46 2
cytidine exp048a_05_0000 MSRunSequence 46 2
thymidine exp048a_04_0020 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_05_0240 MSRunSequence 46 2
thymidine exp048a_05_0020 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_05_0020 MSRunSequence 46 2
thymidine exp048a_02_0060 MSRunSequence 46 2
arginine exp027f4_free_M02_brain MSRunSequence 51 2
lysine exp027f4_free_M03_lung MSRunSequence 51 2
lysine exp027f4_free_M03_ceccon MSRunSequence 51 2
lysine exp027f4_free_M03_gWAT MSRunSequence 51 2
lysine exp027f4_free_M03_pancreas MSRunSequence 51 2
lysine exp027f4_free_M02_pancreas MSRunSequence 51 2
arginine exp027f4_free_M02_liver MSRunSequence 51 2
arginine exp027f4_free_M03_iWAT MSRunSequence 51 2
cytidine exp048a_06_0000 MSRunSequence 46 2
carnosine exp048a_06_0020 MSRunSequence 46 2
cytidine exp048a_06_0060 MSRunSequence 46 2
lysine exp027f4_free_M02_kidney MSRunSequence 51 2
arginine exp027f4_free_M03_pancreas MSRunSequence 51 2
carnosine exp048a_07_0060 MSRunSequence 46 2
lysine exp027f4_free_M03_eye MSRunSequence 51 2
lysine exp027f4_free_M03_brain MSRunSequence 51 2
lysine exp027f4_free_M02_BAT MSRunSequence 51 2
cytidine exp048a_07_0000 MSRunSequence 46 2
arginine exp027f4_free_M03_skin MSRunSequence 51 2
carnosine exp048a_06_0000 MSRunSequence 46 2
thymidine exp048a_06_0240 MSRunSequence 46 2
lysine exp027f4_free_M03_jejunum MSRunSequence 51 2
arginine exp027f4_free_M03_BAT MSRunSequence 51 2
lysine exp027f4_free_M03_kidney MSRunSequence 51 2
thymidine exp048a_06_0000 MSRunSequence 46 2
lysine exp027f4_free_M02_quad MSRunSequence 51 2
arginine exp027f4_free_M02_spleen MSRunSequence 51 2
lysine exp027f4_free_M03_quad MSRunSequence 51 2
arginine exp027f4_free_M02_kidney MSRunSequence 51 2
3-Ureidopropionic acid exp048a_06_0000 MSRunSequence 46 2
lysine exp027f4_free_M03_colon MSRunSequence 51 2
carnosine exp048a_07_0240 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_07_0060 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_06_0020 MSRunSequence 46 2
lysine exp027f4_free_M03_stom MSRunSequence 51 2
lysine exp027f4_free_M02_brain MSRunSequence 51 2
arginine exp027f4_free_M03_brain MSRunSequence 51 2
3-Ureidopropionic acid exp048a_07_0020 MSRunSequence 46 2
arginine exp027f4_free_M03_gWAT MSRunSequence 51 2
arginine exp027f4_free_M03_plasma MSRunSequence 51 2
lysine exp027f4_free_M02_spleen MSRunSequence 51 2
arginine exp027f4_free_M03_testis MSRunSequence 51 2
arginine exp027f4_free_M03_jejunum MSRunSequence 51 2
thymidine exp048a_07_0000 MSRunSequence 46 2
cytidine exp048a_07_0020 MSRunSequence 46 2
lysine exp027f4_free_M03_iWAT MSRunSequence 51 2
lysine exp027f4_free_M03_dia MSRunSequence 51 2
thymidine exp048a_07_0240 MSRunSequence 46 2
lysine exp027f4_free_M03_testis MSRunSequence 51 2
arginine exp027f4_free_M03_dia MSRunSequence 51 2
arginine exp027f4_free_M03_lung MSRunSequence 51 2
arginine exp027f4_free_M03_spleen MSRunSequence 51 2
arginine exp027f4_free_M03_quad MSRunSequence 51 2
lysine exp027f4_free_M02_liver MSRunSequence 51 2
arginine exp027f4_free_M03_ceccon MSRunSequence 51 2
arginine exp027f4_free_M02_pancreas MSRunSequence 51 2
arginine exp027f4_free_M03_heart MSRunSequence 51 2
lysine exp027f4_free_M02_heart MSRunSequence 51 2
3-Ureidopropionic acid exp048a_07_0000 MSRunSequence 46 2
arginine exp027f4_free_M02_heart MSRunSequence 51 2
arginine exp027f4_free_M03_liver MSRunSequence 51 2
thymidine exp048a_06_0020 MSRunSequence 46 2
arginine exp027f4_free_M03_colon MSRunSequence 51 2
3-Ureidopropionic acid exp048a_06_0060 MSRunSequence 46 2
lysine exp027f4_free_M03_heart MSRunSequence 51 2
carnosine exp048a_06_0060 MSRunSequence 46 2
arginine exp027f4_free_M02_quad MSRunSequence 51 2
3-Ureidopropionic acid exp048a_07_0240 MSRunSequence 46 2
cytidine exp048a_07_0240 MSRunSequence 46 2
thymidine exp048a_07_0060 MSRunSequence 46 2
arginine exp027f4_free_M02_plasma_20220909142304 MSRunSequence 51 2
cytidine exp048a_07_0060 MSRunSequence 46 2
cytidine exp048a_06_0240 MSRunSequence 46 2
arginine exp027f4_free_M02_colon MSRunSequence 51 2
lysine exp027f4_free_M03_spleen MSRunSequence 51 2
lysine exp027f4_free_M03_BAT MSRunSequence 51 2
lysine exp027f4_free_M03_skin MSRunSequence 51 2
arginine exp027f4_free_M02_BAT MSRunSequence 51 2
thymidine exp048a_07_0020 MSRunSequence 46 2
carnosine exp048a_07_0000 MSRunSequence 46 2
lysine exp027f4_free_M02_plasma_20220909142304 MSRunSequence 51 2
arginine exp027f4_free_M03_stom MSRunSequence 51 2
cytidine exp048a_06_0020 MSRunSequence 46 2
3-Ureidopropionic acid exp048a_06_0240 MSRunSequence 46 2
carnosine exp048a_07_0020 MSRunSequence 46 2
thymidine exp048a_06_0060 MSRunSequence 46 2
lysine exp027f4_free_M02_colon MSRunSequence 51 2
arginine exp027f4_free_M03_eye MSRunSequence 51 2
lysine exp027f4_free_M03_plasma MSRunSequence 51 2
lysine exp027f4_free_M03_liver MSRunSequence 51 2
arginine exp027f4_free_M03_kidney MSRunSequence 51 2
carnosine exp048a_06_0240 MSRunSequence 46 2
Number of peak groups with multiple representations (in a sample / sequence): 180
But if you consider ONLY peak group compound and SAMPLE (i.e. do not include the sequence in the unique constraint), we have 201 instances of multiple representations (listing by compound and sample):
In [18]: for dct in PeakGroup.objects.values("name", "msrun_sample__sample__name").annotate(pgs_per_sample=Count("name")).filter(pgs_per_sample__gt=1):
    ...:     print(f"{dct['name']}\t{dct['msrun_sample__sample__name']}\t{dct['pgs_per_sample']}")
    ...: print(f"Number of peak groups with multiple representations (in a sample regardless of sequence): {PeakGroup.objects.values('name', 'msrun_sample__sample__name').annotate(pgs_per_sample=Count('name')).filter(pgs_per_sample__gt=1).count()}")
3-hydroxybutyrate col005d_blank2 2
3-Ureidopropionic acid exp048a_01_0000 2
3-Ureidopropionic acid exp048a_01_0020 2
3-Ureidopropionic acid exp048a_01_0060 2
3-Ureidopropionic acid exp048a_01_0240 2
3-Ureidopropionic acid exp048a_01_1080 2
3-Ureidopropionic acid exp048a_01_1440 2
3-Ureidopropionic acid exp048a_02_0000 2
3-Ureidopropionic acid exp048a_02_0020 2
3-Ureidopropionic acid exp048a_02_0060 2
3-Ureidopropionic acid exp048a_02_0240 2
3-Ureidopropionic acid exp048a_03_0000 2
3-Ureidopropionic acid exp048a_03_0020 2
3-Ureidopropionic acid exp048a_03_0060 2
3-Ureidopropionic acid exp048a_03_0240 2
3-Ureidopropionic acid exp048a_04_0000 2
3-Ureidopropionic acid exp048a_04_0020 2
3-Ureidopropionic acid exp048a_04_0060 2
3-Ureidopropionic acid exp048a_04_0240 2
3-Ureidopropionic acid exp048a_05_0000 2
3-Ureidopropionic acid exp048a_05_0020 2
3-Ureidopropionic acid exp048a_05_0060 2
3-Ureidopropionic acid exp048a_05_0240 2
3-Ureidopropionic acid exp048a_06_0000 2
3-Ureidopropionic acid exp048a_06_0020 2
3-Ureidopropionic acid exp048a_06_0060 2
3-Ureidopropionic acid exp048a_06_0240 2
3-Ureidopropionic acid exp048a_07_0000 2
3-Ureidopropionic acid exp048a_07_0020 2
3-Ureidopropionic acid exp048a_07_0060 2
3-Ureidopropionic acid exp048a_07_0240 2
arginine exp027f4_free_M02_BAT 2
arginine exp027f4_free_M02_brain 2
arginine exp027f4_free_M02_colon 2
arginine exp027f4_free_M02_heart 2
arginine exp027f4_free_M02_kidney 2
arginine exp027f4_free_M02_liver 2
arginine exp027f4_free_M02_pancreas 2
arginine exp027f4_free_M02_plasma_20220909142304 2
arginine exp027f4_free_M02_quad 2
arginine exp027f4_free_M02_spleen 2
arginine exp027f4_free_M03_BAT 2
arginine exp027f4_free_M03_brain 2
arginine exp027f4_free_M03_ceccon 2
arginine exp027f4_free_M03_colon 2
arginine exp027f4_free_M03_dia 2
arginine exp027f4_free_M03_eye 2
arginine exp027f4_free_M03_gWAT 2
arginine exp027f4_free_M03_heart 2
arginine exp027f4_free_M03_iWAT 2
arginine exp027f4_free_M03_jejunum 2
arginine exp027f4_free_M03_kidney 2
arginine exp027f4_free_M03_liver 2
arginine exp027f4_free_M03_lung 2
arginine exp027f4_free_M03_pancreas 2
arginine exp027f4_free_M03_plasma 2
arginine exp027f4_free_M03_quad 2
arginine exp027f4_free_M03_skin 2
arginine exp027f4_free_M03_spleen 2
arginine exp027f4_free_M03_stom 2
arginine exp027f4_free_M03_testis 2
C18:1 col005d_blank2 2
C18:2 col005d_blank2 2
carnosine exp048a_06_0000 2
carnosine exp048a_06_0020 2
carnosine exp048a_06_0060 2
carnosine exp048a_06_0240 2
carnosine exp048a_07_0000 2
carnosine exp048a_07_0020 2
carnosine exp048a_07_0060 2
carnosine exp048a_07_0240 2
citrate/isocitrate col005d_blank2 2
creatine col005d_blank2 2
creatine exp048a_01_0000 2
creatine exp048a_01_0020 2
creatine exp048a_01_0060 2
creatine exp048a_01_0240 2
creatine exp048a_01_1080 2
creatine exp048a_01_1440 2
creatine exp048a_02_0000 2
creatine exp048a_02_0020 2
creatine exp048a_02_0060 2
creatine exp048a_02_0240 2
creatine exp048a_03_0000 2
creatine exp048a_03_0020 2
creatine exp048a_03_0060 2
creatine exp048a_03_0240 2
creatine exp048a_04_0000 2
creatine exp048a_04_0020 2
creatine exp048a_04_0060 2
creatine exp048a_04_0240 2
creatine exp048a_05_0000 2
creatine exp048a_05_0020 2
creatine exp048a_05_0060 2
creatine exp048a_05_0240 2
cytidine exp048a_01_0000 2
cytidine exp048a_01_0020 2
cytidine exp048a_01_0060 2
cytidine exp048a_01_0240 2
cytidine exp048a_01_1080 2
cytidine exp048a_01_1440 2
cytidine exp048a_02_0000 2
cytidine exp048a_02_0020 2
cytidine exp048a_02_0060 2
cytidine exp048a_02_0240 2
cytidine exp048a_03_0000 2
cytidine exp048a_03_0020 2
cytidine exp048a_03_0060 2
cytidine exp048a_03_0240 2
cytidine exp048a_04_0000 2
cytidine exp048a_04_0020 2
cytidine exp048a_04_0060 2
cytidine exp048a_04_0240 2
cytidine exp048a_05_0000 2
cytidine exp048a_05_0020 2
cytidine exp048a_05_0060 2
cytidine exp048a_05_0240 2
cytidine exp048a_06_0000 2
cytidine exp048a_06_0020 2
cytidine exp048a_06_0060 2
cytidine exp048a_06_0240 2
cytidine exp048a_07_0000 2
cytidine exp048a_07_0020 2
cytidine exp048a_07_0060 2
cytidine exp048a_07_0240 2
glutamate col005d_blank2 2
glutamine col005d_blank2 2
homocarnosine col005d_blank2 2
isoleucine col005d_blank2 2
lactate col005d_blank2 2
leucine col005d_blank2 2
lysine exp027f4_free_M02_BAT 2
lysine exp027f4_free_M02_brain 2
lysine exp027f4_free_M02_colon 2
lysine exp027f4_free_M02_heart 2
lysine exp027f4_free_M02_kidney 2
lysine exp027f4_free_M02_liver 2
lysine exp027f4_free_M02_pancreas 2
lysine exp027f4_free_M02_plasma_20220909142304 2
lysine exp027f4_free_M02_quad 2
lysine exp027f4_free_M02_spleen 2
lysine exp027f4_free_M03_BAT 2
lysine exp027f4_free_M03_brain 2
lysine exp027f4_free_M03_ceccon 2
lysine exp027f4_free_M03_colon 2
lysine exp027f4_free_M03_dia 2
lysine exp027f4_free_M03_eye 2
lysine exp027f4_free_M03_gWAT 2
lysine exp027f4_free_M03_heart 2
lysine exp027f4_free_M03_iWAT 2
lysine exp027f4_free_M03_jejunum 2
lysine exp027f4_free_M03_kidney 2
lysine exp027f4_free_M03_liver 2
lysine exp027f4_free_M03_lung 2
lysine exp027f4_free_M03_pancreas 2
lysine exp027f4_free_M03_plasma 2
lysine exp027f4_free_M03_quad 2
lysine exp027f4_free_M03_skin 2
lysine exp027f4_free_M03_spleen 2
lysine exp027f4_free_M03_stom 2
lysine exp027f4_free_M03_testis 2
malate col005d_blank2 2
methionine col005d_blank2 2
phenylalanine col005d_blank2 2
proline col005d_blank2 2
pyruvate col005d_blank2 2
serine col005d_blank2 2
succinate col005d_blank2 2
threonine col005d_blank2 2
thymidine exp048a_01_0000 2
thymidine exp048a_01_0020 2
thymidine exp048a_01_0060 2
thymidine exp048a_01_0240 2
thymidine exp048a_01_1080 2
thymidine exp048a_01_1440 2
thymidine exp048a_02_0000 2
thymidine exp048a_02_0020 2
thymidine exp048a_02_0060 2
thymidine exp048a_02_0240 2
thymidine exp048a_03_0000 2
thymidine exp048a_03_0020 2
thymidine exp048a_03_0060 2
thymidine exp048a_03_0240 2
thymidine exp048a_04_0000 2
thymidine exp048a_04_0020 2
thymidine exp048a_04_0060 2
thymidine exp048a_04_0240 2
thymidine exp048a_05_0000 2
thymidine exp048a_05_0020 2
thymidine exp048a_05_0060 2
thymidine exp048a_05_0240 2
thymidine exp048a_06_0000 2
thymidine exp048a_06_0020 2
thymidine exp048a_06_0060 2
thymidine exp048a_06_0240 2
thymidine exp048a_07_0000 2
thymidine exp048a_07_0020 2
thymidine exp048a_07_0060 2
thymidine exp048a_07_0240 2
tryptophan col005d_blank2 2
valine col005d_blank2 2
Number of peak groups with multiple representations (in a sample regardless of sequence): 201
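The gap between the two counts (180 versus 201) is what you would expect when some duplicates span sequences: grouping on fewer fields merges rows that the finer grouping keeps apart. Here is a minimal pure-Python sketch of the same counting logic. The rows and the `groups_with_duplicates` helper are made up for illustration and are not dev data or project code:

```python
from collections import Counter

# Hypothetical (name, sample, sequence_id) rows; not real dev data.
rows = [
    ("creatine", "sampleA", 46),
    ("creatine", "sampleA", 46),  # duplicate within one sequence
    ("lactate", "blank2", 10),
    ("lactate", "blank2", 11),    # duplicate only across sequences
    ("valine", "sampleA", 46),    # single representation
]


def groups_with_duplicates(rows, key_len):
    """Count rows per key (the first key_len fields); keep keys seen more than once."""
    counts = Counter(row[:key_len] for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


per_sample_and_sequence = groups_with_duplicates(rows, 3)
per_sample = groups_with_duplicates(rows, 2)

print(len(per_sample_and_sequence))  # 1
print(len(per_sample))               # 2
```

In the real output, the 21 col005d_blank2 rows appear only in the second list, which accounts exactly for the difference of 21 and suggests that sample's duplicates sit in different sequences.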
As far as the logic goes, it is only used for retrieving the "last" peak group for a tracer from either any given sample (in DataRepo.models.sample.py) or from the "last" serum sample (in DataRepo.models.fcirc.py):
Note the order_by("msrun_sample__msrun_sequence__date") followed by last() in both functions. That is, we sort by date, but two peak groups (from the same sample but different sequences) can share the latest date. In that case both sit at the end of the ordered results, and their relative order is arbitrary.
DataRepo.models.sample.py:

def last_tracer_peak_groups(self):
    """
    Retrieves the last Peak Group for each tracer compound
    """
    # Get every tracer's compound
    if self.animal.tracers.count() == 0:
        warnings.warn(f"Animal [{self.animal}] has no tracers.")
        return PeakGroup.objects.none()

    # Get the last peakgroup for each tracer
    last_peakgroup_ids = []
    for tracer in self.animal.tracers.all():
        tracer_peak_group = (
            PeakGroup.objects.filter(msrun_sample__sample__id__exact=self.id)
            .filter(compounds__id__exact=tracer.compound.id)
            .order_by("msrun_sample__msrun_sequence__date")
            .last()
        )
        if tracer_peak_group:
            last_peakgroup_ids.append(tracer_peak_group.id)
        else:
            warnings.warn(
                f"Sample {self} has no peak group for tracer compound: [{tracer.compound}]."
            )
            return PeakGroup.objects.none()

    return PeakGroup.objects.filter(id__in=last_peakgroup_ids)
DataRepo.models.fcirc.py:

def peak_groups(self):
    """
    Retrieve all PeakGroups for this serum sample and tracer, regardless of msrun_sequence date.
    Currently unused - see docstring in self.is_last_serum_peak_group
    """
    from DataRepo.models.peak_group import PeakGroup

    peakgroups = (
        PeakGroup.objects.filter(msrun_sample__sample__exact=self.serum_sample)
        .filter(compounds__exact=self.tracer.compound)
        .order_by("msrun_sample__msrun_sequence__date")
    )

    if peakgroups.count() == 0:
        warnings.warn(
            f"Serum sample {self.serum_sample} has no peak group for tracer {self.tracer}."
        )

    return peakgroups.all()
Note that the "last serum sample" code is a separate function.
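The tie problem described above can be illustrated without Django. The sketch below uses hypothetical `(id, sequence_date)` tuples as stand-ins for peak groups. Python's sort is stable, so the "last" row among same-date ties simply follows input order; a database makes no such promise for rows that compare equal. Breaking ties on `id` is an assumption chosen for illustration, not a decision the project has made:

```python
from datetime import date

# Hypothetical stand-ins for peak groups: (id, sequence_date). Not the real models.
peak_groups = [
    (101, date(2022, 9, 9)),
    (102, date(2022, 9, 9)),  # same sample, different sequence, same date
    (100, date(2022, 1, 1)),
]


def last_by_date(groups):
    """Sort by date only; the winner among same-date rows depends on input order."""
    return sorted(groups, key=lambda g: g[1])[-1]


def last_by_date_then_id(groups):
    """Sort by date, then id, so same-date ties resolve the same way every time."""
    return sorted(groups, key=lambda g: (g[1], g[0]))[-1]


# The same rows arriving in a different order, as a database might return them.
reordered = list(reversed(peak_groups))

print(last_by_date(peak_groups)[0], last_by_date(reordered)[0])                  # 102 101
print(last_by_date_then_id(peak_groups)[0], last_by_date_then_id(reordered)[0])  # 102 102
```

In Django, the analogous change would be passing a second field to order_by (it accepts multiple fields), for example order_by("msrun_sample__msrun_sequence__date", "id"). Whether `id` is a meaningful tie-breaker here is a separate question.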
This satisfies this task. I will close this and link the multrep discussion page to this issue.
I was looking for some additional info on each multiple representation, so I reworked the query. It's not nearly as concise a solution as yours, @hepcat72, but it does output the peak annotation files and the study names. Also, for some reason I get 202 records, not 201 like you did. I'm not sure what is different.
The script is at tracebase.princeton.edu:/var/www/tracebase/tracebase-multiple-representations.py.
#!/usr/bin/env python
# coding: utf-8
from DataRepo.models import *
import pandas as pd
import os

pgdf = pd.DataFrame.from_dict(PeakGroup.objects.values('id', 'msrun_sample__sample', 'compounds'))
sample_compound_count = pgdf.groupby(['msrun_sample__sample', 'compounds']).count()
multiple_representations = sample_compound_count.loc[(sample_compound_count['id'] > 1)]
multiple_representations.reset_index()

print("sample\tcompound\tpeak_annotation_files\tstudies")
for row in multiple_representations.reset_index().itertuples():
    sample = Sample.objects.get(pk=row.msrun_sample__sample)
    compound = Compound.objects.get(pk=row.compounds)
    studies = list(sample.animal.studies.all().values_list("name", flat=True))
    peak_groups = PeakGroup.objects.filter(msrun_sample__sample=sample, compounds=compound)
    print(f"{sample}\t{compound}\t{list(peak_groups.values_list('peak_annotation_file__filename', flat=True))}\t{studies}")
The results are in a Google Sheet: https://docs.google.com/spreadsheets/d/1HsLdBP1AU4OqTphWhUEtRn6lGTzp-gkBAe2Ui5rHncY/edit?usp=sharing
I suspect it's because you used PeakGroup.compounds instead of PeakGroup.name, resulting in separate rows for citrate and isocitrate. Those get listed as 1 indistinguishable peak group. (I get 1 row for the pair: citrate/isocitrate col005d_blank2 2.)
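The 201 versus 202 discrepancy can be reproduced in miniature. The sketch below assumes that `compounds` is a many-to-many relation, so selecting it yields one row per (peak group, compound) pair, while grouping on the name yields one key per peak group. All rows are hypothetical:

```python
from collections import Counter

# Hypothetical peak groups: (peak group name, sample, linked compound names).
# "citrate/isocitrate" is one peak group linked to two compounds.
peak_groups = [
    ("citrate/isocitrate", "blank2", ["citrate", "isocitrate"]),
    ("citrate/isocitrate", "blank2", ["citrate", "isocitrate"]),
    ("lactate", "blank2", ["lactate"]),
    ("lactate", "blank2", ["lactate"]),
]

# Grouping on the peak group name: one key per peak group.
by_name = Counter((name, sample) for name, sample, _ in peak_groups)

# Grouping on linked compounds: one row per (peak group, compound) pair.
by_compound = Counter(
    (compound, sample)
    for _, sample, compounds in peak_groups
    for compound in compounds
)

dups_by_name = [key for key, n in by_name.items() if n > 1]
dups_by_compound = [key for key, n in by_compound.items() if n > 1]

print(len(dups_by_name))      # 2
print(len(dups_by_compound))  # 3
```

The compound-based grouping reports one extra duplicate because the single citrate/isocitrate peak group is counted once under each of its two compounds.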
> I suspect it's because you used PeakGroup.compounds instead of PeakGroup.name, resulting in separate rows for citrate and isocitrate. Those get listed as 1 indistinguishable peak group. (I get 1 row for the pair: citrate/isocitrate col005d_blank2 2.)
Yeah, that's likely it. Since we use the researcher provided text for the peak group name, I didn't want to use that and risk missing a duplicate that used a synonym, but you're right, this logic isn't quite correct and creates two rows. Really should be a single row.
> I didn't want to use that and risk missing a duplicate that used a synonym
As far as I know, we allow peak groups linked to the same compound if a synonym is used (because the synonym could represent a qualitatively different compound, e.g. "stereo-isomers"). Hence, it has been my inference that those are not multiple representations. For example, you could theoretically have a peak group for L-Threonine and one for R-Threonine that are from the same sample. I'd inferred that that was part of the point of allowing synonyms to be in the peak group name. Both would intentionally link to the same compound. If that's what we want, then the method you used would risk deleting a valid peak group.
That all said, however, I realize that we never made an explicit decision about this. I simply inferred that the reasoning for allowing different synonyms (and making the unique constraint hinge on the name) logically led to allowing different peak groups of the same compound.
Michael has indicated, however, that he thought that perhaps we should not support differentiation of stereoisomers, so I have been assuming that we would eventually decide to go that route. In that context, your method is what we want...
But I think that if that's the case, then we should re-think the peak group name and the unique constraint (as well as the clean method).
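The trade-off discussed above comes down to which key defines "the same peak group". Here is a minimal sketch with a hypothetical synonym lookup and hypothetical rows. It is not the project's unique constraint or clean method:

```python
# Hypothetical synonym lookup: each peak group name resolves to one compound.
SYNONYM_TO_COMPOUND = {
    "L-Threonine": "threonine",
    "R-Threonine": "threonine",
    "threonine": "threonine",
}

# Hypothetical peak groups from one sample: (peak group name, sample).
peak_groups = [("L-Threonine", "sample1"), ("R-Threonine", "sample1")]


def has_collision(keys):
    """Return True if any key occurs more than once."""
    keys = list(keys)
    return len(set(keys)) < len(keys)


# Key on the researcher-supplied name: the two peak groups stay distinct.
name_collision = has_collision((name, sample) for name, sample in peak_groups)

# Key on the resolved compound: the two peak groups collapse into one key.
compound_collision = has_collision(
    (SYNONYM_TO_COMPOUND[name], sample) for name, sample in peak_groups
)

print(name_collision)      # False
print(compound_collision)  # True
```

Keying on the name keeps synonym-distinguished peak groups as valid separate records. Keying on the compound flags them as multiple representations, which is only the desired behavior if stereoisomer differentiation is dropped.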
It would be useful to generate a list of these and take a close look at the examples. It would also be useful to list out the various pieces of logic that prefer a later sequence.
_Originally posted by @lparsons in https://github.com/Princeton-LSI-ResearchComputing/tracebase/pull/1196#discussion_r1747631333_