MonashProteomics / FragPipe-Analyst

GNU General Public License v3.0

Redundant column names in TMT combined_annotation.txt file #46

Closed: ginnyintifa closed this issue 1 year ago.

ginnyintifa commented 1 year ago

In the TMT workflow, some columns in combined_annotation.txt are redundant:

1. Replicate: we don't allow users to have duplicated sample IDs (i.e., values in the label column), so the replicate column isn't helpful; every row would have the value 1 in this column. I'm not sure whether this column is used in any calculation elsewhere in the app.

2. Experiment/plex: I think we can keep just one of them, preferably experiment.
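Both observations can be checked mechanically before dropping a column. A minimal sketch, assuming the annotation has already been loaded as a list of dicts; the helper names and the `experiment` values are hypothetical and not taken from FragPipe-Analyst:

```python
from collections import defaultdict

def is_constant(rows, column):
    """True if every row holds the same value in `column` (e.g. replicate always "1")."""
    return len({row[column] for row in rows}) <= 1

def is_one_to_one(rows, a, b):
    """True if each value of column `a` pairs with exactly one value of `b`, and vice versa."""
    a_to_b, b_to_a = defaultdict(set), defaultdict(set)
    for row in rows:
        a_to_b[row[a]].add(row[b])
        b_to_a[row[b]].add(row[a])
    return all(len(v) == 1 for v in a_to_b.values()) and all(
        len(v) == 1 for v in b_to_a.values()
    )

# Hypothetical rows: the experiment names are made up for illustration.
rows = [
    {"experiment": "exp03", "plex": "3", "label": "CPT0148080004", "replicate": "1"},
    {"experiment": "exp03", "plex": "3", "label": "CPT0148100003", "replicate": "1"},
    {"experiment": "exp07", "plex": "7", "label": "CPT0109680004", "replicate": "1"},
]

print(is_constant(rows, "replicate"))             # True: column carries no information
print(is_one_to_one(rows, "experiment", "plex"))  # True: one of the two is redundant
```

A one-to-one check is used rather than comparing raw values, because experiment and plex hold different strings even when they encode the same grouping.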

hsiaoyi0504 commented 1 year ago

For the second point, although it's redundant, keeping both columns gives users a bit more flexibility when customizing plots.

hsiaoyi0504 commented 1 year ago

For the first point, I think they serve different purposes. [sample ID]_1 and [sample ID]_2 are technical replicates. The replicate column, however, was originally designed for biological replicates, as in the FragPipe manifest.

hsiaoyi0504 commented 1 year ago

But it's actually confusing. For example, here is the FragPipe manifest from the ccRCC discovery DIA data: dia_manifest (1).fp-manifest.txt. The experiment column is [Condition]_[sample num], and the replicate column there behaves more like a technical replicate.

hsiaoyi0504 commented 1 year ago

Updated in https://github.com/MonashProteomics/FragPipe-Analyst/commit/d17fbf9cdd388c8ae284599b64b19a3b5feb4913.

Now combined_annotation.txt will look like this:

plex  channel  label          condition  replicate
3     126      CPT0148080004  Tumor      1
3     127N     CPT0148100003  Normal     1
3     127C     CPT0083100003  Tumor      1
3     128N     CPT0083140003  Normal     1
3     128C     CPT0118160003  Tumor      1
3     129N     CPT0118180003  Normal     1
3     129C     CPT0105510003  Tumor      1
3     130N     CPT0105550003  Normal     1
3     130C     TumorOnlyIR03
3     131N     pool03         pool       3
7     126      CPT0109680004  Tumor      1
7     127N     CPT0109700003  Normal     1
7     127C     CPT0105940003  Tumor      1
7     128N     CPT0105980003  Normal     1
7     128C     CPT0090170004  Tumor      1
7     129N     CPT0090200003  Normal     1
7     129C     CPT0105330003  Tumor      1
7     130N     CPT0105370004  Normal     1
7     130C     CPT0148080004  Tumor      2
7     131N     pool07         pool       7
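In this table, CPT0148080004 appears in plex 3 (channel 126) with replicate 1 and again in plex 7 (channel 130C) with replicate 2. One way to derive that numbering is to count each label's occurrences in plex order. The sketch below is an illustration of that idea, not the code from the linked commit; it assumes rows are already sorted by plex, and it does not reproduce the pool rows (replicate 3 and 7) or the empty TumorOnlyIR03 row, which follow a rule not described in this thread.

```python
from collections import Counter

def assign_replicates(rows):
    """Number each repeated label 1, 2, ... in the order it appears.

    Assumption: `rows` is sorted by plex, so a label seen first in plex 3
    gets replicate 1 and the same label seen later in plex 7 gets replicate 2.
    """
    seen = Counter()
    out = []
    for plex, channel, label in rows:
        seen[label] += 1
        out.append((plex, channel, label, seen[label]))
    return out

# (plex, channel, label) triples copied from the table above.
rows = [
    ("3", "126", "CPT0148080004"),
    ("3", "127N", "CPT0148100003"),
    ("7", "126", "CPT0109680004"),
    ("7", "130C", "CPT0148080004"),
]
for row in assign_replicates(rows):
    print(*row, sep="\t")
```

The last printed row is `7  130C  CPT0148080004  2`, matching the table; every other label occurs once and gets replicate 1.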
ginnyintifa commented 1 year ago

Looking good! And do users need to change column names in the report now?

hsiaoyi0504 commented 1 year ago

That's correct. Please let me know if you find any issue. Thanks for the contribution!

hsiaoyi0504 commented 1 year ago

@ginnyintifa I would like to re-open this. FragPipe no longer allows users to give the same label to different samples. Therefore, users need to add _1, _2 suffixes in the label column, and the shared prefix before _1, _2 is used to distinguish samples from different patients.
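Under the _1, _2 convention described in the comment above, the sample identity and the replicate number can be recovered by splitting off a trailing numeric suffix. A minimal sketch; the function name and the rule that unsuffixed labels count as replicate 1 are assumptions, not FragPipe-Analyst behavior:

```python
import re

# Hypothetical convention: "<sample>_<n>", where <n> is the replicate number.
# The greedy ".+" means only the LAST "_<digits>" is treated as the suffix.
_SUFFIX = re.compile(r"^(?P<sample>.+)_(?P<rep>\d+)$")

def split_label(label):
    """Return (sample, replicate); labels without a numeric _n suffix get replicate 1."""
    match = _SUFFIX.match(label)
    if match is None:
        return label, 1
    return match.group("sample"), int(match.group("rep"))

print(split_label("CPT0148080004_2"))  # ('CPT0148080004', 2)
print(split_label("pool03"))           # ('pool03', 1)
```

One caveat of any suffix-based scheme: a sample whose real name already ends in `_<digits>` would be misread as a replicate, which is an argument for making the convention explicit in the documentation.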

ginnyintifa commented 1 year ago


Really? In the TMT workflow, duplicated labels within the same plex are not allowed. But if the same label appears in two different plexes, that is allowed, right? That would cause duplicates in the TMT-Integrator (TMT-I) report.
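The distinction drawn here (a label repeated inside one plex is rejected, the same label in different plexes is accepted but shows up twice downstream) can be expressed as a small check. A minimal sketch over hypothetical (plex, label) pairs; it is not FragPipe's actual validation logic, and `sampleX` is a made-up label:

```python
from collections import defaultdict

def check_labels(rows):
    """Classify repeated labels in (plex, label) pairs.

    Returns (within_plex, across_plex): labels repeated inside a single plex
    (treated as invalid) and labels that occur in more than one plex
    (allowed, but they appear as duplicates in downstream reports).
    """
    plexes_by_label = defaultdict(list)
    for plex, label in rows:
        plexes_by_label[label].append(plex)
    within_plex = sorted(
        label for label, plexes in plexes_by_label.items() if len(plexes) != len(set(plexes))
    )
    across_plex = sorted(
        label for label, plexes in plexes_by_label.items() if len(set(plexes)) > 1
    )
    return within_plex, across_plex

rows = [
    ("3", "CPT0148080004"),
    ("7", "CPT0148080004"),  # same label in two plexes
    ("7", "sampleX"),        # hypothetical label repeated inside plex 7
    ("7", "sampleX"),
]
print(check_labels(rows))  # (['sampleX'], ['CPT0148080004'])
```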

hsiaoyi0504 commented 1 year ago

Yes, but my point is about which should be the standard way of annotating samples. I have kept the logic for handling duplicate labels, but we should probably make the _ suffix (_1, _2) the standard. Any thoughts?