comprna / ISOTOPE

ISOform-guided prediction of epiTOPEs in cancer
GNU General Public License v3.0

A5_A3 get_peptide_sequence.py associates same TPMs to different samples #5

Open FraPria opened 2 years ago

FraPria commented 2 years ago

Hello, thank you for developing this useful pipeline! I have a technical question that I would like to ask you.

I noticed in the file A5_A3_NetMHC-4.0_junctions_ORF_neoantigens.tab that samples sharing the same event also share the same Transcript_TPM. You can see it in the first rows of the file (selecting only the columns of interest):

Sample_id       Alt_Junction_id Transcript_id   Transcript_TPM
pat1   chr6;41090308;41091546;+        ENST00000353205.5       3.09240654483773
pat2   chr6;41090308;41091546;+        ENST00000353205.5       3.09240654483773
pat3   chr6;41090308;41091546;+        ENST00000353205.5       3.09240654483773
pat4   chr6;41090308;41091546;+        ENST00000353205.5       3.09240654483773
pat5   chr6;41090308;41091546;+        ENST00000353205.5       3.09240654483773

However, if you select the same transcript from the iso_tpms.txt matrix, the values differ between samples:

pat1    pat2    pat3    pat4    pat5
3.092407    3.750489    7.15175 13.89057    4.364625
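A quick way to spot this symptom in an output table is to look for events whose Transcript_TPM is identical in every sample. The following is only a minimal sketch: the column names are the ones shown in the excerpt above, while the function name `find_constant_tpms` and the in-memory example table are hypothetical. Identical values can occasionally be legitimate, so a hit is a hint to investigate, not proof of a bug.

```python
import csv
import io
from collections import defaultdict


def find_constant_tpms(handle):
    """Return (junction, transcript) pairs whose TPM is identical in all samples."""
    samples = defaultdict(set)  # samples seen for each (junction, transcript)
    tpms = defaultdict(set)     # distinct TPM strings seen for each pair
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["Alt_Junction_id"], row["Transcript_id"])
        samples[key].add(row["Sample_id"])
        tpms[key].add(row["Transcript_TPM"])
    # Flag pairs reported in more than one sample with a single TPM value.
    return [k for k in samples if len(samples[k]) > 1 and len(tpms[k]) == 1]


# Three rows copied from the excerpt above (tab-separated).
table = io.StringIO(
    "Sample_id\tAlt_Junction_id\tTranscript_id\tTranscript_TPM\n"
    "pat1\tchr6;41090308;41091546;+\tENST00000353205.5\t3.09240654483773\n"
    "pat2\tchr6;41090308;41091546;+\tENST00000353205.5\t3.09240654483773\n"
    "pat3\tchr6;41090308;41091546;+\tENST00000353205.5\t3.09240654483773\n"
)
suspicious = find_constant_tpms(table)
print(suspicious)
```

On a real run, the `io.StringIO` object would be replaced by an open handle to the .tab file.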

This seems to arise from line 136 of lib/A5_A3/get_peptide_sequence.py, which reads only the first sample column of the iso_tpms.txt matrix:

tokens = line.rstrip().split("\t")
transcript = tokens[0]
tpm = tokens[1]
if (transcript not in transcript_expression):
    transcript_expression[transcript] = tpm

So I tested whether swapping the columns of iso_tpms.txt would change the results, and it did. This does not happen for the other event types, where the code is slightly different. For example, the Exonizations code reads all the iso_tpms.txt columns:

tokens = line.rstrip().split("\t")
transcript = tokens[0]
tpm = tokens[1:]
for i in range(0,len(tpm)):
    if (transcript not in transcript_expression[header[i]]):
        transcript_expression[header[i]][transcript] = float(tpm[i])

Should I also use this piece of code for the A5_A3 events? Thank you in advance.
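For reference, the per-sample parsing quoted from the Exonizations code can be written as a self-contained function. This is a sketch under stated assumptions, not the code in the repository: the function name `load_transcript_expression` is hypothetical, and the header row is assumed to list only the sample names, so that `header[i]` lines up with the i-th TPM value on each data row (as the indexing in the snippet above suggests).

```python
import io


def load_transcript_expression(handle):
    """Return {sample: {transcript: tpm}} from a tab-separated TPM matrix."""
    # Assumption: the header holds sample names only, with no label for the
    # transcript column.
    header = handle.readline().rstrip("\n").split("\t")
    transcript_expression = {sample: {} for sample in header}
    for line in handle:
        tokens = line.rstrip("\n").split("\t")
        if len(tokens) < 2:
            continue  # skip blank or malformed lines
        transcript = tokens[0]
        tpm = tokens[1:]
        for sample, value in zip(header, tpm):
            # Keep the first value seen for a transcript, as in the quoted code.
            transcript_expression[sample].setdefault(transcript, float(value))
    return transcript_expression


# Values taken from the iso_tpms.txt excerpt earlier in this thread.
example = io.StringIO(
    "pat1\tpat2\tpat3\tpat4\tpat5\n"
    "ENST00000353205.5\t3.092407\t3.750489\t7.15175\t13.89057\t4.364625\n"
)
expression = load_transcript_expression(example)
print(expression["pat4"]["ENST00000353205.5"])  # 13.89057
```

With this layout each sample keeps its own value, so swapping columns in iso_tpms.txt no longer changes which TPM a sample receives.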

JLTrincado commented 2 years ago

Hi,

Yes, this does indeed seem to be a bug. I have changed the code accordingly and tested it quickly, and it seems to run smoothly. Could you test it as well? I created a new branch for this.

Thanks for your help.

Best regards,

Juanlu.

FraPria commented 2 years ago

Hi, thanks for your feedback!

I just tested it, but it raises the error: 2022-05-20 13:54:04,566 - lib.A5_A3.get_peptide_sequence - ERROR - ERROR: NameError("name 'sample_id' is not defined")

I added sample_id = tokens[0].replace(" ","") at lines 253 and 1005, and it worked.
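To illustrate why the variable is needed: once expression is stored per sample, each lookup presumably has to name the sample first. The sketch below is hypothetical (the helper `tpm_for_row` and the example row are not from the repository); it only assumes, as the fix does, that the sample identifier is the first tab-separated field of the line being processed.

```python
def tpm_for_row(line, transcript_expression, transcript):
    """Look up the TPM of `transcript` in the sample named on `line`."""
    tokens = line.rstrip("\n").split("\t")
    # Same normalisation as in the fix: remove spaces from the sample field.
    sample_id = tokens[0].replace(" ", "")
    return transcript_expression[sample_id][transcript]


# Hypothetical per-sample table and input row, for illustration only.
transcript_expression = {"pat2": {"ENST00000353205.5": 3.750489}}
row = "pat2 \tchr6;41090308;41091546;+\tENST00000353205.5\n"
value = tpm_for_row(row, transcript_expression, "ENST00000353205.5")
print(value)  # 3.750489
```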

Thank you, have a nice day!

EduEyras commented 2 years ago

Thanks,

I've added those lines to the code on master, and I've also merged the other fixes. I hope it works now. Thanks.

E.


-- Prof. E Eyras EMBL Australia Group Leader The John Curtin School of Medical Research - Australian National University https://github.com/comprna http://scholar.google.com/citations?user=LiojlGoAAAAJ