galaxyproteomics / tools-galaxyp

Galaxy Tool Shed repositories maintained and developed by the GalaxyP community
MIT License

SearchGUI+PeptideShaker issue while applying GalaxyP proteogenomics Tutorial2 #527

Open: luisfdez94 opened this issue 3 years ago

luisfdez94 commented 3 years ago

Galaxy server: usegalaxy.eu
Tool version: Galaxy Version 0.1.1
(1) History with the SearchGUI + PeptideShaker run against the comprehensive database (reference + variants + indel + alt spliced junctions): link here
(2) History with the SearchGUI + PeptideShaker run against the reduced database (indel + alt spliced junctions only): link here
(3) History with the SearchGUI + PeptideShaker run against the comprehensive database with its sequences reordered (indel + alt spliced junctions + reference + variants): link here
(4) History for DB creation: link here

Hello,

I have created a customized database following the 1st Galaxy-P hands-on tutorial (4). As a result, it contains these 4 types of header (in this order when scrolling down the file):

1. reference (30,628 sequences): >generic|ENSP00000355265|..
2. variants (19,451 sequences): >generic|ENSP00000306146_D8G,V16G|...
3. indel (3,017 sequences): >generic|ENSP00000360709_1123:GAT>CAAT|...
4. alternative spliced junctions (1,183,859 sequences): >generic|STRG.3963.1_u_1370_1415|...
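To check that a merged database really contains the four entry types listed above, and in what numbers, the headers can be tallied by type. The sketch below is a hypothetical Python helper, not part of the Galaxy-P tutorial or its tools. Its regular expressions are inferred only from the four example headers above (assumption: the accession is the second `|`-separated field and every real header follows the same pattern), so entries that deviate are reported as `unknown` rather than guessed.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical patterns inferred ONLY from the four example headers in this issue;
# headers produced by the tutorial's tools may follow other conventions.
# Order matters: the more specific ENSP patterns are tried before the plain reference one.
PATTERNS = [
    ("indel", re.compile(r"^ENSP\d+_\d+:[A-Z]+>[A-Z]+$")),                # ENSP..._1123:GAT>CAAT
    ("variant", re.compile(r"^ENSP\d+_[A-Z]\d+[A-Z](,[A-Z]\d+[A-Z])*$")),  # ENSP..._D8G,V16G
    ("reference", re.compile(r"^ENSP\d+$")),                              # ENSP00000355265
    ("alt_splice", re.compile(r"^STRG\.")),                               # STRG.3963.1_u_1370_1415
]


def classify_header(header: str) -> str:
    """Return the entry type of a '>generic|ACCESSION|...' FASTA header."""
    fields = header.lstrip(">").split("|")
    accession = fields[1] if len(fields) > 1 else fields[0]
    for label, pattern in PATTERNS:
        if pattern.search(accession):
            return label
    return "unknown"


def count_types(lines: Iterable[str]) -> Counter:
    """Count header types over the lines of a FASTA file (sequence lines are skipped)."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith(">"):
            counts[classify_header(line.rstrip("\n"))] += 1
    return counts
```

For example, `with open("database.fasta") as fh: print(count_types(fh))`, where `database.fasta` is a placeholder name for the downloaded database, should give counts comparable to the ones listed above if the assumed patterns hold; a large `unknown` count would indicate that they do not.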

While running the 2nd Galaxy-P hands-on tutorial with a dataset from my laboratory (HCT-116 cell line), I noticed the following:

History (1): When I view the PSM report after running "SearchGUI" + "PeptideShaker", the resulting PSMs are matched only to 1. reference and 2. variant proteins, never to 3. indel or 4. alternative spliced junction proteins.

History (2): However, when, in another history, I delete all the 1. reference and 2. variant proteins from the customized database (leaving only the 3. indel and 4. alt spliced junction entries) and run "SearchGUI" + "PeptideShaker" with this DB (see this history), the PSM report contains only peptides mapped to 3. indel and 4. alt spliced junctions (so I guess it is not a problem with the format of these proteins).

History (3): Finally, when I use the comprehensive database but with the sequences in this order: 3. indel, 4. alt spliced junctions, 1. reference, 2. variants, I obtain a different PSM report, containing all 4 types of proteins.
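The reduced database of History (2) and the reordered database of History (3) can also be produced outside Galaxy, which makes it easy to change only one factor (content or order) between runs. The sketch below is a hypothetical Python helper, not a Galaxy-P tool: it reads FASTA records, keeps only the groups named in `order`, and writes them out group by group. `simple_group` is an assumed heuristic based only on the example headers above. The sketch only rebuilds the input files; it does not explain why SearchGUI and PeptideShaker assign PSMs differently.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, Tuple

Record = Tuple[str, str]  # (header without the leading '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line.strip():
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def simple_group(header: str) -> str:
    """Hypothetical grouping inferred only from the example headers in this issue."""
    parts = header.split("|")
    accession = parts[1] if len(parts) > 1 else parts[0]
    if accession.startswith("STRG."):
        return "alt_splice"
    if "_" not in accession:
        return "reference"
    return "indel" if ":" in accession else "variant"


def rebuild_database(records: Iterable[Record],
                     group_of: Callable[[str], str],
                     order: Sequence[str]) -> List[Record]:
    """Keep only the groups listed in `order` and emit them in that order.
    Within each group, the original relative order of entries is preserved."""
    buckets: Dict[str, List[Record]] = {name: [] for name in order}
    for header, seq in records:
        group = group_of(header)
        if group in buckets:
            buckets[group].append((header, seq))
    return [rec for name in order for rec in buckets[name]]


def write_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Serialize records as FASTA text, wrapping sequences at `width` residues."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)
```

With `records = list(read_fasta(fh))`, the History (2) database would be `rebuild_database(records, simple_group, ["indel", "alt_splice"])` and the History (3) one `rebuild_database(records, simple_group, ["indel", "alt_splice", "reference", "variant"])`. Materializing `records` as a list lets both variants be built from a single read of the file.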

Do you know why the assignment of proteins to PSMs in my PSM report changes with the content or the order of the database? Is it because of the size of the DB (1.2M sequences, more than 1.1M of which are alternative spliced junctions)?

Thank you in advance, Luis

PratikDJagtap commented 3 years ago

Thanks @luisfdez94 - we discussed this at our meeting on Friday. The order of the sequences and database size might have had an effect on your identifications. However, @jj-umn and @subinamehta are looking closer into this and will reply to this thread.

With Best Regards, Pratik

luisfdez94 commented 3 years ago

Thank you, Pratik! Let me know if you find anything new.

Best regards, Luis

subinamehta commented 3 years ago

We have updated the tools and workflows. Please try again! Thank you!

luisfdez94 commented 3 years ago

Hello Subina,

First of all, I hope all of you are well. So far I have only looked at the first tutorial, and I think it is now much more complete. Great job!

I will get back to you after checking all of them.

With best regards,

On Sat, 30 Jan 2021 at 5:35, Subina Mehta (notifications@github.com) wrote:

We have updated the tools and workflows..please try again! Thank you!


-- Luis Fernández Ruiz

subinamehta commented 3 years ago

Thank you! I hope you can try the other tutorials too, so that I can close this issue. I hope you are doing well too!

luisfdez94 commented 3 years ago

Hello Subina,

Here is the link to my feedback document: https://drive.google.com/file/d/1SD1ygnLYXuhHQHpmtdLwkGL4LGPSuCwW/view?usp=sharing. Both the hands-on tutorials and the workflows have improved a lot. The document still contains some observations that, in my opinion, would make the hands-on tutorials easier to follow. Perhaps the most important one is changing the tabular-to-fasta tool version to 1.1.1; otherwise, the workflow does not work properly.

Anyway, congratulations on all the changes you have made. Best regards, Luis

On Wed, 3 Feb 2021 at 19:49, Subina Mehta (notifications@github.com) wrote:

Thank u! I hope you can try the other tutorials too so that I can close this issue! Thanks, I hope you are doing well too!


-- Luis Fernández Ruiz

subinamehta commented 3 years ago

Thank you for providing the feedback; I have incorporated those suggestions. I truly appreciate your help in improving our tutorials!