compomics / peptide-shaker

Interpretation of proteomics identification results
http://compomics.github.io/projects/peptide-shaker.html

Peptide-shaker not importing data #486

Closed catamoreno4 closed 2 years ago

catamoreno4 commented 2 years ago

Hi, I'm trying to follow the tutorial from the proteomics bioinformatics course, but it is not working with my own raw data. The first time, I received this error:

Tue Aug 30 09:51:48 BST 2022 Import process for ECM

Tue Aug 30 09:51:49 BST 2022 Importing sequences from uniprot-human-reviewed-trypsin-august-2022_concatenated_target_decoy.fasta.
Tue Aug 30 09:51:50 BST 2022 Importing gene mappings.
Tue Aug 30 09:51:52 BST 2022 Establishing local database connection.
Tue Aug 30 09:51:52 BST 2022 Reading identification files.
Tue Aug 30 09:51:52 BST 2022 Parsing 13886.omx.gz.
Tue Aug 30 09:52:22 BST 2022 Checking spectra for 13886.omx.gz.
Tue Aug 30 09:52:22 BST 2022 Importing PSMs from 13886.omx.gz
Tue Aug 30 09:52:24 BST 2022 310 identified spectra (2.3%) did not present a valid peptide.
Tue Aug 30 09:52:24 BST 2022 4255 of the best scoring peptides were excluded by the import filters:
Tue Aug 30 09:52:24 BST 2022 - 83.0% peptide length less than 8 or greater than 30.
Tue Aug 30 09:52:24 BST 2022 - 16.8% peptide presenting high mass or isotopic deviation.
Tue Aug 30 09:52:24 BST 2022 Parsing 13886.res.gz.
Tue Aug 30 09:52:25 BST 2022 Checking spectra for 13886.res.gz.
Tue Aug 30 09:52:25 BST 2022 Importing PSMs from 13886.res.gz
Tue Aug 30 09:52:29 BST 2022 2 identified spectra (0.0%) did not present a valid peptide.
Tue Aug 30 09:52:29 BST 2022 21504 of the best scoring peptides were excluded by the import filters:
Tue Aug 30 09:52:29 BST 2022 - 99.7% peptide length less than 8 or greater than 30.
Tue Aug 30 09:52:29 BST 2022 Parsing 13886.t.xml.gz.
Tue Aug 30 09:52:29 BST 2022 Checking spectra for 13886.t.xml.gz.
Tue Aug 30 09:52:29 BST 2022 Spectrum file named 'OneDrive - King' required to parse '13886.t.xml.gz' not found.

Tue Aug 30 09:52:29 BST 2022 Importing Data Canceled!

The second time, I got this:

Tue Aug 30 12:14:04 BST 2022 Import process for ECM2

Tue Aug 30 12:14:04 BST 2022 Importing sequences from uniprot-human-reviewed-trypsin-30Ago-2022_concatenated_target_decoy.fasta.
Tue Aug 30 12:14:20 BST 2022 Importing gene mappings.
Tue Aug 30 12:14:20 BST 2022 Neanderthal (Homo sapiens neanderthalensis) not available in Ensembl.
Tue Aug 30 12:14:22 BST 2022 Establishing local database connection.
Tue Aug 30 12:14:22 BST 2022 Reading identification files.
Tue Aug 30 12:14:22 BST 2022 Parsing 13886.t.xml.gz.
Tue Aug 30 12:14:22 BST 2022 No PSM found in 13886.t.xml.gz.
Tue Aug 30 12:14:22 BST 2022 No identification results.

Tue Aug 30 12:14:22 BST 2022 Importing Data Canceled!

hbarsnes commented 2 years ago

Tue Aug 30 09:52:29 BST 2022 Spectrum file named 'OneDrive - King' required to parse '13886.t.xml.gz' not found.

This may indicate that the spectrum file is located on a shared drive that is not available? Is that the case? Perhaps you can try including the spectrum and FASTA files in the SearchGUI output? This is done via Edit > Advanced Settings. Make sure that both the "Single Zip File" and "Include Spectra and Database" are selected.

It may also be that the name of the raw file is causing issues. Please make sure that there are no non-standard characters and preferably also remove any white space.
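As a rough illustration of the file name check, here is a minimal Python sketch that flags path components containing white space or other non-standard characters. The definition of "standard" used here (ASCII letters, digits, dot, underscore, hyphen) and the helper names `problem_parts` and `safe_name` are assumptions made for this example; they are not part of SearchGUI or PeptideShaker.

```python
import re
from pathlib import PurePath

# Assumption: a "standard" name uses only ASCII letters, digits, dot, underscore and hyphen.
SAFE_NAME = re.compile(r"^[A-Za-z0-9._-]+$")


def problem_parts(path):
    """Return the folder/file names in `path` that contain white space or other unsafe characters."""
    p = PurePath(path)
    # Skip the root/drive anchor ("/" or "C:\\"), which is not a name the user chose.
    return [part for part in p.parts if part != p.anchor and not SAFE_NAME.match(part)]


def safe_name(name):
    """Suggest a replacement: each run of unsafe characters becomes a single underscore."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("_")


if __name__ == "__main__":
    # Hypothetical path, modelled on the folder name that appears in the log above.
    for part in problem_parts("/data/OneDrive - King/13886.mgf"):
        print(f"rename '{part}' to '{safe_name(part)}'")
```

Running this on the full path of the spectrum, FASTA and output folders before starting a search lists every folder or file name that may be worth renaming.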

Tue Aug 30 12:14:22 BST 2022 No PSM found in 13886.t.xml.gz.
Tue Aug 30 12:14:22 BST 2022 No identification results.

This indicates that there are no identifications in that given X! Tandem output. Have a closer look at the search parameters and see if they are correct and that you are not too strict with the tolerances. Probably also worth making sure that the spectrum file contains enough high quality spectra.
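For a quick look at the spectrum file, a sketch along these lines could help. It assumes the converted file is in MGF format (spectra delimited by `BEGIN IONS` / `END IONS`, one peak per line starting with the m/z value) and uses the number of peaks per spectrum as a very crude stand-in for quality. The function name `summarize_mgf`, the `min_peaks` threshold and the file name in the usage example are all assumptions for illustration.

```python
def summarize_mgf(lines, min_peaks=10):
    """Count the spectra in an MGF file and how many of them have at least `min_peaks` peaks.

    `lines` is any iterable of text lines, for example an open file handle.
    Returns a tuple (total_spectra, spectra_with_enough_peaks).
    """
    total = 0
    rich = 0
    peaks = 0
    in_spectrum = False
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            in_spectrum, peaks = True, 0
        elif line == "END IONS":
            if in_spectrum:
                total += 1
                if peaks >= min_peaks:
                    rich += 1
            in_spectrum = False
        elif in_spectrum and line and line[0].isdigit():
            # Peak lines start with the m/z value; header lines (TITLE=, PEPMASS=, ...) do not.
            peaks += 1
    return total, rich


if __name__ == "__main__":
    # Hypothetical file name; replace it with the path to your own MGF file.
    with open("13886.mgf") as handle:
        total, rich = summarize_mgf(handle)
    print(f"{total} spectra, {rich} with at least 10 peaks")
```

If the total is very low, or almost no spectra pass the peak threshold, the conversion step is a more likely culprit than the search settings.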

Tue Aug 30 12:14:20 BST 2022 Neanderthal (Homo sapiens neanderthalensis) not available in Ensembl.

BTW, this means that you also have non-human sequences in your database, which I assume was not intended? Best to have another go at how you are selecting which protein sequences to include. :)
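One way to check which species ended up in the database is to tally the organism names in the FASTA headers. The sketch below assumes UniProt-style headers, where the organism follows `OS=` and is terminated by the next two-letter field such as `OX=` or `GN=`. The function name `count_organisms` and the file name in the usage example are made up for this illustration.

```python
import re
from collections import Counter

# Assumption: UniProt-style headers, e.g.
# >sp|P04637|P53_HUMAN Cellular tumor antigen p53 OS=Homo sapiens OX=9606 GN=TP53 PE=1 SV=4
OS_FIELD = re.compile(r"\bOS=(.+?)(?=\s+[A-Z]{2}=|$)")


def count_organisms(lines):
    """Tally the OS= (organism) values over all header lines of a FASTA file."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            match = OS_FIELD.search(line.strip())
            counts[match.group(1) if match else "unknown"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical file name; replace it with the path to your own FASTA file.
    with open("uniprot-human.fasta") as handle:
        for organism, n in count_organisms(handle).most_common():
            print(f"{n}\t{organism}")
```

For a human-only database the output should contain a single organism; any additional entries point to sequences that were pulled in by the query used for the download.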

catamoreno4 commented 2 years ago

Hi Harald

Thanks for your answer

I tried using the UniProt file we had from the course last month at EMBL and I didn't receive the Neanderthal warning.

I started over again and again, repeating the whole process from converting the raw data (I used msconvert and the thermoGUI) just to see if that would help, but nothing changed. When I run SearchGUI, the X! Tandem search seems OK, but when it gets to OMSSA, after doing the search it says the file could be found but then it says complete. When I then use this SearchGUI output file, PeptideShaker gives me the same notification:

Tue Aug 30 16:02:29 BST 2022 Importing sequences from uniprot-human-reviewed-trypsin-june-2021_concatenated_target_decoy.fasta.
Tue Aug 30 16:02:30 BST 2022 Importing gene mappings.
Tue Aug 30 16:02:34 BST 2022 Establishing local database connection.
Tue Aug 30 16:02:34 BST 2022 Reading identification files.
Tue Aug 30 16:02:34 BST 2022 Parsing 13886.t.xml.gz.
Tue Aug 30 16:02:34 BST 2022 No PSM found in 13886.t.xml.gz.
Tue Aug 30 16:02:34 BST 2022 No identification results.

Tue Aug 30 16:02:34 BST 2022 Importing Data Canceled!

I have been trying to make this work for two days, but nothing helps. I don't know what to do as this is all new to me. I also tried to follow your recommendations, but I don't know how...

Make sure that both the "Single Zip File" and "Include Spectra and Database" are selected. ??

This indicates that there is no identification in that given X! Tandem output. Have a closer look at the search parameters and see if they are correct and that you are not too strict with the tolerances. Probably also worth making sure that the spectrum file contains enough high quality spectra.??

I'm finding this quite frustrating, when we did the tutorial in the course everything worked perfectly... :/

Many thanks for your help

-- Catalina Moreno

hbarsnes commented 2 years ago

Maybe you can try sharing the SearchGUI output with me? Then I can try to reproduce the problem and hopefully locate the underlying issue?

hbarsnes commented 2 years ago

Issue assumed resolved. If this is not the case, please let us know and we'll reopen the issue.