compomics / searchgui

Highly adaptable common interface for proteomics search and de novo engines
http://compomics.github.io/projects/searchgui.html

Partial results only #324

Closed laurenfields closed 1 year ago

laurenfields commented 1 year ago

Hello,

I am new to SearchGUI. I have been submitting a few .raw files together with my database, with "whole protein" selected in the enzyme selection field. I strictly want to do a database search to compare the results across platforms. However, each time I run a search, a message states that only partial results can be exported, and I see no results in PeptideShaker. I have attached the output I was provided. Any help would be greatly appreciated.

Thanks, Lauren

Attachment: searchgui_out.zip

hbarsnes commented 1 year ago

Hi Lauren,

In order to process your data and understand what the problem is I would also need the FASTA and spectrum files used for the search.

But perhaps you can first share a screenshot of the message you are getting when trying to load the data in PeptideShaker?

Best regards, Harald

laurenfields commented 1 year ago

Hi Harald,

Thank you for your quick response! Here is a link to a folder upload containing the spectrum files, the FASTA file, the output, screenshots of the settings, and the dialog produced. I tried running with .raw files, and then with .mzML files of the same spectra. Let me know if I can provide anything else to help fix this issue.

[https://uwmadison.box.com/s/28xbcpxrt8c9zzyx3sed5xkw7a2343w9]

Thanks again, Lauren

hbarsnes commented 1 year ago

The problem seems to be the formatting of your FASTA file. Here's an example of an invalid entry:

>lcl|MH801210.1_prot_QCQ20669.1_1_gene=ND2_protein=NADH_dehydrogenase_subunit_2_protein_id=QCQ20669.1_location=70..1077_gbkey=CDS
MTFPISYLFFFSTLLIGSVLSISSSSWFGCWLGLELNLLSFIPLITTKLHSYLSEAAIKYFLVQALASTVLIMSASALLFNPELSHIMILLSLMLKLGAAPMHFWFPQVMEGLSWPQAFILLTIQKLAPMFLISYLTFSEALMNIILYFALISSVIGALGGLNVTMMRKLMAFSSINHMSWMLIAIYMSDIYWLLYFSLYCLTSGSVIMILYSTQSFSISDILNQSSKKMYLNMLVPMNILSLGGLPPFLGFMPKWALIQFMSQDLMIFPLAVLLGSSLITLYFYMRLFIPLTLMSFSSFVSNLKNSSLFSSSPLMMTLTSINLFGILLPIPFFL">lcl|MH801210.1_prot_QCQ20670.1_2_gene=COX1_protein=cytochrome_c_oxidase_subunit_I_transl_except=(pos:1534..1534,aa:TERM)_protein_id=QCQ20670.1_location=1276..2809_gbkey=CDS"MQRWFFSTNHKDIGTLYFIFGAWSGMVGTSLSLIIRAELGQPGTLIGNDQIYNVVVTAHAFVMIFFMVMPIMIGGFGNWLVPLMLGAPDMAFPRMNNMSFWLLPPSLTLLLMSGMVESGVGTGWTVYPPLAAAIAHAGASVDLGIFSLHLAGVSSILGAVNFMTTVINMRSFGMSMDQMPLFVWSVFITAILLLLSLPVLAGAITMLLTDRNLNTSFFDPAGGGDPVLYQHLFWFFGHPEVYILILPAFGMISHIVSQESGKKESFGTLGMIYAMMAIGILGFIVWAHHMFTVGMDVDTRAYFTSATMIIAVPTGIKIFSWLSTLHGTQINYSPSMLWALGFIFLFTVGGLTGVVLANSSIDIILHDTYYVVAHFHYVLSMGAVFGIFAGIAHWFPLFTGMSLNPKWMKIHFAIMFIGVNVTFFPQHFLGLNGMPRRYSDYPDAYTTWNVVSSMGSMVSLIAMLIFMIIIWEALISNRPVMFSPFLPSSIEWNHSYPPADHSYMEIPLITN">lcl|MH801210.1_prot_QCQ20671.1_3_gene=COX2_protein=cytochrome_c_oxidase_subunit_II_transl_except=(pos:685..685,aa:TERM)_protein_id=QCQ20671.1_location=2888..3572_gbkey=CDS"MATWTFLSLQDSASPLMEQLIFFHDHIMVVLIMIITFVGYMMASILTNSFINRYMLENQTIELIWTALPAIILIFIALPSLRLLYLLDEVNNPSVTLKTVGHQWYWSYEYSDFMNVEFDSYMTPTNELADSGFRLLEVDNRTVLPMNTQIRVVITAADVIHSWTVPALGVKADAIPGRLNQVSFMISRPGLFYGQCSEICGANHSFMPIVIESVNTNSFLNWISSCSD

Note how there are multiple protein sequences on the same line. In this particular case there are three sequences, each with its own header, that have to be split up before re-searching the data in SearchGUI.

The example above ought to look like this:

>lcl|MH801210.1_prot_QCQ20669.1_1_gene=ND2_protein=NADH_dehydrogenase_subunit_2_protein_id=QCQ20669.1_location=70..1077_gbkey=CDS
MTFPISYLFFFSTLLIGSVLSISSSSWFGCWLGLELNLLSFIPLITTKLHSYLSEAAIKYFLVQALASTVLIMSASALLFNPELSHIMILLSLMLKLGAAPMHFWFPQVMEGLSWPQAFILLTIQKLAPMFLISYLTFSEALMNIILYFALISSVIGALGGLNVTMMRKLMAFSSINHMSWMLIAIYMSDIYWLLYFSLYCLTSGSVIMILYSTQSFSISDILNQSSKKMYLNMLVPMNILSLGGLPPFLGFMPKWALIQFMSQDLMIFPLAVLLGSSLITLYFYMRLFIPLTLMSFSSFVSNLKNSSLFSSSPLMMTLTSINLFGILLPIPFFL

>lcl|MH801210.1_prot_QCQ20670.1_2_gene=COX1_protein=cytochrome_c_oxidase_subunit_I_transl_except=(pos:1534..1534,aa:TERM)_protein_id=QCQ20670.1_location=1276..2809_gbkey=CDS
MQRWFFSTNHKDIGTLYFIFGAWSGMVGTSLSLIIRAELGQPGTLIGNDQIYNVVVTAHAFVMIFFMVMPIMIGGFGNWLVPLMLGAPDMAFPRMNNMSFWLLPPSLTLLLMSGMVESGVGTGWTVYPPLAAAIAHAGASVDLGIFSLHLAGVSSILGAVNFMTTVINMRSFGMSMDQMPLFVWSVFITAILLLLSLPVLAGAITMLLTDRNLNTSFFDPAGGGDPVLYQHLFWFFGHPEVYILILPAFGMISHIVSQESGKKESFGTLGMIYAMMAIGILGFIVWAHHMFTVGMDVDTRAYFTSATMIIAVPTGIKIFSWLSTLHGTQINYSPSMLWALGFIFLFTVGGLTGVVLANSSIDIILHDTYYVVAHFHYVLSMGAVFGIFAGIAHWFPLFTGMSLNPKWMKIHFAIMFIGVNVTFFPQHFLGLNGMPRRYSDYPDAYTTWNVVSSMGSMVSLIAMLIFMIIIWEALISNRPVMFSPFLPSSIEWNHSYPPADHSYMEIPLITN

>lcl|MH801210.1_prot_QCQ20671.1_3_gene=COX2_protein=cytochrome_c_oxidase_subunit_II_transl_except=(pos:685..685,aa:TERM)_protein_id=QCQ20671.1_location=2888..3572_gbkey=CDS
MATWTFLSLQDSASPLMEQLIFFHDHIMVVLIMIITFVGYMMASILTNSFINRYMLENQTIELIWTALPAIILIFIALPSLRLLYLLDEVNNPSVTLKTVGHQWYWSYEYSDFMNVEFDSYMTPTNELADSGFRLLEVDNRTVLPMNTQIRVVITAADVIHSWTVPALGVKADAIPGRLNQVSFMISRPGLFYGQCSEICGANHSFMPIVIESVNTNSFLNWISSCSD
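
For a larger file, the splitting can be scripted rather than done by hand. Below is a minimal Python sketch, not part of SearchGUI. It assumes that every fused header starts at a `>` inside a sequence line (optionally preceded by a double quote) and ends at the next double quote, as in the invalid entry above. The function name `split_fused_entries` is hypothetical, and a file that is broken in a different way would need a different pattern.

```python
import re

# Assumption: a fused header looks like  ">header text"  in the middle of a
# sequence line, i.e. an optional quote, '>', the header, then a closing quote.
EMBEDDED_HEADER = re.compile(r'"?>([^"\n]*)"')


def split_fused_entries(text):
    """Return FASTA text with each header and each sequence on its own line."""
    out = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # drop blank lines
        if line.startswith(">"):
            out.append(line)  # a normal header line, keep as-is
            continue
        # Sequence line: cut it at every embedded header.
        pos = 0
        for match in EMBEDDED_HEADER.finditer(line):
            sequence = line[pos:match.start()]
            if sequence:
                out.append(sequence)
            out.append(">" + match.group(1))  # header without the quotes
            pos = match.end()
        tail = line[pos:]
        if tail:
            out.append(tail)  # sequence after the last embedded header
    return "\n".join(out) + "\n"
```

For example, `split_fused_entries(open("database.fasta").read())` (with `database.fasta` as a placeholder file name) returns the repaired text, which can be written to a new file and checked by eye before running the search again.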

BTW, if you get similar issues in the future and do not see any errors in SearchGUI, it is highly recommended to process the results separately in PeptideShaker (via the New Project option). That way you should get more information about the error.

laurenfields commented 1 year ago

This worked like a charm! Thanks so much!