compomics / searchgui

Highly adaptable common interface for proteomics search and de novo engines
http://compomics.github.io/projects/searchgui.html

SearchGUI is stuck on peak list conversion #267

Closed pasequeira closed 3 years ago

pasequeira commented 3 years ago

Dear all,

I have my raw files in .t2d format and had a hard time converting them to mgf or another universal format. I had to do it by hand, basically. I am worried I've done something wrong, because I've loaded the files into SearchGUI (mgf format) and now it is stuck on "converting spectrum file to peak list".

I have compared them with the tutorial example file given to me, and the files seem similar. I really don't know what is wrong.

Can you please help me?
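A hand-converted mgf file can be sanity-checked before it is loaded into a search tool. The sketch below is only an illustration, assuming the usual Mascot Generic Format layout (`BEGIN IONS` / `END IONS` blocks, `KEY=value` header lines such as `PEPMASS=`, and peak lines of the form `m/z intensity`). The function name `check_mgf` is hypothetical, and the check does not reproduce the parser SearchGUI actually uses.

```python
def check_mgf(lines):
    """Return a list of problems found in MGF text lines (empty list = looks OK)."""
    problems = []
    in_block = False      # True while between BEGIN IONS and END IONS
    has_pepmass = False   # PEPMASS= seen in the current block
    n_peaks = 0           # peak lines seen in the current block
    n_spectra = 0         # completed BEGIN/END blocks

    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comments are ignored
        if line == "BEGIN IONS":
            if in_block:
                problems.append(f"line {number}: BEGIN IONS inside an open block")
            in_block, has_pepmass, n_peaks = True, False, 0
        elif line == "END IONS":
            if not in_block:
                problems.append(f"line {number}: END IONS without BEGIN IONS")
            else:
                n_spectra += 1
                if not has_pepmass:
                    problems.append(f"line {number}: spectrum has no PEPMASS")
                if n_peaks == 0:
                    problems.append(f"line {number}: spectrum has no peaks")
            in_block = False
        elif in_block:
            if "=" in line:
                # header line such as TITLE=..., PEPMASS=..., CHARGE=...
                if line.upper().startswith("PEPMASS="):
                    has_pepmass = True
            else:
                # peak line: at least two numeric columns (m/z, intensity)
                parts = line.split()
                try:
                    if len(parts) < 2:
                        raise ValueError
                    float(parts[0])
                    float(parts[1])
                    n_peaks += 1
                except ValueError:
                    problems.append(f"line {number}: not a valid peak line")

    if in_block:
        problems.append("file ends inside an open BEGIN IONS block")
    if n_spectra == 0:
        problems.append("no complete spectra found")
    return problems
```

To check a file, pass the open file object: `with open("spectra.mgf") as handle: print(check_mgf(handle))` (the file name is a placeholder).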

hbarsnes commented 3 years ago

Do you know which Java version you are using? SearchGUI (and PeptideShaker) now require Java 9 or newer, so hopefully an upgrade to a more recent Java version will solve the problem.
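The installed Java version can be checked by running `java -version` on the command line. As a rough sketch, the helper below extracts the major version from that output; the function names are hypothetical, and the only assumption is the common behavior that Java 8 and older report themselves as `1.x` (for example `1.8.0_271`), while newer releases report the major version first (for example `11.0.9`).

```python
import re
import subprocess


def java_major_version(version_output: str) -> int:
    """Extract the major Java version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    major = int(match.group(1))
    # Java 8 and older use the legacy 1.x scheme, e.g. "1.8.0_271" means Java 8.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major() -> int:
    """Run `java -version` and return the major version of the default Java."""
    # `java -version` writes its report to stderr, so both streams are searched.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr + result.stdout)
```

Calling `installed_java_major()` on a machine with Java on the path returns the major version as an integer; anything below 9 would not meet the requirement stated above.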

pasequeira commented 3 years ago

Hi, I was using Java 8. I have updated and now it is running.

Now another error has come up, in the Tide algorithm:

error

Thanks again :D

hbarsnes commented 3 years ago

In this case it seems like Tide has a problem generating the index for your FASTA file. Are there any errors earlier up in the same dialog when the Tide index was created?

pasequeira commented 3 years ago

Hello again

Yes, it says this (see the screenshot below). Apparently it doesn't recognize the file as FASTA, but it is one; it was retrieved from UniProt in FASTA format.

image
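When a tool reports that a file is not FASTA, a quick structural check can rule out a damaged file. The following is a minimal sketch, not the validation Tide performs: it only assumes the basic FASTA convention that each record is a header line starting with `>` followed by one or more lines of sequence letters. The function name `looks_like_fasta` is hypothetical.

```python
def looks_like_fasta(lines):
    """Return True if the lines form one or more '>' headers, each followed by sequence."""
    n_records = 0
    seq_len = None  # residues seen for the current record; None before the first header
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines are tolerated in this sketch
        if line.startswith(">"):
            if seq_len == 0:
                return False  # the previous header had no sequence
            n_records += 1
            seq_len = 0
        else:
            if seq_len is None:
                return False  # sequence data before any header
            # sequence lines should contain letters only ('*' stop symbols allowed)
            if not line.replace("*", "").isalpha():
                return False
            seq_len += len(line)
    return n_records > 0 and seq_len > 0
```

A file that passes this check can still fail in a search engine for other reasons, such as the digestion settings used when the index is built.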

hbarsnes commented 3 years ago

Interesting! I just tried with a similar UniProt FASTA file myself and had no such issues. Would you be able to share your FASTA file with me? You can send it to me at harald.barsnes@gmail.com.

pasequeira commented 3 years ago

I've sent you the email; have you received it?

hbarsnes commented 3 years ago

> I've sent you the email; have you received it?

Nope, I have not received any email yet.

pasequeira commented 3 years ago

Hello,

I've now sent it from my Gmail account; see if you have received it ;)

hbarsnes commented 3 years ago

Yes, I got it now! I will have a closer look and get back to you.

hbarsnes commented 3 years ago

I can run the Tide indexer on your FASTA file without any issues:

Thu Nov 26 12:00:16 CET 2020        Indexing uniprot-reviewed-Ncrassa2020_concatenated_target_decoy.fasta for Tide.

INFO: Beginning tide-index.
INFO: Writing results to output directory 'crux-output'.
INFO: CPU: LAPTOP-92114JEJ
INFO: Crux version: 3.0.17109
INFO: 26-Nov-20 

INFO: Running tide-index...
INFO: Writing results to output directory 'fasta-index'.
INFO: Reading C:\Users\Harald\Desktop\123\uniprot-reviewed-Ncrassa2020_concatenated_target_decoy.fasta and computing unmodified peptides...
INFO: Reading proteins
INFO: Wrote 100000 peptides
INFO: Computing modified peptides...
INFO: Created 246366 peptides.
INFO: Precomputing theoretical spectra...
INFO: Elapsed time: 1.37 s
INFO: Finished crux tide-index.
INFO: Return Code:0

Thu Nov 26 12:00:18 CET 2020        Tide Indexing finished for uniprot-reviewed-Ncrassa2020_concatenated_target_decoy.fasta (1.6 seconds).

Can you also share your search parameters? If you have very different settings from me that may explain the difference.

pasequeira commented 3 years ago

I am sorry for the dumb question, but what do you mean? This? image

Maybe I can also ask one thing that I do not fully comprehend. My samples were analysed as a whole without enzymatic digestion; I don't know if I can have some degradation/digestion during the process. Should I put Whole Protein or Unspecified in the "Protease & Fragmentation" section?

Thanks a lot

hbarsnes commented 3 years ago

Aha, if you do not use an enzyme, that could explain why the Tide indexing is having problems, as there are basically no peptides to index. And it seems like our "guess" of how to set this up for Tide is not working as intended. I'll see if I can come up with a fix and release a new version.

> My samples were analysed as a whole without enzymatic digestion; I don't know if I can have some degradation/digestion during the process. Should I put Whole Protein or Unspecified in the "Protease & Fragmentation" section?

That depends on what you are looking for. From what I can tell your FASTA file contains complete protein sequences and not peptides. If you then use "Whole Protein" as the digestion type you will probably not get (m)any identifications as you will only search for the intact protein sequences, which will all most likely be too long for the mass spectrometer.

On the other hand, if you go for "Unspecific" you are basically telling the search algorithm to look for peptides of any length, as all cleavage sites will be allowed. Note that this can generally take a very long time. However, unless you have reason to believe that significant amounts of unspecific cleavage are happening in your sample, you will probably not get a lot of hits this way either, given that most of the proteins in your sample will still be fully intact.
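The difference in search space between the two options can be illustrated with a small count. This is a sketch under stated assumptions, not how any of the search engines enumerate candidates: "Whole Protein" is modelled as one candidate per protein (the intact sequence), and "Unspecific" as every contiguous subsequence within a peptide length window. The function name and the 8 to 30 residue window are hypothetical illustration values, not SearchGUI defaults.

```python
def count_unspecific_peptides(protein_length, min_len=8, max_len=30):
    """Count contiguous subsequences of a protein with min_len..max_len residues."""
    total = 0
    for k in range(min_len, max_len + 1):
        # a sequence of length n contains n - k + 1 windows of length k
        total += max(0, protein_length - k + 1)
    return total


def count_whole_protein_candidates(protein_length):
    """With no cleavage at all, the only candidate is the intact sequence."""
    return 1 if protein_length > 0 else 0
```

Under these assumptions, a single 400-residue protein gives 1 candidate without cleavage and `count_unspecific_peptides(400)` = 8786 candidates with unspecific cleavage, which is why the unspecific search takes so much longer across a whole proteome.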

pasequeira commented 3 years ago

> That depends on what you are looking for. From what I can tell your FASTA file contains complete protein sequences and not peptides.

Oh, I also want a peptide database because I'm most likely dealing with peptides. Can you tell me how I can find a proper one? I've followed the standard protocol just to get the system started.

I am so sorry for all the trouble I am giving you, and thanks so much for the help so far.

hbarsnes commented 3 years ago

> Oh, I also want a peptide database because I'm most likely dealing with peptides.

You will get peptides by choosing a different digestion type than "Whole Protein". But how would you have (lots of) peptides in your sample if you did not use enzymatic digestion? Did you use non-enzymatic digestion, or are you expecting a significant amount of protein degradation in your sample? In both cases you can try choosing the "Unspecific" digestion option mentioned above.

pasequeira commented 3 years ago

Several analyses during my work led to the hypothesis that we are dealing with peptides, and I am trying to identify them. I've extracted the samples and purified fractions using HPLC. We ran MALDI TOF/TOF as an external service, and they didn't use enzymatic digestion. I am trying to see if I am able to work with this data and avoid repeating the analysis with digestion.

hbarsnes commented 3 years ago

I've just released a new version of SearchGUI that should solve the issue with the use of "Whole Protein" as the digestion type for Tide. However, it now sounds like you would be better off trying the "Unspecific" option instead.

You could also try using one of the two de novo algorithms included in SearchGUI (Novor and DirecTag), as these do not rely on having a protein sequence database as part of the search itself. You can, however, map the results to a given set of protein sequences afterwards in PeptideShaker.

Note, however, that both of these approaches often result in a higher number of false positives, given that a much larger search space is used.

In any case, I will now close this issue, as the original problems with the Java version and the Tide search have been fixed. But do not hesitate to open a new issue if you come across other problems or have more questions.