ctarn / PepPre.jl

Promote peptide identification using accurate and comprehensive precursors. https://doi.org/10.1021/acs.jproteome.3c00293
http://peppre.ctarn.io

Bug with Windows GUI #1

Open irleader opened 9 months ago

irleader commented 9 months ago

"task loading from C:\Users\Administrator.PepPre\v1.3\PepPreIsolated.task task loading from C:\Users\Administrator.PepPre\v1.3\PepPreGlobal.task task loading from C:\Users\Administrator.PepPre\v1.3\PepPreAlign.task task failed to load from C:\Users\Administrator.PepPre\v1.3\PepPreAlign.task task loading from C:\Users\Administrator.PepPre\v1.3\PepPreView.task task failed to load from C:\Users\Administrator.PepPre\v1.3\PepPreView.task task loading from C:\Users\Administrator.PepPre\v1.3\extra.cfg task failed to load from C:\Users\Administrator.PepPre\v1.3\extra.cfg task saving to C:\Users\Administrator.PepPre\v1.3\PepPreIsolated.task task saving to D:/OneDrive/methods/Precursor_mass\out4\PepPreIsolated.task cmd: ('content\ThermoRawRead\ThermoRawRead.exe', 'mes', 'D:/OneDrive/methods/Precursor_mass/HEK_control_inj1.raw', 'D:/OneDrive/methods/Precursor_mass\out4') loading D:/OneDrive/methods/Precursor_mass/HEK_control_inj1.raw reading scan data (0 / 113548) reading scan data (10000 / 113548) reading scan data (20000 / 113548) reading scan data (30000 / 113548) reading scan data (40000 / 113548) reading scan data (50000 / 113548) reading scan data (60000 / 113548) reading scan data (70000 / 113548) reading scan data (80000 / 113548) reading scan data (90000 / 113548) reading scan data (100000 / 113548) reading scan data (110000 / 113548) meta data saved as D:/OneDrive/methods/Precursor_mass\out4\HEK_control_inj1.txt scan list saved as D:/OneDrive/methods/Precursor_mass\out4\HEK_control_inj1.csv writing peak mass (0 / 113548) writing peak mass (10000 / 113548) writing peak mass (20000 / 113548) writing peak mass (30000 / 113548) writing peak mass (40000 / 113548) writing peak mass (50000 / 113548) writing peak mass (60000 / 113548) writing peak mass (70000 / 113548) writing peak mass (80000 / 113548) writing peak mass (90000 / 113548) writing peak mass (100000 / 113548) writing peak mass (110000 / 113548) writing peak intensity (0 / 113548) writing peak intensity (10000 
/ 113548)
writing peak intensity (20000 / 113548)
writing peak intensity (30000 / 113548)
writing peak intensity (40000 / 113548)
writing peak intensity (50000 / 113548)
writing peak intensity (60000 / 113548)
writing peak intensity (70000 / 113548)
writing peak intensity (80000 / 113548)
writing peak intensity (90000 / 113548)
writing peak intensity (100000 / 113548)
writing peak intensity (110000 / 113548)
scan data saved as D:/OneDrive/methods/Precursor_mass\out4\HEK_control_inj1.mes
cmd: ('content\PepPre\bin\PepPreIsolated', 'D:/OneDrive/methods/Precursor_mass\out4\HEK_control_inj1.mes', '--out', 'D:/OneDrive/methods/Precursor_mass\out4', '--ipv', 'C:\Users\Administrator\.PepPre\v1.3\peptide.ipv', '--width', '2.0', '--charge', '1:10', '--error', '10.0', '--thres', '1.0', '--fold', '4.0', '--fmt', 'mgf')
fatal: error thrown and no exception handler available.
Exception in thread Thread-2:
Traceback (most recent call last):
  File "threading.py", line 932, in _bootstrap_inner
  File "threading.py", line 870, in run
  File "util.py", line 131, in run
  File "PepPreIsolated.py", line 56, in run
  File "util.py", line 146, in call
  File "util.py", line 62, in run_cmd
  File "codecs.py", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd7 in position 719: invalid continuation byte"

This error always occurs when I use isolated precursor or global precursor detection. The output folder has four files. It seems the error occurs while writing the .mes file, so the .mes file is incomplete. If the .csv file is already complete and accurate, can I use the precursor mass and charge from the .csv file instead?

ctarn commented 9 months ago

Hi! Thanks for reaching out. May I ask whether the software was downloaded and saved to a path that includes special characters, such as Chinese characters? If so, please move the software to another location and try again.
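For readers hitting the same traceback: a `UnicodeDecodeError` like this one typically appears when bytes printed by a child process (for example a path in a legacy Windows code page) are decoded strictly as UTF-8. The following is only a minimal sketch of a tolerant decoder and a path check. It is not the GUI's actual `util.py` code, and the function names and the `gbk` fallback are assumptions made for illustration.

```python
def decode_tool_output(raw: bytes, fallback: str = "gbk") -> str:
    """Decode bytes captured from a child process without raising.

    Hypothetical helper: the fallback code page is an assumption and
    should match the system locale in a real setup.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    try:
        # Try a legacy code page next (assumed, not taken from PepPre).
        return raw.decode(fallback)
    except (UnicodeDecodeError, LookupError):
        # Last resort: keep readable text and mark undecodable bytes.
        return raw.decode("utf-8", errors="replace")


def is_ascii_path(path: str) -> bool:
    """Return True if every character in the path is plain ASCII."""
    return path.isascii()
```

Checking the install location with something like `is_ascii_path` before launching is a quick way to confirm whether the workaround above (moving the software) applies.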

ctarn commented 9 months ago

The .mes file is the output of ThermoRawRead, since PepPre cannot process .raw files directly. It is a binary spectrum format, and the file was written completely and successfully. The .csv file is also extracted from the .raw file and contains only the original data from the instrument. The error above occurred when calling PepPre. When the output format "CSV" is selected, PepPre creates a .csv file with a filename ending in .precursor.csv.

irleader commented 9 months ago

Hi Tarn,

Thanks a lot for your reply, it works!

Isolated precursor generated a precursor.csv file, but for each scan there are multiple candidates. Do I use the first one? Global precursor also generated a precursor.csv file. How should I interpret it to get m/z and z for each scan?

Also, I am new to peptide precursor prediction. To get the most accurate mass and charge, should I use isolated precursor or global precursor? And what are the recommended settings? For example, isolation width=2, exclusion threshold=1 and precursor number=4 for isolated precursor; num of peaks=4000, exclusion threshold=1, max scan gap=16 for global precursor? I do not care about running time.

I am using data from a Thermo Fisher Orbitrap and need very accurate precursor mass and charge for downstream analysis.

Best regards

ctarn commented 9 months ago

Because of co-elution, there may be more than one precursor in one isolation window, so PepPre may export several precursors for one MS/MS scan. The precursors for an MS/MS scan are currently sorted by score. If you only want one precursor per MS/MS scan, it is OK to keep the first one.
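Keeping only the first (highest-scoring) precursor per scan could be sketched as follows. The column names `scan`, `mz` and `z` are hypothetical placeholders; check the header of your own .precursor.csv and adjust them. The sketch relies on the rows for each scan already being in score order, as described above.

```python
import csv
import io


def first_precursor_per_scan(rows, scan_key="scan"):
    """Keep the first row seen for each scan.

    Assumes rows for a scan arrive in score order, so the first row
    is the top-ranked candidate. `scan_key` is a hypothetical column name.
    """
    best = {}
    for row in rows:
        # setdefault stores a row only the first time a scan is seen.
        best.setdefault(row[scan_key], row)
    return list(best.values())


# Toy data standing in for a .precursor.csv file (made-up values).
demo = io.StringIO("scan,mz,z\n1,500.25,2\n1,500.75,3\n2,600.30,2\n")
top = first_precursor_per_scan(csv.DictReader(demo))
for row in top:
    print(row["scan"], row["mz"], row["z"])
```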

Whether to use global or isolated detection depends on your application. For global precursor detection, PepPre detects all precursors regardless of whether they were isolated or fragmented. You can match such precursors with MS/MS scans using the RT and isolation window, which are available in the previously mentioned .csv (not .precursor.csv).
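Matching global precursors to an MS/MS scan by RT and isolation window might look like the sketch below. All field names (`rt`, `center_mz`, `width`, `rt_start`, `rt_stop`) are assumptions made for illustration and are not the actual columns of the PepPre output; in particular, it assumes each global precursor comes with an elution time range.

```python
from dataclasses import dataclass


@dataclass
class Ms2Scan:
    scan_id: int
    rt: float         # retention time of the MS/MS scan
    center_mz: float  # center of the isolation window
    width: float      # full width of the isolation window (Th)


@dataclass
class Precursor:
    mz: float
    z: int
    rt_start: float   # assumed start of the elution range
    rt_stop: float    # assumed end of the elution range


def precursors_for_scan(scan, precursors):
    """Return precursors inside the scan's isolation window that elute at its RT."""
    half = scan.width / 2.0
    lo, hi = scan.center_mz - half, scan.center_mz + half
    return [
        p for p in precursors
        if lo <= p.mz <= hi and p.rt_start <= scan.rt <= p.rt_stop
    ]


# Toy example with made-up values.
scan = Ms2Scan(scan_id=7, rt=120.0, center_mz=500.0, width=2.0)
found = precursors_for_scan(scan, [
    Precursor(500.4, 2, 110.0, 130.0),  # inside window, eluting at scan RT
    Precursor(502.0, 2, 110.0, 130.0),  # outside the isolation window
    Precursor(499.8, 3, 200.0, 210.0),  # in window but elutes later
])
```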

As for parameters, if the input file is a .raw file, you can set the isolation window to auto, since it can be extracted from the file directly. Setting the precursor number to 4 means that PepPre exports 4 precursors per MS/MS scan on average. If you want more accurate precursors, you can set it to a smaller value. For the other parameters, it is recommended to keep the default values.

irleader commented 9 months ago

Thanks a lot for your detailed explanation!

So if I am only interested in precursors that are fragmented (precursors of MS2), I do not have to run global precursor, right?

After benchmarking the results from isolated precursor against some ground truths from database search, I am starting to understand why the default precursor number is 4: with an average of four candidates, most spectra will have at least one candidate. This also makes it likely that a correct candidate is among the 4 predictions (most correct candidates are among the top 2 predictions). Am I correct?
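The kind of benchmark described above can be sketched as a top-k hit rate: the fraction of ground-truth scans whose correct precursor appears among the first k candidates. The data layout and the 10 ppm tolerance are assumptions for illustration, not part of PepPre.

```python
def ppm_error(observed, expected):
    """Relative mass error in parts per million."""
    return (observed - expected) / expected * 1e6


def top_k_hit_rate(candidates, truth, k, tol_ppm=10.0):
    """Fraction of ground-truth scans matched within the top k candidates.

    candidates: dict scan -> list of (mz, z), sorted by score (assumed layout)
    truth:      dict scan -> (mz, z) from database search
    """
    if not truth:
        return 0.0
    hits = 0
    for scan, (true_mz, true_z) in truth.items():
        for mz, z in candidates.get(scan, [])[:k]:
            # A hit requires the same charge and m/z within the tolerance.
            if z == true_z and abs(ppm_error(mz, true_mz)) <= tol_ppm:
                hits += 1
                break
    return hits / len(truth)


# Toy data with made-up values.
cands = {1: [(500.250, 2), (500.752, 3)], 2: [(600.300, 2), (610.100, 2)], 3: []}
truth = {1: (500.751, 3), 2: (600.301, 2), 3: (700.000, 2)}
rate_top1 = top_k_hit_rate(cands, truth, k=1)
rate_top2 = top_k_hit_rate(cands, truth, k=2)
```

Comparing the rate across several values of k shows how quickly additional candidates stop adding correct precursors.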

ctarn commented 9 months ago

  1. Yes. Additionally, the csv of global precursors includes more useful information.
  2. Yes. 4-fold is a safe threshold: most identifiable peptide precursors are included, and the number of candidates does not become too large for searching. For DDA data, the isolation window is usually about 2 Th, so 4 precursors per window (or per MS/MS scan) is sufficient in most cases. If the isolation window is much wider than a normal narrow window, as with DIA data, you may need to increase the threshold accordingly to avoid missing some peptides.

ctarn commented 9 months ago

Additionally, please note that identifying one precursor of an MS/MS scan does not mean that the other precursors of that scan are incorrect, since co-fragmentation is very common in practice.