IonQuant - Githubissues

tobiasko commented 4 years ago

Dear FragPipe team,

we (FGCZ) are intensifying our testing of MSFragger for PASEF data. Could you share some info regarding how IonQuant works?

How do you detect features?
How are features aligned, and in which space?
Are you using an approach similar to "match-between-runs" (ID migration)?
In which dimensions is the data recalibrated (I assume the m/z dim. is recalibrated)?
Is it possible to view features that are detected by IonQuant in some way? Would for instance an XIC/XIM in Bruker data analysis or Skyline see something different than IonQuant?
Have you checked how Bruker MS data reduction levels affect identification/quantification?

Sorry for the many questions, but since IonQuant is still kind of a black box (no publication) this is the only way ;-)

Thanks, Tobi

fcyu commented 4 years ago

Hi Tobi,

Thanks for your interest in IonQuant. The paper will be available soon. You will be able to see all the details from the paper.

Best,

Fengchao

-- Dr. Fengchao Yu University of Michigan

tobiasko commented 4 years ago

ok. will there be a preprint?

fcyu commented 4 years ago

Hi @tobiasko ,

Yes, we just posted our manuscript on bioRxiv (https://www.biorxiv.org/content/10.1101/2020.03.19.999334v1).

We also have match-between-runs (MBR) done and will release it soon. Please feel free to contact us if you have any questions.

Best,

Fengchao

tobiasko commented 4 years ago

WOW! Very nice. Thanks for sharing the manuscript. Should you need any beta testers or additional data let us know.

tobiasko commented 4 years ago

I finally managed to read your manuscript today! Nice work! In addition, we now have a running FragPipe-like installation on unix incl. IonQuant. Related to this:

Will it be possible in the future to use only the mzBIN file as input for IonQuant (I tried but failed)?
In the manuscript it says:

"When used with Philosopher summary tables as input, IonQuant adds quantification information directly to the tables containing validated PSM, peptide, and protein results." Are the modified files psm.tsv, peptide.tsv and protein.tsv?

What I see is an additional file named *_quant.csv

tobiasko@fgcz-r-033:/scratch/tobiasko/test$ ls -la
total 498948
drwxr-xr-x 4 tobiasko SG_Employees      4096 Apr  2 16:58 .
drwxrwxr-x 6 tobiasko SG_Employees       307 Apr  2 14:22 ..
-rw-r--r-- 1 tobiasko SG_Employees 251863834 Apr  2 14:26 2-21-2020_autoQC4L_444_1_calibrated.mgf
drwxr-xr-x 2 tobiasko SG_Employees        62 Apr  2 16:42 2-21-2020_autoQC4L_444_1.d
-rw-r--r-- 1 tobiasko SG_Employees 162468130 Apr  2 14:25 2-21-2020_autoQC4L_444_1.mzBIN
-rw-r--r-- 1 tobiasko SG_Employees  30618591 Apr  2 14:27 2-21-2020_autoQC4L_444_1.pepXML
-rw-r--r-- 1 tobiasko SG_Employees   6466130 Apr  2 16:58 2-21-2020_autoQC4L_444_1_quant.csv
-rw-r--r-- 1 tobiasko SG_Employees    306775 Apr  2 16:20 delta-mass.html
-rw-rw-r-- 1 tobiasko SG_Employees  38646290 Apr  2 15:44 interact-2-21-2020_autoQC4L_444_1.pep.xml
-rw-r--r-- 1 tobiasko SG_Employees   6482645 Apr  2 15:46 interact.prot.xml
-rw-r--r-- 1 tobiasko SG_Employees   1479118 Apr  2 16:20 ion.tsv
drwxr-xr-x 2 tobiasko SG_Employees      4096 Apr  2 16:20 .meta
-rw-r--r-- 1 tobiasko SG_Employees    267942 Apr  2 16:20 modifications.tsv
-rw-r--r-- 1 tobiasko SG_Employees   1111545 Apr  2 16:20 peptide.tsv
-rw-r--r-- 1 tobiasko SG_Employees   1121670 Apr  2 16:20 protein.fas
-rw-r--r-- 1 tobiasko SG_Employees    559457 Apr  2 16:20 protein.tsv
-rw-r--r-- 1 tobiasko SG_Employees   9490531 Apr  2 16:20 psm.tsv

I am not 100% sure what the features in this table are. Is this the table of identified 4-D features per file?

What I can not find is the MSstats.csv file or reprint-formatted files. Is this normal?

Thx for your great support, Tobi

anesvi commented 4 years ago

I think the problem is, we would need to write MS1 in mzBIN which will make the file too big? Fengchao, what was your explanation for not doing it?

fcyu commented 4 years ago

Hi @tobiasko ,

Currently mzBIN only has MS/MS scans. The reasons for not putting MS scans into mzBIN are: 1) It would increase mzBIN's size a lot, which would slow down the identification step (MSFragger). 2) I am not sure that putting MS scans into mzBIN would speed up the whole process, because reading MS scans from .d doesn't need the fancy pre-processing steps that take most of the time when loading MS/MS scans. That's also why loading .d in IonQuant (MS scans) takes much less time than loading .d in MSFragger (MS/MS scans).

And yes, IonQuant will update psm.tsv, ion.tsv, peptide.tsv, protein.tsv, combined_protein.tsv (if applicable), and combined_peptide.tsv (if applicable) by adding intensities.

The *_quant.csv files contain all quantified PSMs with some measures, such as apex retention time, retention time boundary, apex ion mobility, ion mobility boundary, intensities, etc. These files are the output of IonQuant at the early stage, and the following steps, such as updating Philosopher's tables, depend on them.

You may need to use Multiple experiment reports in FragPipe to trigger IonQuant to write an MSstats-compatible file.

Hopefully my answers are clear enough. Please feel free to contact me if you have any questions.

Best,

Fengchao

tobiasko commented 4 years ago

Thx! I just used:

java -Xmx32G -jar /usr/local/nesvilab/IonQuant-1.0.0.jar --multidir "multiExpRes" 2-21-2020_autoQC4L_444_1.d 2-21-2020_autoQC4L_444_1.pepXML

and the console reports:

2020-04-02 18:02:10 [INFO] - multidir = /export/data01/tobiasko/test/multiExpRes

but the folder is not created. Does it need to exist already?

I also can not see a modification on the files you mentioned. Any idea why this might happen?

fcyu commented 4 years ago

For a multi-experiment setup, the command would be:

java -Xmx32g -jar ionquant.jar --psm exp_1\psm.tsv --psm exp_2\psm.tsv --psm exp_3\psm.tsv --psm exp_4\psm.tsv --multidir ./ F:\data\Bruker\20180819_TIMS2_12-2_AnBr_SA_200ng_HeLa_50cm_120min_100ms_11CT_1_A1_01_2767.d exp_1\20180819_TIMS2_12-2_AnBr_SA_200ng_HeLa_50cm_120min_100ms_11CT_1_A1_01_2767.pepXML F:\data\Bruker\20180819_TIMS2_12-2_AnBr_SA_200ng_HeLa_50cm_120min_100ms_11CT_3_A1_01_2769.d exp_3\20180819_TIMS2_12-2_AnBr_SA_200ng_HeLa_50cm_120min_100ms_11CT_3_A1_01_2769.pepXML F:\data\Bruker\20180819_TIMS2_12-2_AnBr_SA_200ng_HeLa_50cm_120min_100ms_11CT_2_A1_01_2768.d exp_2\20180819_TIMS2_12-2_AnBr_SA_200ng_HeLa_50cm_120min_100ms_11CT_2_A1_01_2768.pepXML F:\data\Bruker\20180819_TIMS2_12-2_AnBr_SA_200ng_HeLa_50cm_120min_100ms_11CT_4_A1_01_2770.d exp_4\20180819_TIMS2_12-2_AnBr_SA_200ng_HeLa_50cm_120min_100ms_11CT_4_A1_01_2770.pepXML

In this example, the exp_1, exp_2, exp_3, and exp_4 folders should exist and you should be in the parent folder (i.e., ./).

Best,

Fengchao

tobiasko commented 4 years ago

Ahhh wait...what happens if you only have 1 file? Could that be the reason?

fcyu commented 4 years ago

Yes, that's one of the reasons. I am not sure it makes sense to use MSstats with only 1 file. There is not much to normalize, and nothing to compare in a differential analysis.

tobiasko commented 4 years ago

Clear. I am just testing whether IonQuant works as expected on a very basic test case. But I should still get the modification of the .tsv files, right?

fcyu commented 4 years ago

Yes, you will still get those modified .tsv files as long as you provide --psm.
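A quick way to check whether intensities were actually written is to look for any non-zero value in an intensity column. The following is a minimal sketch, assuming a POSIX shell with awk and a tab-separated table; `has_nonzero_column` is a hypothetical helper, and the column name is passed in by the caller because the exact headers may differ between versions.

```shell
# Sketch: report whether a named column in a tab-separated table holds any
# non-zero value. has_nonzero_column is a hypothetical helper, not part of
# IonQuant, Philosopher, or FragPipe.
# Exit status: 0 = a non-zero value was found, 1 = all zeros, 2 = no such column.
has_nonzero_column() {
    table=$1
    column=$2
    awk -F '\t' -v name="$column" '
        NR == 1 {
            # Locate the column by its header text.
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            if (!col) exit 2
            next
        }
        # Non-numeric cells evaluate to 0 and are treated as empty.
        ($col + 0) != 0 { found = 1; exit }
        END { if (!col) exit 2; exit (found ? 0 : 1) }
    ' "$table"
}
```

For example, `has_nonzero_column exp_1/protein.tsv "Total Intensity"` would return status 1 if every value in that column is 0; the column name here is an assumption and should be checked against the header of the actual file.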

tobiasko commented 4 years ago

@fcyu I am now executing a sh script as suggested by the linux tutorial. This time with 4 *.d folders. It runs through, but I still don't get a multi dir, or the 'MSstats.csv' file. I also checked the protein.tsv file. All intensity columns are filled with 0. Could it be that IonQuant has problems with file access rights?

fcyu commented 4 years ago

Hi @tobiasko , can you send me your script and the log from IonQuant?

Thanks,

Fengchao

tobiasko commented 4 years ago

Hi @fcyu Does IonQuant write any log files? Can't see any.

fcyu commented 4 years ago

It prints some info to console. If you were using FragPipe, FragPipe would save it to a file. If not, you may need to redirect the printed info to a file.
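A minimal sketch of such a redirection, assuming a POSIX shell; `run_logged` is a hypothetical helper name, not something IonQuant or FragPipe provides.

```shell
# Sketch: capture everything a command prints (stdout and stderr) in one file.
# run_logged is a hypothetical helper for illustration.
run_logged() {
    log_file=$1
    shift
    # "$@" is the command to run; 2>&1 sends stderr to the same file as stdout.
    "$@" > "$log_file" 2>&1
}

# Example call, reusing the command posted earlier in this thread:
# run_logged ionquant.log java -Xmx32G -jar /usr/local/nesvilab/IonQuant-1.0.0.jar \
#     --multidir "multiExpRes" 2-21-2020_autoQC4L_444_1.d 2-21-2020_autoQC4L_444_1.pepXML
```

Writing both streams to one file keeps warnings and errors in the same order as the regular progress messages, which makes the log easier to read when it is shared.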

If you don't have it now, you can send me your command/shell script first.

Best,

Fengchao

tobiasko commented 4 years ago

I repeated only the IonQuant execution and redirected std out and std error to a file.

tobiasko@Tobiass-MBP:~/Downloads > head ionquant.1.txt
IonQuant version IonQuant-1.0.0
Batmass-IO version 1.17.2
timsdata library version timsdata-2-4-4
(c) University of Michigan
System OS: Linux, Architecture: amd64
Java Info: 1.8.0_242, OpenJDK 64-Bit Server VM, Oracle Corporation
JVM started with 28 GB memory
2020-04-03 17:21:58 [INFO] - Parameters:
2020-04-03 17:21:58 [INFO] - threads = 128
2020-04-03 17:21:58 [INFO] - mztol = 10.0

ionquant.1.txt std error was empty.

The bash script for FragPipe functionality is (had to rename to .txt for upload):

tutorial_linux.txt

tobiasko commented 4 years ago

@fcyu I just had the idea to review a log file that was written by FragPipe on Windows. Could it be that the unix script you published is...let's say far away from what FragPipe does (more like a skeleton)? At least I can see some striking difference in how data is organised into folders and how workspaces are handled. What is the logic we have to follow?

fcyu commented 4 years ago

Hi @tobiasko ,

You need to provide --psm and the corresponding psm.tsv path. Without the --psm flag, IonQuant quantifies all PSMs in the pepXML and then stops, because it doesn't know where to find the tsv tables.

You may check the example I gave you again.

Best,

Fengchao

fcyu commented 4 years ago

The logic is that each experiment has its own folder. In the folder there are the pepXML and tsv tables from Philosopher. When running IonQuant, one --psm flag gives the path to one experiment's psm.tsv. So, there will be multiple --psm flags in the multi-experiment case. Finally, the --multidir flag indicates the parent folder of all experiments.
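That layout can be turned into the flag list mechanically. The sketch below assumes a POSIX shell and only uses the two flags named in this thread (--psm and --multidir); `build_ionquant_args` is a hypothetical helper, and it prints the arguments rather than running anything.

```shell
# Sketch: assemble the --psm / --multidir part of an IonQuant command line
# from a parent folder and a list of experiment folder names.
# build_ionquant_args is a hypothetical helper; it prints one argument per line.
build_ionquant_args() {
    parent_dir=$1
    shift
    # The remaining arguments are experiment folder names under parent_dir.
    # Consume them one by one and append a --psm pair for each.
    n=$#
    i=0
    while [ "$i" -lt "$n" ]; do
        exp=$1
        shift
        set -- "$@" --psm "$parent_dir/$exp/psm.tsv"
        i=$((i + 1))
    done
    # One --multidir flag pointing at the parent of all experiment folders.
    set -- "$@" --multidir "$parent_dir"
    printf '%s\n' "$@"
}
```

The spectral files and their pepXML files still have to be appended after these flags, as in the full command shown earlier in the thread.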

Best,

Fengchao

tobiasko commented 4 years ago

Hi @fcyu,

wait...do I understand you correctly? The LC-MS table in the FragPipe GUI is represented by a folder structure on the linux command line? So given the table would look like:

file1, expA, 1
file2, expB, 2

I would need to generate folders named expA_1 and expB_2 and place file1 and file2 respectively? Do I have to use _ as a separator? Would placing multiple files in the same folder be treated as a fraction or a tech. replicate (repeated injection)?

fcyu commented 4 years ago

Hi @tobiasko ,

Yes, FragPipe generates folders and puts files into the corresponding folders according to the LC-MS table.

You need to have the tsv tables and the pepXML files in their corresponding folders. The spectral files (e.g., mzML, mzXML, or .d) can be somewhere else, but you need to provide their paths to IonQuant.

Yes, you need to use _ as a separator.

Multiple spectral files in the same Experiment and Replicate would be treated as fractions. You may find a detailed explanation here (https://github.com/Nesvilab/MSFragger/blob/master/tutorial_fragpipe.md#for-reports-with-results-from-different-fractionated-replicates-shown-in-separate-columns).
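The folder naming described above can be sketched as a small shell function. This is an illustration only, assuming a POSIX shell and a comma-separated manifest with one row per spectral file (file, experiment, replicate) and no spaces inside the values; the manifest format and the name `make_experiment_dirs` are assumptions, not something FragPipe reads.

```shell
# Sketch: create one folder per experiment/replicate pair, named
# "<experiment>_<replicate>". make_experiment_dirs and the manifest layout
# (file, experiment, replicate) are hypothetical.
make_experiment_dirs() {
    manifest=$1
    out_dir=$2
    # Strip spaces so that "file1, expA, 1" and "file1,expA,1" behave the same.
    tr -d ' ' < "$manifest" | while IFS=, read -r file experiment replicate; do
        # Skip blank rows.
        [ -n "$file" ] || continue
        exp_dir="$out_dir/${experiment}_${replicate}"
        # mkdir -p is a no-op when the folder already exists, so rows that
        # share an experiment and replicate (fractions) map to the same folder.
        mkdir -p "$exp_dir"
        printf '%s -> %s\n' "$file" "$exp_dir"
    done
}
```

With the two-row table from the question, this creates expA_1 and expB_2 under the output folder and prints which file belongs to which folder.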

Best,

Fengchao

tobiasko commented 4 years ago

Ok! And all philosopher commands need to be executed in every folder?

fcyu commented 4 years ago

Yes, and you need to run Philosopher proteinprophet with all interact-*.pep.xml files and generate a combined interact-*.prot.xml. You also need to run Philosopher abacus to generate combined tsv files.

I suggest borrowing the commands generated by FragPipe by clicking Dry Run.

Best,

Fengchao

anesvi commented 4 years ago

Tobi

This is why we developed FragPipe: to generate scripts for execution of commands. FragPipe runs on Linux too. As Fengchao said, you can use it to see what commands need to be executed. You can then modify them.

Best Alexey

tobiasko commented 4 years ago

Hi @anesvi,

I think the dry run on Linux is a very nice idea. Did it yesterday for a closed search and we are now modifying our bash script accordingly.

Thx for the tip, Tobi

tobiasko commented 4 years ago

Hi @fcyu,

I checked the listing for a dry run of a closed search and most is clear. What I still don't fully understand is the PeptideProphet section. Why are you using tmp DIRs here? Are they created by an upstream process or is there some kind of util function that isn't logged?

PeptideProphet: Workspace init [Work dir: /scratch/tobiasko/fragpipe_test/FragPipe_output/expA_1/fragpipe-3-2-2020_11-25-33_autoQC01_463_1_Slot1-54.pepXML-temp]
/usr/local/nesvilab/philosopher-3.2.3/philosopher workspace --init
PeptideProphet: Workspace init [Work dir: /scratch/tobiasko/fragpipe_test/FragPipe_output/expA_2/fragpipe-3-3-2020_09-19-55_autoQC01_470_1_Slot1-54.pepXML-temp]
/usr/local/nesvilab/philosopher-3.2.3/philosopher workspace --init
PeptideProphet: Workspace init [Work dir: /scratch/tobiasko/fragpipe_test/FragPipe_output/expB_3/fragpipe-3-3-2020_10-29-26_autoQC01_471_2_Slot1-54.pepXML-temp]
/usr/local/nesvilab/philosopher-3.2.3/philosopher workspace --init
PeptideProphet: Workspace init [Work dir: /scratch/tobiasko/fragpipe_test/FragPipe_output/expB_4/fragpipe-3-3-2020_11-37-59_autoQC01_472_3_Slot1-54.pepXML-temp]
/usr/local/nesvilab/philosopher-3.2.3/philosopher workspace --init

Greetings, Tobi

chhh commented 4 years ago

@tobiasko PeptideProphet is single-threaded. Separating the files into different folders allows FragPipe to run multiple instances of PeptideProphet at once, speeding up the process. This is the only reason.
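The general pattern (one working folder per job, all jobs started in the background, then wait for all of them) can be sketched in a POSIX shell as follows. This is not FragPipe's implementation; `run_in_parallel` is a hypothetical helper, and the command to run is supplied by the caller, so no PeptideProphet options are assumed.

```shell
# Sketch: run the same single-threaded command once per working folder,
# all at the same time. run_in_parallel is a hypothetical helper.
run_in_parallel() {
    cmd=$1
    shift
    pids=""
    for work_dir in "$@"; do
        mkdir -p "$work_dir"
        # Each job runs inside its own folder so that files written to the
        # current directory by one instance cannot collide with another's.
        ( cd "$work_dir" && sh -c "$cmd" ) &
        pids="$pids $!"
    done
    # Wait for every job and report failure if any of them failed.
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}
```

On a machine with many cores this keeps as many cores busy as there are folders, which is the speed-up described above.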

tobiasko commented 4 years ago

Another question... sorry! I am not familiar with Philosopher, but the pipeline concept looks really attractive to me. Have you tried moving parts of a FragPipe-like linux workflow (a closed search) into a Philosopher pipeline? Pros and Cons versus a pure bash script?

fcyu commented 4 years ago

Hi @tobiasko ,

No need to be sorry. I personally prefer shell scripts because I have full control of the commands and know exactly which commands are going to run. But the Philosopher pipeline may be easier to run and maintain. Unfortunately, I have never used it. Alexey @anesvi and Felipe @prvst are better placed than me to answer this question.

Best,

Fengchao

prvst commented 4 years ago

@tobiasko running a full analysis via pipeline mode is a more robust solution than relying on bash scripts. It is also ideal for maintaining reproducibility, especially if you are aiming for a publication. You will need only one configuration file for every program and every analysis step, and you won't need to bother with command syntax or with file names and paths. The pipeline can be triggered with one command, you can find an example here.

tobiasko commented 4 years ago

@prvst That sounds pretty convincing! I am working my way through the example. If I need additional modules not included in the example (like IonQuant) how would I add them?

prvst commented 4 years ago

The current version works with Comet, MSFragger, the Prophets and TMT-Integrator. Any other tool that you want to add will need to be executed manually.

tobiasko commented 4 years ago

IMPRESSIVE! I just executed a complete spectral counting workflow with PASEF data (4 files)...using a single command.

Nesvilab / FragPipe

IonQuant #179