ftwkoopmans / msdap

MS-DAP: downstream analysis pipeline for quantitative proteomics
GNU General Public License v3.0

No match-between-runs data found for IonQuant #8

Closed: ionurchin closed this issue 9 months ago

ionurchin commented 1 year ago

Hello and thanks for developing MS-DAP! I am using the latest version of FragPipe to search and quantify DDA data. Later versions of this software have introduced changes that affect the output data, so maybe this is not an MS-DAP problem. But when importing data from the FragPipe output directory, I get the info message below:

info: reading IonQuant data from FragPipe output folder...
info: no match-between-runs data found (this file is missing: """)
----snip----

The analysis proceeds to apparent completion. What is the significance of this message? Is there anything I can do to fix it?

Best regards, Brian

ftwkoopmans commented 1 year ago

Importing data from FragPipe is a bit of a technical challenge because the peptide-level data that MS-DAP needs is spread across a number of FragPipe output files (in contrast, software such as DIA-NN and Spectronaut provides a single table that contains all relevant data, which is much easier to work with).

So we wrote a bunch of code to extract the data MS-DAP requires from a FragPipe output folder, but it does require that you assigned experiment IDs in the FragPipe Workflow tab. This makes FragPipe aware of the experimental design, and consequently it produces some files that MS-DAP needs (our admittedly minimal instructions can be found at https://github.com/ftwkoopmans/msdap/blob/master/doc/userguide.md#fragpipe ).

If you successfully followed this workflow with previous FragPipe versions but it doesn't work after upgrading to a new FragPipe version, then I can have a look and see what changed on FragPipe's side and figure out what it'll take to make MS-DAP compatible.

ftwkoopmans commented 1 year ago

I haven't heard back from you, so I just wanted to check whether you've managed to re-analyze your data in FragPipe with the experiment info filled in at the "Workflow" tab, and thereby got your MS-DAP analyses going as well. Today I downloaded the recent FragPipe 19.1 release and verified that MS-DAP still plays nice with its output. Below I have documented my workflow (I will also add this example to the MS-DAP user guide when we next update the GitHub repository).

========================================================

In this example we'll apply FragPipe 19.1 with default settings to the DDA label-free quantification dataset from O'Connell et al. (PMID: 29635916, www.proteomexchange.org identifier PXD007683).

1) The main configuration for this example takes place in the "Workflow" tab.

2) Provide a fasta database in the "Database" tab (and add decoys if you haven't already). For this study we provide a proteome database that contains both the Human and Yeast proteomes.

3) All other settings we leave to default in this example; now we can go to the "Run" tab to provide an output folder and start the FragPipe analysis.

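Screenshots to confirm settings (images not reproduced here): the "Workflow" tab (fragpipe_workflow_updated), the "Database" tab (FragPipe_database) and the MS1 quantification settings (FragPipe_Quant_MS1).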

Now that we have processed the raw data, we can import the dataset into MS-DAP using the following example R code:

library(msdap)
# make sure to update this to the path where you stored your output, as configured in the FragPipe "Run" tab
dataset = import_dataset_fragpipe_ionquant("C:/DATA/PXD007683_oconnel-2018/fragpipe_output", acquisition_mode = "dda")
# this should be the same fasta file as provided in the FragPipe "Database" tab
dataset = import_fasta(dataset, files = "C:/DATA/PXD007683_oconnel-2018/UniProt_2022-02/HUMAN_YEAST_complete.fasta")

# remaining MS-DAP steps after importing the dataset are described in the user guide and "quickstart" on the main MS-DAP GitHub page

ionurchin commented 1 year ago

Hi! Sorry for not replying sooner. To clarify my original post: I have not used MS-DAP with previous versions of FragPipe. I was only referring to the possibility that changes in later FragPipe versions (for example, the protein groups file now has somewhat different columns) might have affected MS-DAP if it had been developed against older versions... just blue-skying at the time. Below are several files: the end of an MS-DAP report with the R command summary and output, as well as the versions of the installed packages; a screenshot of the FragPipe Workflow page; and a snippet of the IonQuant output showing the last iteration of MBR, demonstrating that MBR is indeed occurring.

The entire MS-DAP workflow completes without error, but I am concerned about the output message indicating that it cannot find the MBR data ("file missing"). This is an INFO message, not a warning or error message.

One last thing: I use R/MS-DAP on an iMac, and the working directory is a network-mounted drive on the PC that runs FragPipe.

Attachments: mdfragger-workflow, msdap-report-R-command-history.pdf, MSFragger-Log-Ionquant-snippet.pdf

ftwkoopmans commented 1 year ago

Ok, great feedback, thanks! Having inspected the dataset I analyzed earlier today and checked the FragPipe output files and their GitHub repo, it seems there have been some changes to the FragPipe output files. I'll investigate further, write an MS-DAP function for importing these new FragPipe formats and then upload a beta version here so you can have a look (and don't have to wait for the next MS-DAP release). I expect to get back to you in a few days; I hope you can test it and let me know how it goes :)

ps. Just to clarify: the 'info' message means that MS-DAP cannot find the match-between-runs data, and thus the dataset/analyses are based only on identified peptides (PSMs). I'll change this to a more elaborate warning in the next release for clarity.

ionurchin commented 1 year ago

Thank you very much! I'll test the updated code as soon as it's posted.


ftwkoopmans commented 1 year ago

I've attached a beta version; you can use it as follows:

library(msdap)
source("C:/PATH TO SCRIPT HERE/fragpipe_parser_beta__2023-03-05.R")
dataset = import_dataset_fragpipe_ionquant("C:/DATA/PXD007683_oconnel-2018/fragpipe_output", acquisition_mode = "dda", collapse_peptide_by = "sequence_modified")

The source() command that loads the R script will override the current MS-DAP import_dataset_fragpipe_ionquant() function (for calls without the msdap:: prefix). Basically, the new parser extracts data from the ion.tsv and psm.tsv files that are present in more recent FragPipe versions (the previous MS-DAP function was developed against a FragPipe version from ~2 years ago).
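The import log later in this thread shows one psm.tsv per experiment subfolder (e.g. .../10mbr-0mbrCorrelation/ko_1_1/psm.tsv), so these files are not necessarily directly below the folder you pass in. As a rough sketch under that assumption, and not the MS-DAP parser itself, a hypothetical helper `list_fragpipe_experiment_files()` could locate them with a recursive search in base R:

```r
# Hypothetical helper (not part of MS-DAP): find per-experiment psm.tsv files
# anywhere below a FragPipe output folder, plus the ion.tsv next to each one.
# Assumes the <output>/.../<experiment>/psm.tsv layout seen in the import log.
list_fragpipe_experiment_files <- function(output_dir) {
  psm <- list.files(output_dir, pattern = "^psm\\.tsv$",
                    recursive = TRUE, full.names = TRUE)
  ion <- file.path(dirname(psm), "ion.tsv")
  data.frame(
    experiment = basename(dirname(psm)),  # folder name = experiment ID
    psm_file = psm,
    ion_file = ifelse(file.exists(ion), ion, NA_character_),  # NA if absent
    stringsAsFactors = FALSE
  )
}
```

Searching recursively instead of assuming a fixed depth keeps the helper working when FragPipe output is nested in an extra subfolder, as in the log below.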

Importantly, this requires the experiment info to be provided in the FragPipe Workflow tab. Either provide Experiment*Bioreplicate combinations that are unique for each row in the table, or provide only the Experiment but make it unique for each row (the setup shown in your screenshot should be fine).
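The uniqueness rule above can be expressed as a small check. This is a hypothetical helper, not an MS-DAP function, and it assumes the Workflow-tab annotation has been copied into a data.frame with columns `experiment` and (optionally) `bioreplicate`:

```r
# Hypothetical check (not an MS-DAP function): each row must be identified
# either by Experiment alone or by the Experiment + Bioreplicate combination.
check_fragpipe_experiment_design <- function(design) {
  stopifnot(is.data.frame(design), "experiment" %in% colnames(design))
  exp <- as.character(design$experiment)
  if (any(is.na(exp) | exp == "")) {
    stop("every row needs an Experiment value")
  }
  if (!anyDuplicated(exp)) {
    return(invisible(TRUE))  # Experiment alone is unique per row
  }
  if ("bioreplicate" %in% colnames(design) &&
      !anyDuplicated(design[, c("experiment", "bioreplicate")])) {
    return(invisible(TRUE))  # Experiment * Bioreplicate is unique per row
  }
  stop("Experiment (or Experiment + Bioreplicate) must be unique for each row")
}

# usage: two replicates of the same experiment, distinguished by Bioreplicate
design <- data.frame(file = c("a.raw", "b.raw"),
                     experiment = c("ko", "ko"), bioreplicate = c(1, 2))
check_fragpipe_experiment_design(design)
```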

I've tested it on 2 datasets so far and validated that loading a FragPipe dataset into MS-DAP and then plotting experiment*peptide intensity values yields the exact same result as directly using the FragPipe peptide-level output file (ergo, our 'fragpipe parser' retrieves all peptides with correct abundance values).

fragpipe_parser_beta__2023-03-05.zip

ionurchin commented 1 year ago

Sorry, it has taken me a while to get back to data analysis! I've just tested your beta version, and I still get the info message stating that no match-between-runs file was found. I'll let it go through the full pipeline and compare it to the previous output. Maybe the info message is not accurate? Here are the commands, followed by the beginning of the output:

library(msdap)
Loading required package: rlang
Loading required package: dplyr

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Loading required package: tibble
Loading required package: tidyr
Loading required package: ggplot2

source("/Users/bhampton/Documents/MS-DAP/fragpipe_parser_beta__2023-03-05.R")
dataset = msdap::import_dataset_fragpipe_ionquant(path = "/Volumes/EVO-SSD/Proteomics/iPSC/ipsc-50C", acquisition_mode = "dda", collapse_peptide_by = "sequence_modified")
info: reading IonQuant data from FragPipe output folder...
info: no match-between-runs data found (this file is missing: """)
info: Extract peptide retention time and identification confidence from: /Volumes/EVO-SSD/Proteomics/iPSC/ipsc-50C/10mbr-0mbrCorrelation/ko_1_1/psm.tsv
info: Extract peptide retention time and identification confidence from: /Volumes/EVO-SSD/Proteomics/iPSC/ipsc-50C/10mbr-0mbrCorrelation/ko_2_2/psm.tsv
info: Extract peptide retention time and identification confidence from: /Volumes/EVO-SSD/Proteomics/iPSC/ipsc-50C/10mbr-0mbrCorrelation/ko_3_3/psm.tsv
----snip----

ftwkoopmans commented 1 year ago

I think that's because you are explicitly calling the function from the MS-DAP package, not the temporary beta version from the script. Please remove the msdap:: prefix from the line dataset = msdap::import_dataset_fragpipe_ionquant and try again.
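This follows from how R resolves names: `pkg::fun` always looks the function up inside the installed package's namespace, while an unqualified call uses the first match on the search path, which after source() is the copy in the global environment. A minimal base-R illustration, using `stats::median` purely as a stand-in for the MS-DAP import function:

```r
# Stand-in for the sourced beta script: a global-environment function that
# has the same name as an exported package function.
median <- function(x, ...) "overridden version from the sourced script"

x <- c(1, 5, 3)
median(x)         # unqualified: finds the global-environment copy first
stats::median(x)  # qualified: always the package version, returns 3

rm(median)        # remove the override again
```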

ionurchin commented 1 year ago

The function throws an error at the beginning of the "### check for valid function parameters" section. The output is below. Note that I get the same error regardless of whether valid arguments are passed to the import function; for brevity, I've pasted a call without arguments:

library(msdap)
... output
source("/Users/bhampton/Downloads/fragpipe_parser_beta__2023-03-05.R")
import_dataset_fragpipe_ionquant()
Error in check_parameter_is_string(path) : could not find function "check_parameter_is_string"

ionurchin commented 1 year ago

Is the sourced beta script failing because the functions it calls are not defined where it looks for them? Might it work if the script were moved into the directory where those functions are located? If so, where is that?

ftwkoopmans commented 1 year ago

Sorry, I realized after the fact that the beta script only works in our dev environment (it assumes the msdap package is loaded into the namespace, so all of its functions, including internal ones, are available). My bad.
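The underlying mechanism: a function created by source() gets the global environment as its enclosing environment, so it can see exported functions of attached packages but not a package's internal helpers such as check_parameter_is_string. The base-R sketch below simulates this with a stand-in "namespace" environment. The msdap-specific lines at the end (sourcing into a child of `asNamespace("msdap")`) are an untested workaround assumption, not something suggested in this thread.

```r
# Stand-in for the msdap namespace: an environment holding an internal,
# non-exported helper (hypothetical, simplified).
pkg_ns <- new.env()
assign("check_parameter_is_string", function(x) {
  if (!is.character(x) || length(x) != 1) stop("expected a single string")
  invisible(TRUE)
}, envir = pkg_ns)

# Stand-in for the beta script: a function that calls the internal helper.
script <- tempfile(fileext = ".R")
writeLines("import_beta <- function(path) { check_parameter_is_string(path); path }", script)

# 1) Plain source() defines import_beta in the global environment, so the
#    helper is not on its lookup path and the call fails.
source(script)
tryCatch(import_beta("x"), error = conditionMessage)

# 2) Sourcing into a child environment of the namespace makes the helper
#    visible through lexical scoping.
env <- new.env(parent = pkg_ns)
source(script, local = env)
env$import_beta("x")  # returns "x"

# Assumed msdap equivalent (untested sketch):
# env <- new.env(parent = asNamespace("msdap"))
# source("fragpipe_parser_beta__2023-03-05.R", local = env)
# dataset <- env$import_dataset_fragpipe_ionquant(...)
```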

Meanwhile, we've finished the new code for importing FragPipe data and tested it against FragPipe v18 and v19 using various LFQ input datasets and FragPipe settings (including FragPipe misconfigurations, so we're sure that MS-DAP gives proper warnings when importing such data).

Thanks again for the feedback; we've just released version 1.0.4, which includes the updated FragPipe import function, with FragPipe documentation and screenshots here.

To upgrade, close all instances of RStudio, start RStudio again, and run the following code snippet:

# don't turn warnings raised during installation into errors
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS="true")
# upgrade = "never": don't update already-installed dependencies
devtools::install_github("ftwkoopmans/msdap", upgrade = "never")
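
To confirm that the upgrade actually took effect (for example, if an old RStudio session still had the previous version loaded), you could compare the installed version against 1.0.4, the release mentioned above. The helper name `has_min_version()` is hypothetical; `requireNamespace()` and `utils::packageVersion()` are standard R functions:

```r
# Hypothetical helper: TRUE if `pkg` is installed with at least `min_version`.
has_min_version <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_version
}

has_min_version("msdap", "1.0.4")  # should be TRUE after a successful upgrade
```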

Please check it out and let me know if everything worked out alright!

ionurchin commented 1 year ago

It uses the MBR data now, and there are no error messages. Thanks! You have done a great service to the proteomics community by creating and supporting MS-DAP.

ftwkoopmans commented 1 year ago

Ok, great! I'm trying to keep the updates coming, prioritizing errors/issues, but hopefully there will also be some new features later this year.