Closed BenSamy2020 closed 3 years ago
Hi Ben,
Your layout is OK. But if you want to use MSstats afterward, you might need to number the replicates 1, 2, 3, 4, 5, ...
Here (https://github.com/Nesvilab/MSFragger/blob/master/tutorial_fragpipe.md#multi-experiment-report) is an explanation about it. And here (https://github.com/Nesvilab/FragPipe/issues/183) is the discussion.
Best,
Fengchao
Greetings FengChao,
Thank you for your prompt reply. I will take your suggestion into account. Apart from the file labelling format above, I would also like to try another one. The format is as follows:
Path | Experiment | Replicate |
---|---|---|
Patient_1_Day_0_Technical_Replicate_1 | Exp | 1 |
Patient_1_Day_0_Technical_Replicate_2 | Exp | 2 |
Patient_1_Day_0_Technical_Replicate_3 | Exp | 3 |
Patient_1_Day_1_Technical_Replicate_1 | Exp | 4 |
Patient_1_Day_1_Technical_Replicate_2 | Exp | 5 |
Patient_1_Day_1_Technical_Replicate_3 | Exp | 6 |
Patient_1_Day_2_Technical_Replicate_1 | Exp | 7 |
Patient_1_Day_2_Technical_Replicate_2 | Exp | 8 |
Patient_1_Day_2_Technical_Replicate_3 | Exp | 9 |
When I compared the total proteome depth of the two file labelling formats, the second option (the one in this message) gave a higher protein identification count. Can I use this file labelling format to search my files and perform MS1 quantification? (Since FragPipe reports the abundance of quantified proteins across the different files, I can still perform differential protein abundance analysis.)
Regards, Ben
Hi Ben,
Yes, you can. But I don't understand why it resulted in more identified proteins. It should not affect the identification results, since both formats put each run into a separate folder.
Best,
Fengchao
Hi FengChao,
Apologies, after looking through my log, I realized that I was searching two different sets of files. Hence the difference.
Regards, Ben
Greetings @fcyu,
My experimental design for a different study is slightly more complicated, and I am having trouble deciding on the labelling. Currently, I have 2 cell lines, 3 biological repeats for each cell line, and 3 fractions for each biological repeat. My file labelling is as follows (I will be using MSstats for downstream analysis; please advise me if the labelling is appropriate):
Path | Experiment | Replicate |
---|---|---|
SUM159PT_B1_Fraction_1 | SUM159PT_B1 | 1 |
SUM159PT_B1_Fraction_2 | SUM159PT_B1 | 1 |
SUM159PT_B1_Fraction_3 | SUM159PT_B1 | 1 |
SUM159PT_B2_Fraction_1 | SUM159PT_B2 | 2 |
SUM159PT_B2_Fraction_2 | SUM159PT_B2 | 2 |
SUM159PT_B2_Fraction_3 | SUM159PT_B2 | 2 |
SUM159PT_B3_Fraction_1 | SUM159PT_B3 | 3 |
SUM159PT_B3_Fraction_2 | SUM159PT_B3 | 3 |
SUM159PT_B3_Fraction_3 | SUM159PT_B3 | 3 |
HS578T_B1_Fraction_1 | HS578T_B1 | 4 |
HS578T_B1_Fraction_2 | HS578T_B1 | 4 |
HS578T_B1_Fraction_3 | HS578T_B1 | 4 |
HS578T_B2_Fraction_1 | HS578T_B2 | 5 |
HS578T_B2_Fraction_2 | HS578T_B2 | 5 |
HS578T_B2_Fraction_3 | HS578T_B2 | 5 |
HS578T_B3_Fraction_1 | HS578T_B3 | 6 |
HS578T_B3_Fraction_2 | HS578T_B3 | 6 |
HS578T_B3_Fraction_3 | HS578T_B3 | 6 |
Thank you, Ben
Hi Ben,
Your labelling looks good.
Best,
Fengchao
Thank you brother.
Greetings @fcyu,
I have successfully performed the search (Default_MBR). Currently, I am trying to process the FragPipe output MSstats file using MSstats. I consistently get the following error:
"Error in dataProcess(raw, logTrans = 10) : MSstats suspects that there are fractionations and potentially technical replicates too. Please add Fraction column in the input."
I am not sure, but I suspect that the MSstats file is lacking a Fraction column. I have provided my MSstats file below for your reference. Could you please advise me on how to proceed?
Regards, Ben. MSstats.zip
Hmm, interesting. It looks like there is a "hidden" column not documented in either https://www.bioconductor.org/packages/release/bioc/manuals/MSstats/man/MSstats.pdf or https://www.bioconductor.org/packages/release/bioc/vignettes/MSstats/inst/doc/MSstats.html.
After some digging, I found a related issue (https://github.com/RobertsLab/resources/issues/516#issuecomment-449133673). I tested adding a Fraction column with `raw$Fraction <- 1`, and it works (but you need to specify different numbers if your samples have different fractions). For now, could you add that column yourself? We will fix the output in the next release.
Best,
Fengchao
Greetings @fcyu,
Thank you for your rapid reply. Unfortunately, Excel consistently hangs when I make the changes because of the sheer amount of data in the CSV file.
Would you be able to tell me approximately when the next release will be out? (I am sorry for being so "pushy".) I have collaborators waiting for the analyzed results.
Regards Ben.
Hi Ben,
You can easily add it in R with a command like `raw$Fraction <- 1`. If your data had different fractions, it would be a little more complicated, but I am sure that it is feasible.
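For the case with different fractions, here is a minimal sketch in base R. It assumes the `Run` column of the MSstats.csv file holds names that end in `_Fraction_<number>`, as in the file labelling tables in this thread; the toy data frame and its values are made up for illustration, and the real file has more columns. If the run names look different (for example, they carry a file extension), the pattern will need adjusting.

```r
# Toy stand-in for the MSstats.csv written by FragPipe (illustrative values).
# Assumption: the Run column holds names ending in "_Fraction_<number>".
raw <- data.frame(
  Run = c("SUM159PT_B1_Fraction_1", "SUM159PT_B1_Fraction_2",
          "SUM159PT_B1_Fraction_3", "HS578T_B2_Fraction_3"),
  Intensity = c(1.2e6, 3.4e5, 5.6e5, 7.8e5),
  stringsAsFactors = FALSE
)
# For the real file, something like:
# raw <- read.csv("MSstats.csv", stringsAsFactors = FALSE)

# Pull out the digits that follow "_Fraction_" at the end of each run name.
fraction <- sub("^.*_Fraction_([0-9]+)$", "\\1", raw$Run)

# sub() returns its input unchanged when the pattern does not match, so stop
# here rather than silently writing a wrong fraction number.
stopifnot(all(grepl("^[0-9]+$", fraction)))

raw$Fraction <- as.integer(fraction)
print(raw)
```

This avoids opening the file in Excel; the modified table can be passed on to `dataProcess` directly or written back out with `write.csv(raw, "MSstats_with_fraction.csv", row.names = FALSE)` (the output file name is only an example).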
Best,
Fengchao
Greetings @fcyu,
For the representative file labelling table that I sent earlier in this thread (repeated below for your reference), the group comparison matrix gets very tricky. I am currently comparing 10 cell lines, each with 3 biological repeats, and each biological repeat has 3 fractions (I managed to add a Fraction column to the MSstats.csv file; refer to the MSstats QC plot below for the study file labels).
Path | Experiment | Replicate |
---|---|---|
SUM159PT_B1_Fraction_1 | SUM159PT_B1 | 1 |
SUM159PT_B1_Fraction_2 | SUM159PT_B1 | 1 |
SUM159PT_B1_Fraction_3 | SUM159PT_B1 | 1 |
SUM159PT_B2_Fraction_1 | SUM159PT_B2 | 2 |
SUM159PT_B2_Fraction_2 | SUM159PT_B2 | 2 |
SUM159PT_B2_Fraction_3 | SUM159PT_B2 | 2 |
SUM159PT_B3_Fraction_1 | SUM159PT_B3 | 3 |
SUM159PT_B3_Fraction_2 | SUM159PT_B3 | 3 |
SUM159PT_B3_Fraction_3 | SUM159PT_B3 | 3 |
HS578T_B1_Fraction_1 | HS578T_B1 | 4 |
HS578T_B1_Fraction_2 | HS578T_B1 | 4 |
HS578T_B1_Fraction_3 | HS578T_B1 | 4 |
HS578T_B2_Fraction_1 | HS578T_B2 | 5 |
HS578T_B2_Fraction_2 | HS578T_B2 | 5 |
HS578T_B2_Fraction_3 | HS578T_B2 | 5 |
HS578T_B3_Fraction_1 | HS578T_B3 | 6 |
HS578T_B3_Fraction_2 | HS578T_B3 | 6 |
HS578T_B3_Fraction_3 | HS578T_B3 | 6 |
For example, to compare 2 cell lines I have to write: `comparison1 <- matrix(c(0.333,0.333,0.333,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-0.333,-0.333,-0.333), nrow = 1)`
Collectively, analysis with MSstats gets complicated and throws multiple errors. To overcome these issues, I relabeled (removed _B1, _B2, _B3) my FragPipe workflow tab file to:
Path | Experiment | Replicate |
---|---|---|
SUM159PT_B1_Fraction_1 | SUM159PT | 1 |
SUM159PT_B1_Fraction_2 | SUM159PT | 1 |
SUM159PT_B1_Fraction_3 | SUM159PT | 1 |
SUM159PT_B2_Fraction_1 | SUM159PT | 2 |
SUM159PT_B2_Fraction_2 | SUM159PT | 2 |
SUM159PT_B2_Fraction_3 | SUM159PT | 2 |
SUM159PT_B3_Fraction_1 | SUM159PT | 3 |
SUM159PT_B3_Fraction_2 | SUM159PT | 3 |
SUM159PT_B3_Fraction_3 | SUM159PT | 3 |
HS578T_B1_Fraction_1 | HS578T | 4 |
HS578T_B1_Fraction_2 | HS578T | 4 |
HS578T_B1_Fraction_3 | HS578T | 4 |
HS578T_B2_Fraction_1 | HS578T | 5 |
HS578T_B2_Fraction_2 | HS578T | 5 |
HS578T_B2_Fraction_3 | HS578T | 5 |
HS578T_B3_Fraction_1 | HS578T | 6 |
HS578T_B3_Fraction_2 | HS578T | 6 |
HS578T_B3_Fraction_3 | HS578T | 6 |
Based on your experience, do you think the labelling above is appropriate for MSstats and its downstream analysis? Lastly, I would like to apologize; I understand that MSstats is not a tool you maintain. It would be nice if I could get your input on how to proceed with the MSstats analysis.
Regards Ben
Please email the MSstats team; I am sure they will be happy to advise you on your experimental design.
Hi Ben,
I am not sure about this one. As Alexey pointed out, it would be better to ask the MSstats team.
Best,
Fengchao
Greetings @fcyu,
My experimental design for a different study is slightly more complicated, and I am having trouble deciding on the labelling. Currently, I have 2 cell lines, 3 biological repeats for each cell line, and 3 fractions for each biological repeat. My file labelling is as follows (I will be using MSstats for downstream analysis; please advise me if the labelling is appropriate):
Path | Experiment | Replicate |
---|---|---|
SUM159PT_B1_Fraction_1 | SUM159PT_B1 | 1 |
SUM159PT_B1_Fraction_2 | SUM159PT_B1 | 1 |
SUM159PT_B1_Fraction_3 | SUM159PT_B1 | 1 |
SUM159PT_B2_Fraction_1 | SUM159PT_B2 | 2 |
SUM159PT_B2_Fraction_2 | SUM159PT_B2 | 2 |
SUM159PT_B2_Fraction_3 | SUM159PT_B2 | 2 |
SUM159PT_B3_Fraction_1 | SUM159PT_B3 | 3 |
SUM159PT_B3_Fraction_2 | SUM159PT_B3 | 3 |
SUM159PT_B3_Fraction_3 | SUM159PT_B3 | 3 |
HS578T_B1_Fraction_1 | HS578T_B1 | 4 |
HS578T_B1_Fraction_2 | HS578T_B1 | 4 |
HS578T_B1_Fraction_3 | HS578T_B1 | 4 |
HS578T_B2_Fraction_1 | HS578T_B2 | 5 |
HS578T_B2_Fraction_2 | HS578T_B2 | 5 |
HS578T_B2_Fraction_3 | HS578T_B2 | 5 |
HS578T_B3_Fraction_1 | HS578T_B3 | 6 |
HS578T_B3_Fraction_2 | HS578T_B3 | 6 |
HS578T_B3_Fraction_3 | HS578T_B3 | 6 |
Thank you, Ben
For the quoted text, I think the Experiment column should have the same value for all biological replicates of a condition (for use in MSstats): Experiment (FragPipe) = Condition (MSstats).
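To illustrate why this helps, here is a rough sketch in base R of a comparison matrix with one column per cell line instead of one per biological replicate. `make_contrast` is a hypothetical helper written for this example, not part of MSstats, and only the two cell lines named in this thread are used. The result is meant for the `contrast.matrix` argument of `groupComparison`; the assumption that the columns should follow the sorted condition names needs to be checked against the condition levels in your processed data and your MSstats version.

```r
# Hypothetical helper: build a one-row contrast with +1 for the numerator
# condition, -1 for the denominator condition, and 0 everywhere else.
make_contrast <- function(conditions, numerator, denominator) {
  stopifnot(numerator %in% conditions,
            denominator %in% conditions,
            numerator != denominator)
  m <- matrix(0, nrow = 1, ncol = length(conditions),
              dimnames = list(paste0(numerator, "-", denominator), conditions))
  m[1, numerator] <- 1
  m[1, denominator] <- -1
  m
}

# Assumption: one condition per cell line, listed in sorted order.
conditions <- sort(c("SUM159PT", "HS578T"))
comparison <- make_contrast(conditions, "SUM159PT", "HS578T")
print(comparison)
```

With all 10 cell lines in `conditions`, each pairwise comparison is one call to the helper, and several rows can be stacked with `rbind` before passing them on.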
Greetings,
Currently, I am searching patient-derived serum sample data using FragPipe (15.1 build 4). I would like to perform MS1 quantification. Briefly, my experimental design is as follows: blood was collected from a single patient across 3 days (Day 0, 1, 2). The collected blood was processed, and the peptides were analyzed with 3 technical repeats for each collection day. I have a total of 12 .raw files to be searched.
My file labelling layout is as follows:
Collectively, is the labelling of the file appropriate for my study design and requirement?
Regards, Ben