Vitek-Lab / MSstatsPTM

Post Translational Modification (PTM) Significance Analysis in shotgun mass spectrometry-based proteomic experiments
https://vitek-lab.github.io/MSstatsPTM/
Artistic License 2.0

Error during converting data from MaxQuant #27

Closed thijss96 closed 2 years ago

thijss96 commented 2 years ago

Hi,

I keep running into errors while converting my data to the MSstatsPTM format. So far I seem to have solved them, but I'm really stuck on this one:

Error in tstrsplit(PeptideSequence, ":", keep = 1) : could not find function "tstrsplit"

This is a function from data.table, I presume, so when I load that package and rerun MaxQtoMSstatsPTMFormat I get this:

Error in tstrsplit(PeptideSequence, ":", keep = 1) : 'keep' should contain integer values between 0 and 0.
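
For context, tstrsplit is exported by the data.table package. The following is a minimal sketch, assuming data.table is installed, of one way the second message can arise: when the vector being split is empty there are no pieces to keep, so the valid range for keep collapses to "between 0 and 0". The toy strings are made up, and whether this is what happens inside the converter is an assumption.

```r
library(data.table)

# Normal case: keep = 1 returns only the first piece of each split string.
first_piece <- tstrsplit(c("PEPTIDE:1", "SEQ:2"), ":", keep = 1)
print(first_piece[[1]])

# Zero-length input: the split yields no pieces, so keep = 1 is out of range.
# tryCatch captures the error message instead of stopping the script.
msg <- tryCatch(
  tstrsplit(character(0), ":", keep = 1),
  error = function(e) conditionMessage(e)
)
print(msg)
```

If this is the cause, it would point to the PeptideSequence column being empty or missing at that step rather than to a problem with tstrsplit itself.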

This is the code I am running:

Load MaxQuant output (evidence and proteinGroups from abundance proteomics):

mq.evid <- read.table("../raw_data/MQ_output/txt_P1462_Fullproteome_LUMOS/evidence.txt", sep = "\t", header = TRUE)
mq.pg <- read.table("../raw_data/MQ_output/txt_P1462_Fullproteome_LUMOS/ProteinGroups.txt", sep = "\t", header = TRUE)
annotation.mq <- read.table("../raw_data/MQ_output/L_MQannotationMSstats.txt", sep = "\t", header = TRUE)
sites.mq <- read.table("../raw_data/MQ_output/txt_P1462_Phospho_LUMOS/Phospho (STY)Sites.txt", sep = "\t", header = TRUE, fill = TRUE)

Convert data to MSstatsPTM:

MaxQtoMSstatsPTMFormat(
  sites.mq,
  annotation.mq,
  mq.evid,
  mq.pg
)

mstaniak commented 2 years ago

Can you please share a small subset of your data that will allow us to reproduce your issue?

thijss96 commented 2 years ago

Would sending you a few rows of each of the input files suffice?

mstaniak commented 2 years ago

Yes, as long as those few rows allow us to reproduce the error. @devonjkohler I looked at the MSstatsConvert code, and the error could not find function "tstrsplit" is more likely coming from the PTM part. I'm not sure about the rest of the error.

thijss96 commented 2 years ago

> Yes, as long as those few rows allow us to reproduce the error. @devonjkohler I looked at the MSstatsConvert code, and the error could not find function "tstrsplit" is more likely coming from the PTM part. I'm not sure about the rest of the error.

I stumbled across it here, just by chance: https://github.com/Vitek-Lab/MSstatsPTM/commit/3225e3481bc3111e689f17ed76a88c2b364d4816

thijss96 commented 2 years ago

Hi, just checking to see whether you're working on the issue, or maybe the files I sent didn't suffice?

devonjkohler commented 2 years ago

Hi @thijss96,

Thank you for sending over the files. We have identified the problem and are working on a fix.

One quick question: the annotation file indicates 6 fractions. Are there 6 fractions in both the modified and unmodified runs?

thijss96 commented 2 years ago

Hi,

Thanks for your reply. No, the phospho run has only 3 fractions. Will this be a problem?

devonjkohler commented 2 years ago

Hi @thijss96,

Definitely not a problem. I was just curious because the setup of the PTM data indicated 3 fractions, as you mentioned.

Devon
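
A quick way to compare the fraction layout of the two annotation tables, as a minimal sketch. The tables and run names below are invented stand-ins for the global and phospho annotations; only the Run and Fraction column names follow the MSstatsTMT annotation format.

```r
# Invented stand-ins for the two annotation tables (not the real data).
annotation_protein <- data.frame(
  Run = paste0("global_F", 1:6),
  Fraction = 1:6
)
annotation_ptm <- data.frame(
  Run = paste0("phospho_F", 1:3),
  Fraction = 1:3
)

# Count distinct fractions in each design.
n_fractions <- c(
  protein = length(unique(annotation_protein$Fraction)),
  ptm = length(unique(annotation_ptm$Fraction))
)
print(n_fractions)

# A mismatch means one shared annotation cannot describe both runs.
designs_differ <- n_fractions[["protein"]] != n_fractions[["ptm"]]
print(designs_differ)
```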

devonjkohler commented 2 years ago

Hi @thijss96,

I have implemented and pushed a fix for the MaxQ converter problem. There were two main changes. First, I added a separate annotation file for the PTM run, for cases like yours where the experimental design differs between the modified and unmodified runs. Second, the converter now handles the naming convention of columns such as Reporter.intensity.count.1.1___1. All the MaxQ data I have seen uses different forms of these column names, so the converter needed to account for them (e.g., some were in the form Reporter.intensity.count.1.TMT1phos___1 or Reporter.intensity.count.1.TMT1___1). I've added a couple of parameters to specify the naming convention used in each dataset.

With that being said, I have pushed the fixes to both GitHub and Bioconductor. The Bioconductor fix will take a day or two to propagate, so feel free to install the package directly from GitHub in the meantime. Please see the code below for exactly how you can convert your specific data.

Best, Devon

test <- MaxQtoMSstatsPTMFormat(sites.mq,
                               annotation.ptm,
                               evidence = mq.evid,
                               proteinGroups = mq.pg,
                               annotation.prot = annotation.mq,
                               mod.num = 'Single',
                               TMT.keyword = "", ## specify first part of TMT1phos naming convention
                               ptm.keyword = "") ## specify second part of TMT1phos naming convention
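
To illustrate the naming forms described above, here is a rough sketch of how two keywords could pick out one column style. The helper match_reporter_columns is hypothetical and is not how MSstatsPTM is known to implement TMT.keyword and ptm.keyword; it only assumes that the channel label has the form keyword, digits, keyword, and that the keywords are plain alphanumeric strings.

```r
# Hypothetical helper, not part of MSstatsPTM: selects reporter columns whose
# channel label has the form <tmt_keyword><digits><ptm_keyword>.
match_reporter_columns <- function(cols, tmt_keyword = "", ptm_keyword = "") {
  pattern <- paste0(
    "^Reporter\\.intensity\\.count\\.[0-9]+\\.",
    tmt_keyword, "[0-9]+", ptm_keyword, "___[0-9]+$"
  )
  cols[grepl(pattern, cols)]
}

# The three column forms mentioned in the thread.
cols <- c(
  "Reporter.intensity.count.1.1___1",
  "Reporter.intensity.count.1.TMT1___1",
  "Reporter.intensity.count.1.TMT1phos___1"
)

plain    <- match_reporter_columns(cols)
tmt_only <- match_reporter_columns(cols, tmt_keyword = "TMT")
tmt_phos <- match_reporter_columns(cols, tmt_keyword = "TMT", ptm_keyword = "phos")

print(plain)
print(tmt_only)
print(tmt_phos)
```

Under these assumptions, empty keywords select only the plain numeric form, which is consistent with the call above leaving both keywords as "".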

thijss96 commented 2 years ago

Hi @devonjkohler

This is great! Thanks for the fix. I will get going with it after my holidays next week and will keep you posted on the progress, if you're interested.

Cheers, Thijs

thijss96 commented 2 years ago

Hi @devonjkohler, should the keywords be present in the channel names? I get another error now:

Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, : Join results in 468288 rows; more than 234240 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.

Might this be because of identical channel names between the phospho-data and global data?
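
This data.table error generally means the join key is not unique in the table being joined in. A minimal sketch, assuming data.table is installed and using toy tables with invented Run, Channel, Intensity and Condition values: joining on Channel alone repeats every row once per run, while joining on Run plus Channel gives one match per row. Whether duplicate channel names are the actual cause in this dataset is not confirmed.

```r
library(data.table)

runs <- c("r1", "r2", "r3")
channels <- c("c1", "c2")

# Toy intensity table: one row per Run/Channel pair (6 rows).
intensities <- CJ(Run = runs, Channel = channels)
intensities[, Intensity := seq_len(.N)]

# Toy annotation table with the same 6 Run/Channel pairs.
annotation <- CJ(Run = runs, Channel = channels)
annotation[, Condition := ifelse(Channel == "c1", "ctrl", "treated")]

# Channel alone is duplicated across runs, so each annotation row matches
# 3 intensity rows: 18 rows > nrow(x) + nrow(i) = 12, which triggers the error.
msg <- tryCatch(
  intensities[annotation, on = "Channel"],
  error = function(e) conditionMessage(e)
)
print(msg)

# Count repeated keys in the annotation to diagnose the problem.
n_dup <- sum(duplicated(annotation$Channel))
print(n_dup)

# Joining on a key that is unique per row avoids the blow-up.
joined <- intensities[annotation, on = c("Run", "Channel")]
print(nrow(joined))
```

Setting allow.cartesian = TRUE would silence the error, but it keeps the duplicated rows, so checking the key for duplicates first is usually the safer step.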

thijss96 commented 2 years ago

Too bad this ended up being closed. I would still like to use MSstatsPTM for my dataset.