Closed thijss96 closed 2 years ago
Can you please share a small subset of your data that will allow us to reproduce your issue?
Would sending you a few rows of each of the input files suffice?
Yes, as long as those few rows allow us to reproduce the error. @devonjkohler I looked at the MSstatsConvert code, and the
could not find function "tstrsplit" error is more likely coming from the PTM part; I'm not sure about the rest of the error.
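For context, tstrsplit is exported by the data.table package, so "could not find function" usually means the calling code neither attached data.table nor imported the function. A minimal sketch of a namespace-qualified call, which works without library(data.table); the peptide strings are made-up values, not taken from the reporter's files:

```r
# Hypothetical peptide identifiers, only for illustration.
peptides <- c("PEPTIDEK:S12", "ANOTHERR:T5")

# Qualifying the call with data.table:: avoids relying on the package
# being attached in the calling environment.
first_part <- data.table::tstrsplit(peptides, ":", keep = 1)[[1]]
print(first_part)
```

Inside a package, the equivalent is to import the function in the NAMESPACE rather than relying on the user to attach data.table.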
I stumbled across it here, just by chance: https://github.com/Vitek-Lab/MSstatsPTM/commit/3225e3481bc3111e689f17ed76a88c2b364d4816
Hi, just checking to see whether you're working on the issue, or maybe the files I sent didn't suffice?
Hi @thijss96,
Thank you for sending over the files. We have identified the problem and are working on a fix.
One quick question, the annotation file indicates 6 fractions. Are there 6 fractions in both the modified and unmodified runs?
Hi,
Thanks for your reply. No, the phospho run has only 3 fractions. Will this be a problem?
Hi @thijss96,
Definitely not a problem. I was just curious because the setup of the PTM data indicated 3 fractions, as you mentioned.
Devon
Hi @thijss96,
I have implemented and pushed a fix for the MaxQ converter problem. There were two main fixes. The first is a separate annotation file for the PTM run, for cases like yours where the experimental design differs between the modified and unmodified runs. The second concerns the naming convention of columns such as Reporter.intensity.count.1.1___1. The MaxQ data I have seen uses different forms of these column names, so the converter needed to account for them (i.e. some were in the form of Reporter.intensity.count.1.TMT1phos___1 or Reporter.intensity.count.1.TMT1___1). I've added a couple of parameters to specify the naming convention used in each dataset.
With that being said I have pushed the fixes to both github and Bioconductor. The Bioconductor fix will take a day or two to propagate, so feel free to install the package directly from github in the meantime. Please see the code below on exactly how you can convert your specific data.
Best, Devon
test <- MaxQtoMSstatsPTMFormat(sites.mq,
                               annotation.ptm,
                               evidence = mq.evid,
                               proteinGroups = mq.pg,
                               annotation.prot = annotation.mq,
                               mod.num = 'Single',
                               TMT.keyword = "", ## specify first part of TMT1phos naming convention
                               ptm.keyword = "") ## specify second part of TMT1phos naming convention
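To decide what to pass as the keywords, it can help to look at which label your own files carry between the channel number and the ___ suffix. A rough sketch in base R, assuming the column layout described above. The column names are invented examples, and this does not reproduce the converter's own matching logic:

```r
# Hypothetical column names, mimicking a MaxQuant site-level table.
cols <- c("Proteins",
          "Reporter.intensity.count.1.TMT1phos___1",
          "Reporter.intensity.count.2.TMT1phos___1",
          "Reporter.intensity.corrected.1.TMT1phos___1")

# Keep only the reporter count columns.
reporter_cols <- grep("^Reporter\\.intensity\\.count\\.", cols, value = TRUE)

# Extract the label between the channel number and the ___ suffix.
labels <- unique(sub("^Reporter\\.intensity\\.count\\.[0-9]+\\.(.*)___[0-9]+$",
                     "\\1", reporter_cols))
print(labels)
```

On real data, replace cols with colnames() of the loaded sites table.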
Hi @devonjkohler
This is great! Thanks for the fix. I will get going with it after my holidays next week and will keep you posted on the progress, if you're interested.
Cheers, Thijs
Hi @devonjkohler Should the keywords be present in the channel names? I get another error now:
Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, : Join results in 468288 rows; more than 234240 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.
Might this be because of identical channel names between the phospho-data and global data?
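The error message itself points at duplicate key values in one of the joined tables. One way to check is to count rows per join key in the annotation. This is a sketch with a toy table; the column names Run and Channel are assumptions about which keys are joined on, not something confirmed in this thread:

```r
library(data.table)

# Toy annotation (hypothetical values): the pair (run_1, channel.1)
# appears twice, as it might if PTM and global rows share channel names.
annotation <- data.table(
  Run       = c("run_1", "run_1", "run_1", "run_2"),
  Channel   = c("channel.1", "channel.2", "channel.1", "channel.1"),
  Condition = c("A", "B", "A", "B")
)

# Count rows per key; any key with N > 1 multiplies matching rows in a join.
dup_keys <- annotation[, .N, by = .(Run, Channel)][N > 1]
print(dup_keys)

# anyDuplicated() returns the index of the first duplicated key row, or 0.
has_dups <- anyDuplicated(annotation, by = c("Run", "Channel")) > 0
```

If duplicates show up, the usual fix is to make the keys unique (for example, separate annotation files for the PTM and global runs) rather than setting allow.cartesian=TRUE.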
Too bad this was closed in the end. I would still be curious to use MSstatsPTM for my dataset.
Hi,
I keep running into errors during conversion of my data into MSstatsPTM format. So far I seem to have solved them, but I'm really stuck on this one:
Error in tstrsplit(PeptideSequence, ":", keep = 1) : could not find function "tstrsplit"
This is a function from data.table, I presume, so when I load this package and rerun MaxQtoMSstatsPTMFormat I get this:
Error in tstrsplit(PeptideSequence, ":", keep = 1) : 'keep' should contain integer values between 0 and 0.
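For reference, keep selects which of the split pieces tstrsplit returns, so it has to fall within the number of pieces actually produced. A small sketch with invented strings. The zero-length case at the end is only one plausible way to end up with "between 0 and 0", and whether it applies to this dataset is an assumption:

```r
library(data.table)

# Hypothetical peptide identifiers, only for illustration.
peptides <- c("PEPTIDEK:S12", "ANOTHERR:T5")

# Normal case: each string splits into two pieces and keep = 1 returns the first.
first_part <- tstrsplit(peptides, ":", keep = 1)[[1]]
print(first_part)

# Zero-length input: there are no pieces to keep. Capture whatever happens
# (an error message or a result) instead of assuming the outcome.
empty_result <- tryCatch(
  tstrsplit(character(0), ":", keep = 1),
  error = function(e) conditionMessage(e)
)
print(empty_result)
```

If the second call reports the same message, checking whether the PeptideSequence column is empty or missing at that point would be a reasonable next step.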
This is the code I am running:
Load MaxQuant output (evidence and proteinGroups from abundance proteomics)
Convert data to MSstatsPTM: