cbielow / PTXQC

A Quality Control (QC) pipeline for Proteomics (PTX) results generated by MaxQuant

Maxquant LFQ quantitation result processing error #14

Closed ParisWu closed 8 years ago

ParisWu commented 8 years ago

Hi, thanks for helping me! I used MaxQuant (version 1.5.1.0) with LFQ (label-free quantification) on a 40 single-phase proteome profiling project. When I run PTXQC in R (i386 3.2.0), it reports the warnings and error below. Please help me! Thanks a lot!

"> txt_folder = "D:/R/mydirectory/Shenghuahao/maxquant-result/combined/txt"
r = createReport(txt_folder)
Reading file D:/R/mydirectory/Shenghuahao/maxquant-result/combined/txt/parameters.txt ...
Read 59 entries from D:/R/mydirectory/Shenghuahao/maxquant-result/combined/txt/parameters.txt.
Updating colnames
Simplifying contaminants
Simplifying reverse
Reading file D:/R/mydirectory/Shenghuahao/maxquant-result/combined/txt/summary.txt ...
Read 81 entries from D:/R/mydirectory/Shenghuahao/maxquant-result/combined/txt/summary.txt.
Updating colnames
Simplifying contaminants
Simplifying reverse
Adding fc.raw.file column ... done
Reading file D:/R/mydirectory/Shenghuahao/maxquant-result/combined/txt/proteinGroups.txt ...
Read 1183 entries from D:/R/mydirectory/Shenghuahao/maxquant-result/combined/txt/proteinGroups.txt.
Updating colnames
Simplifying contaminants
Simplifying reverse
Reading file D:/R/mydirectory/Shenghuahao/maxquant-result/combined/txt/evidence.txt ...
WARNING: Could not find column regex '^fraction$' using case-INsensitive matching.
WARNING: Could not find column regex '[RK].Count' using case-INsensitive matching.
WARNING: Could not find column regex '^protein.names$' using case-INsensitive matching.
Keeping 25 of 60 columns!
Read 254391 entries from D:/R/mydirectory/Shenghuahao/maxquant-result/combined/txt/evidence.txt.
[1] "While checking ID column: last ID was 'NA', while table has '254391' rows."
Error in get(x, envir = this, inherits = inh)(this, ...) :
Error: file 'D:/R/mydirectory/Shenghuahao/maxquant-result/combined/txt/evidence.txt' seems to have been edited in Microsoft Excel and has artificial line-breaks which destroy the data at lines (roughly): 1
Please fix (e.g. try LibreOffice 4.0.x or above)!"

cbielow commented 8 years ago

Have a look at the error message in the last line: Error: file 'D:/R/mydirectory/Shenghuahao/maxquant-result/combined/txt/evidence.txt' seems to have been edited in Microsoft Excel and has artificial line-breaks which destroy the data at lines (roughly): 1

So the question is: did you open evidence.txt with Excel or some other editor (and save it afterwards)? If yes, Excel broke your file and it is 'hard' to reconstruct automatically. One way to fix this is to manually edit the file and remove lines which look fishy. The other (more robust) way is to run partial processing in MaxQuant and rewrite the tables (last step). This should be rather quick to do and fixes your txt files. After that, just run PTXQC again.

ParisWu commented 8 years ago

Thanks a lot! I will try!

cbielow commented 8 years ago

I'll regard this as fixed. Please add a comment, if you still encounter problems.

ParisWu commented 8 years ago

Hi, could you tell me how to manually edit the file and remove the lines which look fishy? Or could you please tell me how to run partial processing to rewrite the tables? When I run this, an error appears saying "can not find MSMS scan.txt", but I did check the msms scan in the parameters. Thanks!

cbielow commented 8 years ago

Hi

Hi, could you tell me how to manually edit the file and remove the lines which look fishy?

If you open the file with Excel, LibreOffice, or any other editor, you will find that some lines start with a list of numbers, e.g. '5785;6300;6798;...', whereas most other lines start with a peptide sequence (or whatever the first column in that file is supposed to be; for evidence.txt it's a peptide sequence). These faulty lines usually originate from manually editing the file with MS Excel, because Excel limits the amount of text a single cell can hold (32,767 characters). Sometimes the list of reference spectra for a single peptide is longer than that, and Excel will then truncate the content and write the remaining characters into the first column (A) of the next line. All cells after that will also be appended to this new line.
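
As a rough way to locate such lines without scrolling through the file by hand, the following R sketch flags every line whose first field consists only of digits and semicolons. The function name `find_fragment_lines` is hypothetical and not part of PTXQC, and the check assumes a table like evidence.txt whose first column is a peptide sequence, so a numeric first field is suspicious.

```r
# Hypothetical helper (not part of PTXQC): return the line numbers of a
# tab-separated MaxQuant table whose first field looks like a spilled-over
# ID list such as "5785;6300;6798" instead of a peptide sequence.
# Assumption: the first column normally holds letters (as in evidence.txt).
find_fragment_lines <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # Keep only the text before the first tab on each line
  first_field <- sub("\t.*$", "", lines)
  # Suspicious: starts with a digit and contains only digits and semicolons
  is_fragment <- grepl("^[0-9][0-9;]*$", first_field)
  # Never flag the header line
  is_fragment[1] <- FALSE
  which(is_fragment)
}
```

The returned numbers refer to lines in the file (the header is line 1), so they can be looked up directly in a plain-text editor.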

So, if you are lucky and the line break happened in a column which PTXQC does not need, and all subsequent columns are also not required, then you can simply delete these new lines which contain the partial data. The line immediately above will be incomplete, but that hopefully does not hurt (otherwise you have to delete that as well). Since this is a manual process and a few hundred lines can be affected, this is very tedious and error-prone. So rewriting the txt tables by running partial processing in MaxQuant is the better option (see below).
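
If manual cleanup is still the route taken, the deletion could be scripted along these lines. This is a minimal sketch, not a PTXQC feature: `drop_broken_lines` is a hypothetical name, and it assumes that both the truncated line and the spilled-over line end up with a different number of tab-separated fields than the header. It removes both kinds of line and writes to a new file, leaving the original untouched.

```r
# Hypothetical helper (not part of PTXQC): copy a tab-separated table to
# out_path, dropping every line whose number of tabs differs from the header.
# Assumption: an Excel-induced break changes the field count of both the
# truncated line and the fragment line that follows it.
drop_broken_lines <- function(in_path, out_path) {
  stopifnot(in_path != out_path)  # never overwrite the original file
  lines <- readLines(in_path, warn = FALSE)
  # Count tabs per line (counting tabs keeps trailing empty fields intact)
  n_tabs <- lengths(regmatches(lines, gregexpr("\t", lines, fixed = TRUE)))
  keep <- n_tabs == n_tabs[1]
  writeLines(lines[keep], out_path)
  # Return the dropped line numbers so they can be reviewed
  invisible(which(!keep))
}
```

Rows removed this way are lost for the QC report, which is one more reason to prefer rewriting the tables in MaxQuant when that is possible.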

Or could you please tell me how to run partial processing to rewrite the tables?

Just open your project (the mqpar.xml) again in MaxQuant and look for the "partial processing" button (next to the "start" button). Just run the last step ("Write-tables"), which will fix your txt files. Once that is done, you can run PTXQC again. Just do not mess with the txt files with Excel in the meantime :)

When I run this, an error appears saying "can not find MSMS scan.txt", but I did check the msms scan in the parameters. Thanks!

You mean when running PTXQC?! It seems that I'm missing some information here, since your original post mentions a PTXQC error while reading evidence.txt. Did you fix this already somehow? MSMSscan.txt is only processed after that in PTXQC. It's somewhat hard to answer your question without more details. Can you post the full text of the PTXQC command window? Thanks.
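
One way to collect the full console text for posting is to divert R's output and messages into a file while the report runs. The wrapper below is only a sketch using base R's `sink()`; `run_with_log` and the log file name in the usage comment are hypothetical, and the `createReport(txt_folder)` call is the one from the original post.

```r
# Hypothetical helper: run a function while writing everything it prints
# (regular output, messages, warnings and the error text) to log_path.
run_with_log <- function(fun, log_path) {
  con <- file(log_path, open = "wt")
  sink(con)                    # regular output, e.g. cat() and print()
  sink(con, type = "message")  # messages, which normally go to stderr
  on.exit({
    sink(type = "message")     # restore the console before closing the file
    sink()
    close(con)
  })
  tryCatch(
    withCallingHandlers(
      fun(),
      warning = function(w) {
        # Log warnings immediately instead of deferring them
        message("Warning: ", conditionMessage(w))
        invokeRestart("muffleWarning")
      }
    ),
    error = function(e) {
      # Log the error text; the call then ends without a result
      message("Error: ", conditionMessage(e))
      invisible(NULL)
    }
  )
}

# Usage (requires PTXQC; the log file name is an arbitrary example):
# run_with_log(function() PTXQC::createReport(txt_folder), "ptxqc_console.txt")
```

Nothing appears in the console while the wrapper is active, so the log file has to be opened afterwards to see what happened.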