cbielow / PTXQC

A Quality Control (QC) pipeline for Proteomics (PTX) results generated by MaxQuant

Report generation fails with Chinese language settings #18

Closed ParisWu closed 8 years ago

ParisWu commented 8 years ago

Hi, I have a problem generating the report.

The error said: "Error: file 'E:/Ping/沈华浩/201601-reviewed/combined/txt/evidence.txt' seems to have been edited in Microsoft Excel and has artificial line-breaks which destroy the data at lines (roughly):1 Please fix (e.g. try LibreOffice 4.0.x or above)!"

I do not have Microsoft Office on my computer. My R version is 3.2.3 and my MaxQuant version is 1.5.1.0. Please help!

library("PTXQC")
Loading package PTXQC (version 0.70.1)
require("PTXQC")
txt_folder = "E:\Ping\沈华浩\201601-reviewed\combined\txt"
Error: '\P' is an unrecognized escape in character string starting ""E:\P"
txt_folder = "E:\Ping\沈华浩\201601-reviewed\combined\txt"
Error: '\P' is an unrecognized escape in character string starting ""E:\P"
txt_folder = "E:/Ping/沈华浩/201601-reviewed/combined/txt"
r = createReport(txt_folder)
Reading file E:/Ping/沈华浩/201601-reviewed/combined/txt/parameters.txt ...
Read 59 entries from E:/Ping/沈华浩/201601-reviewed/combined/txt/parameters.txt.
Updating colnames
Simplifying contaminants
Simplifying reverse
Reading file E:/Ping/沈华浩/201601-reviewed/combined/txt/summary.txt ...
Read 81 entries from E:/Ping/沈华浩/201601-reviewed/combined/txt/summary.txt.
Updating colnames
Simplifying contaminants
Simplifying reverse
Adding fc.raw.file column ... done
Reading file E:/Ping/沈华浩/201601-reviewed/combined/txt/proteinGroups.txt ...
Read 1020 entries from E:/Ping/沈华浩/201601-reviewed/combined/txt/proteinGroups.txt.
Updating colnames
Simplifying contaminants
Simplifying reverse
Reading file E:/Ping/沈华浩/201601-reviewed/combined/txt/evidence.txt ...
WARNING: Could not find column regex '^fraction$' using case-INsensitive matching.
WARNING: Could not find column regex '[RK].Count' using case-INsensitive matching.
WARNING: Could not find column regex '^protein.names$' using case-INsensitive matching.
Keeping 25 of 60 columns!
Read 257884 entries from E:/Ping/沈华浩/201601-reviewed/combined/txt/evidence.txt.
[1] "While checking ID column: last ID was 'NA', while table has '257884' rows."
Error in get(x, envir = this, inherits = inh)(this, ...) :

Error: file 'E:/Ping/沈华浩/201601-reviewed/combined/txt/evidence.txt' seems to have been edited in Microsoft Excel and has artificial line-breaks which destroy the data at lines (roughly): 1 Please fix (e.g. try LibreOffice 4.0.x or above)!
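For reference, the two parse errors early in the log above are unrelated to the report failure: R treats a backslash inside a string literal as the start of an escape sequence, and `\P` is not a valid one. A minimal sketch of the two spellings R accepts for this path (the variable names `p_fwd`, `p_esc` and `p_norm` are only illustrative):

```r
# Forward slashes work on Windows and need no escaping
p_fwd <- "E:/Ping/沈华浩/201601-reviewed/combined/txt"

# Alternatively, double every literal backslash inside the string
p_esc <- "E:\\Ping\\沈华浩\\201601-reviewed\\combined\\txt"

# Both spellings name the same folder: replacing the (single) backslashes
# with forward slashes turns one form into the other
p_norm <- gsub("\\", "/", p_esc, fixed = TRUE)
stopifnot(p_norm == p_fwd)
```

The forward-slash form is what the log eventually uses, and it is usually the simpler choice when pasting Windows paths into R.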

cbielow commented 8 years ago

Did you open evidence.txt in any way after it was generated by MaxQuant?

PTXQC has detected some data damage; the question now is where it comes from. Most commonly, users open the file in Excel (hence the warning you see), and Excel might break the file (the larger the file, the more likely it breaks).

If you did not touch evidence.txt, could you please upload it somewhere and send me the download link (my email address is in the DESCRIPTION file)? Thanks!

cbielow commented 8 years ago

The reason for the failure was that MaxQuant writes a Chinese representation of NaN (非数字) into its output files (here, evidence.txt) and stores them in UTF-8 encoding. Since we assumed the data to be ASCII and not UTF-8, this led to nasty errors (mostly numerical columns which could not be converted).

This is fixed in v0.70.2.
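For anyone who hits the same problem with an older version, here is a minimal sketch of the failure mode described above and one possible workaround in base R. It is not the change made in v0.70.2. The three-row table, the column names and the variable names (`nan_zh`, `raw`, `intensity`) are assumptions made up for illustration.

```r
# "非数字" ("not a number"), written with Unicode escapes so the script stays ASCII-only
nan_zh <- "\u975e\u6570\u5b57"

# Hypothetical 3-row excerpt in the style of evidence.txt (column names are illustrative)
lines <- c("id\tIntensity", "0\t1234.5", paste0("1\t", nan_zh), "2\t42")
tmp <- tempfile(fileext = ".txt")
writeLines(enc2utf8(lines), tmp, useBytes = TRUE)  # write the UTF-8 bytes unchanged

# The localized token is not a number, so a direct conversion yields NA
stopifnot(is.na(suppressWarnings(as.numeric(nan_zh))))

# Read every column as text and mark the strings as UTF-8 ...
raw <- read.delim(tmp, encoding = "UTF-8", colClasses = "character")

# ... then map the localized token to "NaN" before converting to numeric
intensity <- raw$Intensity
intensity[intensity %in% nan_zh] <- "NaN"
raw$Intensity <- as.numeric(intensity)

print(raw)
unlink(tmp)
```

Reading as text first and converting afterwards keeps the token comparison independent of the session locale, which is the setting that differs on a Chinese-language Windows installation.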