Nesvilab / FragPipe

A cross-platform proteomics data analysis suite
http://fragpipe.nesvilab.org

PercolatorOutputToPepXML writes decimal comma in XML file, depending on the locale/regional settings #432

Closed phusen closed 3 years ago

phusen commented 3 years ago

My FragPipe run failed with the error:

PhilosopherFilter [Work dir: /home/phusen/prg/UPS_FragPipe/output/12500amol_1]
/home/phusen/SW/fragpipe/tools/philosopher/philosopher filter --sequential --razor --prot 0.01 --tag rev_ --pepxml /home/phusen/prg/UPS_FragPipe/output/12500amol_1 --protxml /home/phusen/prg/UPS_FragPipe/output/combined.prot.xml --razorbin /home/phusen/prg/UPS_FragPipe/output/12500amol_1/.meta/razor.bin
Process 'PhilosopherFilter' finished, exit code: 1
Process returned non-zero exit code, stopping
time="14:47:49" level=info msg="Executing Filter  v4.0.0"
time="14:47:49" level=info msg="Processing peptide identification files"
time="14:47:49" level=fatal msg="Cannot decode packed binary. strconv.ParseFloat: parsing \"0,104952\": invalid syntax"

Apparently something is writing localized floating-point values with decimal commas into an XML file (my regional settings were set to Danish), and philosopher later fails to parse them. It works if I run FragPipe or the specific command

java -cp /home/phusen/SW/fragpipe/lib/* com.dmtavt.fragpipe.tools.percolator.PercolatorOutputToPepXML UPS1_2500amol_R3.pin UPS1_2500amol_R3 UPS1_2500amol_R3_percolator_target_psms.tsv UPS1_2500amol_R3_percolator_decoy_psms.tsv interact-UPS1_2500amol_R3 DDA

using LANG=C, but ideally it should work regardless of the regional settings. The problem appears to be in the percolatorToPepXML method in PercolatorOutputToPepXML.java, which inserts extra data into an existing pepXML file using String.format (i.e. not using an XML library). The existing file uses decimal points, so the result mixes both conventions, and the comma-separated lists of floating-point values end up containing decimal commas as well. Maybe it would be best to make sure that String.format uses the standard US locale?

log_2021-08-16_15-12-41.txt
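
The locale dependence described above can be reproduced with a minimal sketch. This is not the FragPipe code: the class name `LocaleSafeFormat` and the helper `formatScore` are hypothetical, and the `%f` format string is an assumption chosen only to show the effect. The sketch formats the value from the error message once with a Danish locale and once with `Locale.US`, using the `String.format(Locale, String, Object...)` overload.

```java
import java.util.Locale;

// Hypothetical demo class, not part of FragPipe.
class LocaleSafeFormat {

    // Formats a score with an explicitly chosen locale, so the result
    // does not depend on the regional settings of the machine.
    static String formatScore(Locale locale, double value) {
        return String.format(locale, "%f", value);
    }

    public static void main(String[] args) {
        double score = 0.104952;

        // A Danish locale uses a decimal comma.
        Locale danish = Locale.forLanguageTag("da-DK");
        System.out.println(formatScore(danish, score));

        // Locale.US always uses a decimal point.
        System.out.println(formatScore(Locale.US, score));
    }
}
```

Passing the locale at each call site keeps the change local to the code that writes the file, at the cost of having to touch every `String.format` call.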

fcyu commented 3 years ago

This has the same cause as issue https://github.com/Nesvilab/FragPipe/issues/415.

Guo Ci @guoci Can you add Locale.setDefault(Locale.US); to PercolatorOutputToPepXML.java?

Thanks,

Fengchao
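
For illustration, the effect of the suggested `Locale.setDefault(Locale.US);` call can be sketched as follows. The class `DefaultLocaleDemo` and the method `formatWithDefault` are hypothetical names, and the Danish default is set in code only to simulate the reporter's regional settings; how the actual fix was placed in PercolatorOutputToPepXML.java is not shown in this thread.

```java
import java.util.Locale;

// Hypothetical demo class, not part of FragPipe.
class DefaultLocaleDemo {

    // Uses the String.format overload without a locale argument,
    // which falls back to the JVM default locale for formatting.
    static String formatWithDefault(double value) {
        return String.format("%f", value);
    }

    public static void main(String[] args) {
        // Simulate a machine with Danish regional settings (assumption for the demo).
        Locale.setDefault(Locale.forLanguageTag("da-DK"));
        System.out.println(formatWithDefault(0.104952)); // decimal comma

        // The suggested fix: switch the JVM default before any formatting happens.
        Locale.setDefault(Locale.US);
        System.out.println(formatWithDefault(0.104952)); // decimal point
    }
}
```

Note that `Locale.setDefault` changes the default for the whole JVM, so it affects every later locale-sensitive call in the same process, not only the pepXML writer.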

guoci commented 3 years ago

@phusen thanks for the bug report, you can find the fix in the link: https://drive.google.com/drive/folders/1yLxJmXjN8tHpOos_eFRRfz4WjlMoAtV7?usp=sharing

phusen commented 3 years ago

Wow, thanks for the quick effort. I can confirm that it works now (in the build you provided) without setting LANG=C.

fcyu commented 3 years ago

You are welcome. Please feel free to contact us if you have any further questions.

Best,

Fengchao