MassBank / MassBank-data

Official repository of open data MassBank records

Add records of University of Antwerp #180

Closed: tsufz closed this 2 years ago

tsufz commented 2 years ago

Not yet happy. I need to check the code again, because the range is rounded and information may be lost.

tsufz commented 2 years ago

We still have a problem with the upper limit: the typical rounding issues. @meier-rene, do you have any idea how to improve the code? The first two records both show the issue.

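```bash
#!/bin/bash
# Rewrite MASS_RANGE_M/Z in every record in place (sponge is from moreutils).
cd ./Antwerp_Univ || exit 1

for file in *.txt; do
  # Split on '-' or ' ': $1 = record tag, $2 = MASS_RANGE_M/Z, $3 = lower, $4 = upper limit.
  # %d truncates the lower limit; %.0f rounds the upper limit to the nearest integer.
  awk 'BEGIN {FS="[- ]"} {if ($2 == "MASS_RANGE_M/Z") printf "%s %s %d-%.0f\n", $1, $2, $3, $4; else print}' "$file" | sponge "$file"
done
```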
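The sketch below shows why the upper limit is the problem. It applies the same field splitting and `printf` conversions as the script above to one record line. The name `old_convert` and the values 50.3 and 1000.4 are hypothetical and chosen only for illustration. `%d` truncates the lower bound, which is harmless because truncation can only widen the range. `%.0f` rounds the upper bound to the nearest integer, so 1000.4 becomes 1000 and the stored range no longer covers the acquired one.

```bash
#!/bin/bash
# Hypothetical wrapper around the awk call from the script above;
# reads record lines on stdin and writes the converted lines to stdout.
old_convert() {
  awk 'BEGIN {FS="[- ]"} {if ($2 == "MASS_RANGE_M/Z") printf "%s %s %d-%.0f\n", $1, $2, $3, $4; else print}'
}

# Illustrative values only: the upper limit 1000.4 is rounded DOWN to 1000.
echo 'AC$MASS_SPECTROMETRY: MASS_RANGE_M/Z 50.3-1000.4' | old_convert
# prints: AC$MASS_SPECTROMETRY: MASS_RANGE_M/Z 50-1000
```

With an upper limit of 1000.6, the same call gives 1001. The outcome therefore depends on the fractional part, which is the "either down or up" behavior described in the next comment.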
tsufz commented 2 years ago

@schymane, @meowcat, and @sneumann, we should agree on how we round to unit mass, and specify it in the Record Format as guidance and to avoid confusion. For the lower limit of the mass range, taking the integer part works. For the upper limit, depending on the algorithm, we get either rounding up (from .5) or rounding down (below .5). Any opinions?

The code behaves as described: it rounds the lower limit down to an integer and rounds the upper limit either down or up.

tsufz commented 2 years ago

Well, let's round the lower limit down and the upper limit up. @pstahlhofen will implement this in RMassBank, and I will update the Record Format accordingly: https://github.com/MassBank/MassBank-web/issues/309
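The agreed rule (lower limit down, upper limit up) can be sketched as a reusable awk filter. This is an illustration under stated assumptions, not the RMassBank implementation. `round_mass_range` and `ceil_up` are hypothetical names. awk has no built-in ceiling function, so `ceil_up` builds one from `int()`. m/z values are assumed to be non-negative.

```bash
#!/bin/bash
# Hypothetical filter: floor the lower and ceil the upper MASS_RANGE_M/Z limit.
# Reads a record on stdin and writes it to stdout; other lines pass unchanged.
round_mass_range() {
  awk '
    # Ceiling for non-negative numbers; awk itself only provides int() (truncation).
    function ceil_up(x) {
      x = x + 0
      return (x == int(x)) ? x : int(x) + 1
    }
    BEGIN { FS = "[- ]" }
    {
      if ($2 == "MASS_RANGE_M/Z")
        printf "%s %s %d-%d\n", $1, $2, int($3), ceil_up($4)
      else
        print
    }'
}

echo 'AC$MASS_SPECTROMETRY: MASS_RANGE_M/Z 50.3-1000.4' | round_mass_range
# prints: AC$MASS_SPECTROMETRY: MASS_RANGE_M/Z 50-1001
```

In the batch loop, `round_mass_range < "$file" | sponge "$file"` would replace the inline awk call. Unlike `%.0f`, this never lowers the upper limit. An upper limit that is already an integer (e.g. 1000) stays unchanged.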

Just to keep the code up to date:

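```bash
#!/bin/bash
cd ./Antwerp_Univ || exit 1

for file in *.txt; do
  # Lower limit: %d truncates (rounds down). Upper limit: %d of ($4 + 1) gives int part + 1,
  # which rounds up but also bumps a limit that is already an integer (e.g. 1000 -> 1001).
  awk 'BEGIN {FS="[- ]"} {if ($2 == "MASS_RANGE_M/Z") printf "%s %s %d-%d\n", $1, $2, $3, $4 + 1; else print}' "$file" | sponge "$file"
done
```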
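After an in-place batch edit like this one, it is worth checking that no record still has a fractional mass range. The sketch below assumes each range line has the form `AC$MASS_SPECTROMETRY: MASS_RANGE_M/Z <low>-<high>`, as the scripts above expect. `check_integer_ranges` is a hypothetical helper name, not part of MassBank tooling.

```bash
#!/bin/bash
# Hypothetical helper: report MASS_RANGE_M/Z lines whose limits are not plain
# integers, as FILE:LINE: text. Exit status is 1 if any are found, else 0.
check_integer_ranges() {
  awk '
    # Default whitespace FS: $2 = MASS_RANGE_M/Z, $3 = "<low>-<high>"
    $2 == "MASS_RANGE_M/Z" && $3 !~ /^[0-9]+-[0-9]+$/ {
      print FILENAME ":" FNR ": " $0
      bad = 1
    }
    END { exit (bad ? 1 : 0) }' "$@"
}
```

Running `check_integer_ranges ./Antwerp_Univ/*.txt` prints nothing and exits 0 when every range has integer limits.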

pstahlhofen commented 2 years ago

I just implemented and merged exactly this rounding behavior in #295 today.