Tarskin / HappyTools

A tool for the (high-throughput) processing of HPLC data.
Apache License 2.0

Thousand and decimal separator formatting is hardcoded #1

Closed Tarskin closed 7 years ago

Tarskin commented 7 years ago

Because the thousand and decimal separators are hardcoded, HappyTools can open data written in the US number format (for example 1,234.56), but supporting other conventions would require separate, messy logic for each country. The program should instead use a method that parses the numerical formatting correctly regardless of which country the data comes from.

Tarskin commented 7 years ago

This bug was addressed by implementing a regular-expression-based method for handling thousand and decimal separators. The code is as follows:

                lineChunks = line.strip().split()
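                # Strip the digits (and a minus sign directly before a digit);
                # whatever remains are the separator characters, in order.
                # Requires `import re` at module level.
                timeSep = re.sub(r'-?\d', '', lineChunks[0], flags=re.U)
                # Every separator except the last is a thousand separator: drop it
                for sep in timeSep[:-1]:
                    lineChunks[0] = lineChunks[0].replace(sep, '')
                # The last separator is taken to be the decimal separator
                if timeSep:
                    lineChunks[0] = lineChunks[0].replace(timeSep[-1], '.')
                # Same treatment for the intensity column
                intSep = re.sub(r'-?\d', '', lineChunks[-1], flags=re.U)
                for sep in intSep[:-1]:
                    lineChunks[-1] = lineChunks[-1].replace(sep, '')
                if intSep:
                    lineChunks[-1] = lineChunks[-1].replace(intSep[-1], '.')
                # End of regex based separator handling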
                try:
                    chromData.append((float(lineChunks[0]),float(lineChunks[-1])))
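                except ValueError:
                    # float() raises ValueError for malformed numbers; this also
                    # covers UnicodeEncodeError, which is a ValueError subclass
                    print("Omitting line: " + str(line))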
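
For readers who want to try the idea outside of HappyTools, the same last-separator heuristic can be sketched as a small standalone function. This is a minimal illustration written for Python 3, not the code that ships in the repository; the names `normalize_number` and `parse_line` are hypothetical and were chosen only for this example.

```python
import re


def normalize_number(token):
    """Hypothetical helper: rewrite `token` so that float() can parse it.

    Every non-digit character is treated as a separator. The last one is
    assumed to be the decimal separator, all earlier ones are assumed to
    be thousand separators.
    """
    # Remove digits (and a minus sign directly before a digit); the
    # leftover characters are the separators, in order of appearance.
    seps = re.sub(r'-?\d', '', token)
    if not seps:
        return token
    decimal_sep = seps[-1]
    # Drop the thousand separators first.
    for sep in set(seps[:-1]):
        token = token.replace(sep, '')
    # Then turn the decimal separator (if still present) into a period.
    return token.replace(decimal_sep, '.')


def parse_line(line):
    """Hypothetical helper: return (time, intensity) from one data line."""
    chunks = line.strip().split()
    return float(normalize_number(chunks[0])), float(normalize_number(chunks[-1]))


print(parse_line("1,234.56\t7,890.12"))  # US style
print(parse_line("1.234,56\t7.890,12"))  # continental European style
```

Both calls print `(1234.56, 7890.12)`. Note that the heuristic cannot resolve a value with a single separator unambiguously: `1,234` is read as 1.234, because the only separator present is assumed to be the decimal separator.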