Tarskin closed this 7 years ago
This bug was addressed by implementing a regular-expression-based method for handling thousand and decimal separators. The code is as follows:
import re  # needed for the separator detection below

lineChunks = line.strip().split()
# Regex-based detection of thousand and decimal separators:
# stripping the digits (and any leading minus sign) leaves only the separator characters
timeSep = re.sub(r'-?\d', '', lineChunks[0], flags=re.U)
# Every separator except the last is treated as a thousand separator and removed
for sep in timeSep[:-1]:
    lineChunks[0] = lineChunks[0].replace(sep, '')
# The last separator is treated as the decimal separator
if timeSep:
    lineChunks[0] = lineChunks[0].replace(timeSep[-1], '.')
intSep = re.sub(r'-?\d', '', lineChunks[-1], flags=re.U)
for sep in intSep[:-1]:
    lineChunks[-1] = lineChunks[-1].replace(sep, '')
if intSep:
    lineChunks[-1] = lineChunks[-1].replace(intSep[-1], '.')
# End of regex-based separator handling
try:
    chromData.append((float(lineChunks[0]), float(lineChunks[-1])))
except UnicodeEncodeError:
    print("Omitting line: " + str(line))
As a result, HappyTools can open data that uses US-style number formatting, but it needs a lot of messy logic to handle each country's conventions. The program should switch to a method that works regardless of which country the data comes from, specifically with respect to numerical formatting.
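The separator handling could be folded into a single helper so that the time and intensity columns share one code path. The sketch below is only an illustration, not the HappyTools implementation: `normalize_number` and `parse_pair` are hypothetical names, and it keeps the heuristic from the snippet above, namely that the last non-digit character in a value is the decimal separator and every earlier one is a thousand separator.

```python
import re


def normalize_number(text):
    """Return text with thousand separators removed and '.' as the decimal separator.

    Assumption (hypothetical helper): the last non-digit character is the
    decimal separator; all earlier non-digit characters are thousand separators.
    """
    # Removing digits (and any minus sign directly before a digit) leaves the separators
    seps = re.sub(r'-?\d', '', text, flags=re.U)
    # Drop every separator except the last one
    for sep in seps[:-1]:
        text = text.replace(sep, '')
    # Turn the last separator into a decimal point
    if seps:
        text = text.replace(seps[-1], '.')
    return text


def parse_pair(line):
    """Parse the first and last whitespace-separated fields of a line as floats."""
    chunks = line.strip().split()
    return float(normalize_number(chunks[0])), float(normalize_number(chunks[-1]))


if __name__ == "__main__":
    print(parse_pair("0,50\t1.234,56"))
    print(parse_pair("0.50\t1,234.56"))
```

This heuristic still has blind spots: a value that contains only a thousand separator (for example `1,234`) is read as a decimal, and scientific notation such as `1e5` is not handled, so it should be checked against real exported data before being relied on.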