RECETOX / RIAssigner

RIAssigner is a Python tool for retention index (RI) computation for GC-MS data.
MIT License

Sanitize column names when reading data #98

Closed: hechth closed this issue 5 months ago

hechth commented 1 year ago

Similar to how matchms reads data, RIAssigner accepts multiple different column names for the retention time and retention index information, in order to be compatible with what is supported by matchms and to be user friendly.

Mapping multiple column names to a single column should probably be done more systematically and transparently, so that the mapping is easier for developers to maintain and users can more easily tell which column names in their files will be recognised.
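
A minimal sketch of what a more systematic mapping could look like: a single alias table that is inverted once and used for every lookup. The table contents and the names `COLUMN_ALIASES` and `canonical_column_names` are hypothetical illustrations, not the lists or functions that RIAssigner or matchms actually use.

```python
# Hypothetical alias table: canonical names and accepted spellings are
# illustrative assumptions, not RIAssigner's or matchms's real lists.
COLUMN_ALIASES = {
    "retention_time": ("retention_time", "retentiontime", "rt"),
    "retention_index": ("retention_index", "retentionindex", "ri"),
}

# Invert the table once so each alias points at its canonical name.
ALIAS_TO_CANONICAL = {
    alias: canonical
    for canonical, aliases in COLUMN_ALIASES.items()
    for alias in aliases
}


def canonical_column_names(columns):
    """Map every recognised column name to its canonical name.

    Matching is case-insensitive; unrecognised columns are left out of
    the returned dict so the caller can keep them unchanged.
    """
    mapping = {}
    for column in columns:
        canonical = ALIAS_TO_CANONICAL.get(column.lower())
        if canonical is not None:
            mapping[column] = canonical
    return mapping


print(canonical_column_names(["RT", "ri", "name"]))
```

Keeping all accepted spellings in one table means that supporting a new column name is a one-line change, and the table itself can be shown to users as documentation.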

Original text:

In the file fake_ri_ms_w4e.csv, the RI is not found for some bizarre reason - this needs to be investigated.

hechth commented 1 year ago

It was caused by a space - we should implement some column-sanitizing functionality.
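
As a rough sketch of such sanitizing, assuming the problem is stray whitespace around a header such as " RI": the helper names below are hypothetical, and lowercasing and replacing inner whitespace with underscores are extra assumptions beyond what was observed in the file.

```python
import re


def sanitize_column_name(name):
    """Normalise one header: trim, lowercase, collapse whitespace to '_'."""
    # strip() removes the leading/trailing spaces that made the lookup fail;
    # lowercasing and the underscore replacement are optional extras.
    return re.sub(r"\s+", "_", name.strip().lower())


def sanitize_columns(columns):
    """Apply sanitize_column_name to every header in order."""
    return [sanitize_column_name(column) for column in columns]


print(sanitize_columns([" RI", "Retention Time ", "name"]))
```

With pandas, the same function could be applied when a file is read via `DataFrame.rename(columns=sanitize_column_name)`, since `rename` accepts a callable for `columns`.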

hechth commented 1 year ago

Ideally using a package like this: https://towardsdatascience.com/how-to-clean-messy-pandas-column-names-20dc7400cea7 or

hechth commented 6 months ago

Another package that would work well is dataprep. This should be used in the pandas data class.