ISA-tools / isa-api

ISA tools API
https://isa-tools.org

magetab2isatab splitting of table sometimes outputs empty cells #222

Closed djcomlab closed 7 years ago

djcomlab commented 7 years ago

The table splitting in magetab2isatab sometimes outputs empty cells. I noticed this in the test test_get_experiment_as_isatab_afmx_2(), which splits the E-AFMX-2 SDRF file.

Possibly a problem with reading the SDRF file into the initial DataFrame using pandas.

djcomlab commented 7 years ago

I have traced the problem down to the line:

df = pd.read_csv(in_filename, sep='\t', comment='#').fillna('')

which loads tables into a DataFrame using pandas. The comment handling does not only treat a line that starts with # as a comment; it also treats a # anywhere within a line as the start of a comment and discards the rest of that line. The E-AFMX-2 SDRF file contains URLs with a # in them, e.g. http://purl.org/obo/owl/NCBITaxon#NCBITaxon_9606 as a value for the Characteristic[Organism] Term Accession Number. Those rows are therefore cut off at the #, and every cell after that point is read as missing and then turned into an empty string by fillna('').
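A minimal sketch of the behaviour and one possible workaround, not necessarily the fix that was applied in isa-api. The sample table, its column names other than the URL value, and the helper names read_with_comment_option and read_skipping_comment_lines are all hypothetical and only serve the illustration. The workaround drops lines that begin with # before handing the text to pandas, so a # inside a cell is left alone.

```python
import io

import pandas as pd

# Hypothetical SDRF-like sample (not copied from E-AFMX-2): one leading
# comment line, a header row, and one data row whose URL contains a '#'.
SAMPLE = (
    "# a leading comment line\n"
    "Source Name\tTerm Accession Number\tMaterial Type\n"
    "s1\thttp://purl.org/obo/owl/NCBITaxon#NCBITaxon_9606\torganism_part\n"
)


def read_with_comment_option(text):
    """Load the table the way the line quoted above does (comment='#')."""
    return pd.read_csv(io.StringIO(text), sep="\t", comment="#").fillna("")


def read_skipping_comment_lines(text):
    """Assumed workaround: drop only lines that start with '#', then parse."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if not line.lstrip().startswith("#")
    ]
    return pd.read_csv(io.StringIO("".join(kept)), sep="\t").fillna("")


if __name__ == "__main__":
    # With comment='#', the URL is cut at the '#' and the last cell is empty.
    print(read_with_comment_option(SAMPLE).to_dict("records"))
    # With the pre-filter, the full URL and the last cell survive.
    print(read_skipping_comment_lines(SAMPLE).to_dict("records"))
```

The pre-filter assumes that genuine comments in these files always occupy a whole line; if a file relies on trailing comments after data, this approach would keep them as cell content.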