djcomlab closed 7 years ago
I have traced the problem down to the line:
df = pd.read_csv(in_filename, sep='\t', comment='#').fillna('')
that loads tables into a DataFrame using pandas. The comment handling does not only treat a # at the start of a line as a comment; it treats a # anywhere in a line as the start of a comment and discards the rest of that line. With E-AFMX-2, the SDRF file contains URLs with a # in them, e.g. http://purl.org/obo/owl/NCBITaxon#NCBITaxon_9606 as a value for the Characteristic[Organism] Term Accession Number. That value is cut off at the #, the remaining cells in the row are dropped, and fillna('') then turns them into empty strings.
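A minimal sketch of the behaviour and one possible workaround, assuming a tiny made-up SDRF fragment (the column names, the leading comment line, and both helper functions are hypothetical, not taken from the isatools code). It filters out lines that start with # before handing the text to pandas, rather than relying on the comment option:

```python
import io

import pandas as pd

# Hypothetical three-column SDRF fragment with one leading comment line.
SDRF_TEXT = (
    "# hypothetical leading comment line\n"
    "Source Name\tTerm Accession Number\tSample Name\n"
    "src1\thttp://purl.org/obo/owl/NCBITaxon#NCBITaxon_9606\tsample1\n"
)


def read_with_comment_option(text):
    """Read as in the line above: '#' anywhere in a line starts a comment."""
    return pd.read_csv(io.StringIO(text), sep="\t", comment="#").fillna("")


def read_skipping_comment_lines(text):
    """Drop only lines that begin with '#', then parse without comment=."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if not line.lstrip().startswith("#")
    ]
    return pd.read_csv(io.StringIO("".join(kept)), sep="\t").fillna("")


truncated = read_with_comment_option(SDRF_TEXT)
intact = read_skipping_comment_lines(SDRF_TEXT)

# With comment='#', the URL is cut at the '#' and the last cell is lost.
print(truncated.loc[0, "Term Accession Number"])
# With pre-filtering, the full URL and the following cell survive.
print(intact.loc[0, "Term Accession Number"], intact.loc[0, "Sample Name"])
```

This assumes that real comment lines always start with # (optionally after whitespace) and that no data row does; if that does not hold for some MAGE-TAB files, the filter would need adjusting.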
magetab2isatab
splitting of table sometimes outputs empty cells, noticed on test test_get_experiment_as_isatab_afmx_2()
that splits the E-AFMX-2 SDRF file. Possibly a problem with reading the SDRF file into the initial DataFrame using pandas.