VariantEffect / mavedb-api

MaveDB API
GNU Affero General Public License v3.0

Exported CSV files cannot be imported. #205

Closed. jstone-dev closed this issue 4 months ago.

jstone-dev commented 4 months ago

The CSV formats for exporting and importing are inconsistent. In exported files, the accession column contains unquoted # characters. The importer allows comments beginning with #, so everything after the # on each line is treated as a comment and the rows appear to have no values in any other column. Validation then fails at the step that checks for the presence of at least one HGVS column.
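
A minimal sketch of the failure mode, using only the Python standard library. It is not the mavedb-api import code: the `read_with_comments` helper, the column names, and the accession values are hypothetical and chosen only to illustrate how a comment-aware reader truncates each data row at the first #.

```python
import csv
import io

# Hypothetical exported file: each accession contains an unquoted '#'.
EXPORTED = (
    "accession,hgvs_nt,hgvs_pro,score\n"
    "urn:mavedb:00000001-a-1#1,c.1A>G,p.Met1Val,0.5\n"
    "urn:mavedb:00000001-a-1#2,c.2T>C,p.Met1Thr,1.2\n"
)


def read_with_comments(text, comment_char="#"):
    """Parse CSV text, discarding everything from comment_char to end of line.

    This imitates parsers that support '#' comments; it is an illustration,
    not the project's actual parser.
    """
    stripped = (line.split(comment_char, 1)[0] for line in io.StringIO(text))
    non_empty = (line for line in stripped if line.strip())
    return list(csv.DictReader(non_empty))


rows = read_with_comments(EXPORTED)

# Every data row was cut off inside the accession column, so the HGVS
# columns are missing (None) and a "has at least one HGVS value" check fails.
hgvs_columns = ("hgvs_nt", "hgvs_pro")
has_hgvs = any(row.get(col) for row in rows for col in hgvs_columns)
print(rows[0])
print("HGVS values present:", has_hgvs)
```

In this sketch the rows are still read, but only the truncated accession survives, which matches the symptom described above.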

jstone-dev commented 4 months ago

If we want to continue to allow comments in imported CSV files, we should clearly document the file format. This would be helpful anyway, but it's really necessary since comments aren't part of any CSV standard, though some parsers support them.

bencap commented 4 months ago

Seems to me like we don't need comment support in scores and counts files, but maybe @afrubin has a better sense of how useful it is. Removing # as a reserved character from the CSV parser seems like the simplest solution to me, unless there is some utility I am missing.
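
As a rough illustration of that suggestion, the sketch below round-trips rows through Python's standard `csv` module, which has no notion of comments, so # is treated as ordinary data. The column names and accession values are hypothetical, and this is not the change that was made in mavedb-api.

```python
import csv
import io

# Hypothetical score rows; the accessions contain '#'.
rows_out = [
    {"accession": "urn:mavedb:00000001-a-1#1", "hgvs_nt": "c.1A>G", "score": "0.5"},
    {"accession": "urn:mavedb:00000001-a-1#2", "hgvs_nt": "c.2T>C", "score": "1.2"},
]

# Export: write the rows as plain CSV.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["accession", "hgvs_nt", "score"])
writer.writeheader()
writer.writerows(rows_out)

# Import: read the same text back with no comment character configured.
rows_in = list(csv.DictReader(io.StringIO(buffer.getvalue())))

# With no reserved '#', the exported file imports unchanged.
print(rows_in == rows_out)
```

The trade-off is the one raised in this thread: files can no longer carry # comment lines, so any header metadata would have to be delivered some other way.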

afrubin commented 4 months ago

We used to use the comments in the exported files as a header to include the license information, access date, and other details about the score set record. I think this was useful and was much more important back when we had a diverse mix of data licenses rather than most things being CC0. I'm fine with removing it. We can always re-add it later if compelling use cases emerge.

bencap commented 4 months ago

Released in https://github.com/VariantEffect/mavedb-api/releases/tag/v2024.1.1