dieterich-lab / scimodom

GNU Affero General Public License v3.0

Implement dataset upload #45

Closed eboileau closed 4 months ago

eboileau commented 8 months ago

Aims/objectives.

Datasets can now be imported via add-dataset. We need an upload form that, on submission, instantiates a DataService.

This form is ultimately only accessible after login, but I think it is a good idea to implement it as soon as possible, and handle route guards in a smart way before #43 is completed. This will help us to revisit services EUFImporter, ProjectService, and DataService (and maintenance scripts).



eboileau commented 7 months ago

We need to deal with "all uploads" consistently, and eventually with data cleaning, cf. Compare View - data upload. For the upload, we opted for a quick solution: an ad hoc importer for BED6/11/12 that reads in the file for each query and returns its contents as records. In the longer term, we need something more robust (harmonize upload classes).
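To make the "read in file, return as records" idea concrete, here is a minimal sketch for the BED6 case only. The names `Bed6Record` and `read_bed6` are hypothetical and are not taken from the scimodom code base; the sketch assumes tab-separated input with an integer score column.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Bed6Record(NamedTuple):
    """One BED6 line (hypothetical record type, not from scimodom)."""

    chrom: str
    start: int
    end: int
    name: str
    score: int
    strand: str


def read_bed6(handle: TextIO) -> Iterator[Bed6Record]:
    """Yield one record per data line, skipping blank, comment, and track/browser lines."""
    for line_number, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith(("#", "track ", "browser ")):
            continue
        fields = line.split("\t")
        if len(fields) < 6:
            raise ValueError(
                f"line {line_number}: expected at least 6 columns, got {len(fields)}"
            )
        chrom, start, end, name, score, strand = fields[:6]
        if strand not in {"+", "-", "."}:
            raise ValueError(f"line {line_number}: invalid strand {strand!r}")
        # int() raises ValueError on malformed coordinates or scores.
        yield Bed6Record(chrom, int(start), int(end), name, int(score), strand)


if __name__ == "__main__":
    sample = io.StringIO("#comment\nchr1\t10\t11\tm6A\t500\t+\n")
    for record in read_bed6(sample):
        print(record)
```

Extra columns beyond the sixth are ignored here, so the same reader would accept BED11/12 input but drop the additional fields; a harmonized importer would presumably handle those explicitly.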

For the Compare View upload, MIME types alone are not sufficient: we need to add the extensions .bed,.bedrmod to the accepted types. Even so, the file is recognized on the server side as application/vnd.realvnc.bed. Does this matter?
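If the reported MIME type turns out to be unreliable for these files, one option is to check the file extension on the server as well. The following is only a sketch of that idea: `ALLOWED_SUFFIXES` and `has_allowed_suffix` are hypothetical names, and the check says nothing about the file contents, which would still need to be validated by the importer.

```python
from pathlib import PurePath

# Extensions mentioned above; the set itself is an assumption for illustration.
ALLOWED_SUFFIXES = {".bed", ".bedrmod"}


def has_allowed_suffix(filename: str) -> bool:
    """Return True if the final extension of filename is allowed (case-insensitive)."""
    return PurePath(filename).suffix.lower() in ALLOWED_SUFFIXES


if __name__ == "__main__":
    for name in ("sample.bed", "sample.BEDRMOD", "sample.txt", "sample.bed.gz"):
        print(name, has_allowed_suffix(name))
```

Only the last suffix is inspected, so a compressed upload such as `sample.bed.gz` is rejected by this sketch.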

eboileau commented 7 months ago

After receiving data for #46, I reckon that EUFImporter has to be modified to allow minor variations against the specs:

eboileau commented 7 months ago

2023-12-15: now EUF v1.7, see #49.

I am not sure how to handle the "optional header fields", annotation_source, and annotation_version. In principle we don't care what value they have, if any, so we may not need to decide whether to cast them to None or leave them as given, even if that is None, Na, etc. When writing to EUF, we should overwrite annotation_source and annotation_version with the values from the DB.
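The overwrite-on-write behaviour could look roughly like the sketch below. `merge_header` is a hypothetical helper, and the header is assumed to be a plain mapping of field names to optional strings; how the actual services represent headers may differ. The values "Ensembl" and "110" in the usage lines are placeholders, not values from the DB.

```python
from typing import Dict, Mapping, Optional


def merge_header(
    uploaded: Mapping[str, Optional[str]],
    annotation_source: str,
    annotation_version: str,
) -> Dict[str, Optional[str]]:
    """Copy the uploaded header as given, then overwrite the two annotation fields."""
    header = dict(uploaded)  # other fields are left untouched, even if None or "Na"
    header["annotation_source"] = annotation_source
    header["annotation_version"] = annotation_version
    return header


if __name__ == "__main__":
    uploaded = {"organism": "9606", "annotation_source": "Na", "annotation_version": None}
    print(merge_header(uploaded, "Ensembl", "110"))
```

Because the uploaded values are never interpreted, this approach sidesteps the question of casting them to None on import.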

Standardization of the chrom field is in place, but subject to change.

TODO: revisit services/importer.py (EUFImporter, BEDImporter)