dieterich-lab / scimodom

Sci-ModoM: A quantitative database of transcriptome-wide high-throughput RNA modification sites
https://dieterich-lab.github.io/scimodom/
GNU Affero General Public License v3.0

Compare View: implement full bedRMod import #92

Open eboileau opened 4 months ago

eboileau commented 4 months ago

Aims/objectives.

In Compare, file import is handled by BEDImporter. BED6+ files, including bedRMod, are cut down to BED6 unless euf is True, in which case the file is read as EU-formatted (bedRMod). However, the header is always ignored, and there is no validation of the actual records (e.g. chrom and strand fields). bedRMod files should be read as bedRMod, as in data import, unless BED formatting is preferred.

A clear and concise description of todo items.

Unless BED-format is forced:

For BED, this changes nothing in practice: we cannot validate organism and/or assembly, and we do not validate the records.

Additional information

BEDImporter does not validate records. As a result, bad records are read rather than skipped, unless the problem is structural, e.g. a wrong number of columns. When bad records are read (e.g. wrong chrom formatting for all records), a comparison may return nothing, and no error is raised. When bad records are skipped, the worst case (e.g. the file is BED, but bedRMod is selected by hand) is an empty set, and a NoRecordsFoundError is raised.
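To make the two failure modes concrete, here is a minimal sketch of per-record validation. It is not the BEDImporter code: `Bed6Record`, `parse_bed6`, and the `allowed_chroms` argument are hypothetical names, and `NoRecordsFoundError` is redefined locally as a stand-in for the real class. The idea is that value-level problems (chrom, strand, coordinates) are skipped and reported in the same way as column-count problems, so an all-bad file fails loudly instead of silently returning an empty comparison.

```python
from dataclasses import dataclass


class NoRecordsFoundError(Exception):
    """Local stand-in for the error named in this issue (assumption)."""


@dataclass
class Bed6Record:
    chrom: str
    start: int
    end: int
    name: str
    score: int
    strand: str


def parse_bed6(lines, allowed_chroms):
    """Return (records, skipped); skipped holds (line number, reason) pairs."""
    records, skipped = [], []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank and header/comment lines are not records
        fields = line.split("\t")
        if len(fields) < 6:
            skipped.append((lineno, "expected at least 6 columns"))
            continue
        chrom, start, end, name, score, strand = fields[:6]
        try:
            rec = Bed6Record(chrom, int(start), int(end), name, int(score), strand)
        except ValueError:
            skipped.append((lineno, "non-integer start, end, or score"))
            continue
        if rec.chrom not in allowed_chroms:
            skipped.append((lineno, f"unknown chrom {rec.chrom!r}"))
            continue
        if rec.strand not in {"+", "-", "."}:
            skipped.append((lineno, f"invalid strand {rec.strand!r}"))
            continue
        if not 0 <= rec.start < rec.end:
            skipped.append((lineno, "invalid coordinates"))
            continue
        records.append(rec)
    if not records:
        raise NoRecordsFoundError(f"no valid records ({len(skipped)} skipped)")
    return records, skipped
```

With this shape, a file whose records all use the wrong chrom formatting raises NoRecordsFoundError at import time, and the `skipped` list can be surfaced to the user as a warning.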

eboileau commented 3 months ago

Import now has full functionality, but we still need to process and validate the header for bedRMod.
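As a starting point for the header step, a minimal sketch that assumes header lines have the form `#key=value` and precede the records. The function name `read_bedrmod_header` and the default `required` keys are assumptions for illustration and should be checked against the bedRMod specification and the existing data import code.

```python
def read_bedrmod_header(lines, required=("fileformat", "organism", "assembly")):
    """Collect '#key=value' lines that precede the first record.

    The required keys are an assumption, not taken from the specification.
    """
    header = {}
    for line in lines:
        if not line.startswith("#"):
            break  # first record reached, header is over
        key, sep, value = line[1:].strip().partition("=")
        if sep:  # ignore '#' lines without '=', e.g. a column-name line
            header[key.strip()] = value.strip()
    missing = [key for key in required if not header.get(key)]
    if missing:
        raise ValueError(f"missing or empty header entries: {', '.join(missing)}")
    return header
```

The returned dictionary could then be compared with the organism and assembly selected in Compare before any records are read.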

eboileau commented 1 month ago

Check the latest changes, in particular e13b191d7e3118706f2cea0b38814ffb88105dcd. This is relevant for a better integration of upload in general (bedRMod, BED6+).

Now, even for BED6+, strict requirements are implemented using model DTOs, e.g. score should be an integer. Would it make sense to relax these requirements, ONLY for the temporary upload in Compare, and e.g. convert/round score to an integer?
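One possible shape for the relaxed conversion, as a sketch only: `relaxed_score` is a hypothetical helper, not part of the existing DTOs, and it would apply before the strict model is built for a temporary Compare upload. Note that Python's built-in `round` rounds exact halves to the nearest even integer.

```python
import math


def relaxed_score(raw: str) -> int:
    """Convert a BED score field to int, rounding floats (hypothetical helper)."""
    value = float(raw)  # raises ValueError for non-numeric input
    if not math.isfinite(value):
        raise ValueError(f"score must be finite, got {raw!r}")
    return round(value)  # round() with one argument returns an int
```

Non-numeric or non-finite scores still raise, so such records can be skipped exactly as under the strict rules.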