dieterich-lab / scimodom

GNU Affero General Public License v3.0

Dataset import: handle failed import due to skipped rows or unmapped features #86

Closed: eboileau closed 1 month ago

eboileau commented 2 months ago

Aims/objectives.

During import, invalid rows are skipped (e.g. if chromosome names are not formatted short/Ensembl-style). If the data is lifted over, there can also be unmapped records. In the worst case, an empty dataset is imported:

WARNING scimodom.services.annotation.annotate_data.193 | No records found for Kr6uj7QzWfLJ...

A clear and concise description of todo items.

We need a "no commit" fallback (e.g. if more than X% of the original records are skipped, or if the file is empty) and adequate warning/error handling for all of these cases. Ideally, we also want some logging back to the user, see #84.
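
A minimal sketch of what such a "no commit" check could look like, assuming the importer can report how many rows it read and how many it skipped (or failed to lift over). `ImportStats`, `ImportAbortedError`, `check_import`, and the 5% default are hypothetical names and values for illustration only; they are not part of the current scimodom code.

```python
from dataclasses import dataclass


class ImportAbortedError(Exception):
    """Hypothetical error: the import must not be committed."""


@dataclass
class ImportStats:
    """Hypothetical counters collected while reading a file."""

    total_rows: int    # data rows read from the file (header lines excluded)
    skipped_rows: int  # rows dropped by validation or left unmapped by liftover

    @property
    def kept_rows(self) -> int:
        return self.total_rows - self.skipped_rows


def check_import(stats: ImportStats, max_skipped_fraction: float = 0.05) -> None:
    """Raise ImportAbortedError if the import should not be committed.

    The threshold (the "X%" above) is an assumption and would need tuning.
    """
    if stats.total_rows == 0:
        raise ImportAbortedError("File contains no records.")
    if stats.kept_rows == 0:
        raise ImportAbortedError("All records were skipped; nothing to import.")
    fraction = stats.skipped_rows / stats.total_rows
    if fraction > max_skipped_fraction:
        raise ImportAbortedError(
            f"{stats.skipped_rows} of {stats.total_rows} records skipped "
            f"({fraction:.1%}), above the allowed {max_skipped_fraction:.1%}."
        )
```

The idea would be to call such a check after reading the file but before committing, so that the caller can roll back and pass the error message on to the user instead of creating an empty dataset.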

eboileau commented 2 months ago

I think we need to better handle the case where the selected modification does not match the one in the file (field 4, name), e.g. m6A is selected in the online upload form, but the file contains Y:

[2024-04-30 12:37:16:662] WARNING scimodom.services.importer.base._read_line.231 | Skipping: Failed to parse /home/eboileau/Downloads/SCIMODOM_TEMPDIR/file1.bedrmod at row 52: Unrecognized name: Y.
[2024-04-30 12:37:16:663] DEBUG scimodom.services.dataset.create_dataset.366 | Added dataset dvUKvz86UvCB to project kWHRCmcg with title = test again, and the following associations: m6A:72. Annotating data now...
[2024-04-30 12:37:16:669] WARNING scimodom.services.annotation.annotate_data.193 | No records found for dvUKvz86UvCB...
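
One way to catch this mismatch early would be to scan field 4 before import and report names that were not selected. This is a rough sketch only: `unexpected_modifications` is a hypothetical helper, and it assumes tab-separated rows with header lines starting with `#`.

```python
from collections import Counter
from typing import Iterable


def unexpected_modifications(rows: Iterable[str], selected: set[str]) -> Counter:
    """Count field-4 names that are not among the selected modifications."""
    counts: Counter = Counter()
    for line in rows:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # malformed row; left to the importer's own validation
        name = fields[3]
        if name not in selected:
            counts[name] += 1
    return counts
```

A non-empty result could then be turned into a single message for the user (e.g. which names were found and how many rows are affected) rather than one warning per skipped row.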

eboileau commented 1 month ago

There are at least 2 places where we would need suitable error handling: (1) Login > Dataset upload (EUF), (2) Compare (upload).

eboileau commented 1 month ago

There are also other types of errors that are not handled at the moment, e.g.

[2024-05-16 12:55:04:607] ERROR scimodom.api.management.add_dataset.141 | Failed to create dataset: Unknown or outdated version bedRModv2.27326828.
# empty file
[2024-05-16 14:03:40:578] ERROR scimodom.api.management.add_dataset.141 | Failed to create dataset: /home/eboileau/Downloads/SCIMODOM_TEMPDIR/TEST/file_todel.bed