dieterich-lab / scimodom

GNU Affero General Public License v3.0

DB assembly version handling #31

Closed eboileau closed 4 months ago

eboileau commented 9 months ago

Aims/objectives.

Assemblies for different organisms are grouped into an assembly_version, which defines the assemblies used in Sci-ModoM. For bedRMod datasets that use a different assembly version, we need a liftover service. See the Assembly section of the documentation for a short description.

A clear and concise description of todo items.

Liftover

eboileau commented 7 months ago

The current chrom.sizes does not work with previous assemblies, i.e. the chrom.sizes downloaded in AnnotationService (at project creation) refers to the most recent assembly. Pinning a different assembly results in pybedtools warnings, e.g. WARNING: 1:195220050-195222766 exceeds the length of chromosome (1).

cf.

+----+--------+---------+--------------+
| id | name   | taxa_id | version      |
+----+--------+---------+--------------+
|  1 | GRCh38 |    9606 | K9FeTPiZ4abQ |
|  2 | GRCm38 |   10090 | K9FeTPiZ4abQ | -- we can't liftover now...
+----+--------+---------+--------------+

+----+---------+---------+--------------+
| id | release | taxa_id | version      |
+----+---------+---------+--------------+
|  1 |     110 |    9606 | cp6qKL4t4Wws |
|  2 |     102 |   10090 | cp6qKL4t4Wws | -- downloads 102, but chrom.sizes is from most recent GRCm39
+----+---------+---------+--------------+

I temporarily edited chrom.sizes for mouse to match GRCm38.
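
A minimal sketch of how the mismatch could be detected up front, before records are handed to pybedtools. This is not code from Sci-ModoM: the function names `parse_chrom_sizes` and `out_of_bounds` are hypothetical, and the chromosome 1 lengths below are illustrative values for GRCm38 and GRCm39 that should be checked against the actual chrom.sizes files.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int, int]  # (chrom, start, end)


def parse_chrom_sizes(lines: Iterable[str]) -> Dict[str, int]:
    """Parse chrom.sizes content (chrom<whitespace>length per line)."""
    sizes: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, length = line.split()[:2]
        sizes[chrom] = int(length)
    return sizes


def out_of_bounds(records: Iterable[Record], sizes: Dict[str, int]) -> List[Record]:
    """Return records on an unknown chromosome or ending past its length."""
    bad: List[Record] = []
    for chrom, start, end in records:
        length = sizes.get(chrom)
        if length is None or end > length:
            bad.append((chrom, start, end))
    return bad


# Illustrative chromosome 1 lengths (assumed, verify against real files).
grcm38 = parse_chrom_sizes(["1\t195471971"])
grcm39 = parse_chrom_sizes(["1\t195154279"])

# The interval from the pybedtools warning above.
records = [("1", 195220050, 195222766)]

print("GRCm38:", out_of_bounds(records, grcm38))
print("GRCm39:", out_of_bounds(records, grcm39))
```

Under these assumed lengths, the interval fits within the older assembly but is flagged against the newer one, which is the same situation as a GRCm38 dataset being checked against a GRCm39 chrom.sizes.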

eboileau commented 5 months ago

Mouse annotation is set to release 102 to keep GRCm38, but chrom.sizes still needs to be edited, see https://github.com/dieterich-lab/scimodom/issues/31#issuecomment-1862523009

eboileau commented 4 months ago

Todo: import GRCm39 as current version.