Closed: eboileau closed this issue 4 months ago
The current chrom.sizes does not work with previous assemblies, i.e. the chrom.sizes downloaded in AnnotationService (at project creation) refers to the most recent version. Fixing a different assembly results in e.g. WARNING: 1:195220050-195222766 exceeds the length of chromosome (1) in pybedtools.
cf.
+----+--------+---------+--------------+
| id | name   | taxa_id | version      |
+----+--------+---------+--------------+
|  1 | GRCh38 |    9606 | K9FeTPiZ4abQ |
|  2 | GRCm38 |   10090 | K9FeTPiZ4abQ | -- we can't liftover now...
+----+--------+---------+--------------+
+----+---------+---------+--------------+
| id | release | taxa_id | version      |
+----+---------+---------+--------------+
|  1 |     110 |    9606 | cp6qKL4t4Wws |
|  2 |     102 |   10090 | cp6qKL4t4Wws | -- downloads 102, but chrom.sizes is from most recent GRCm39
+----+---------+---------+--------------+
I temporarily edited chrom.sizes for mouse to match GRCm38.
Mouse annotation is set to release 102 to keep GRCm38, but chrom.sizes then needs to be edited, see https://github.com/dieterich-lab/scimodom/issues/31#issuecomment-1862523009
Todo: import GRCm39 as current version.
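The mismatch described above could be caught before the records reach pybedtools. The following is only a minimal sketch, assuming chrom.sizes uses the usual two-column "name, length" layout; the function names `parse_chrom_sizes` and `find_out_of_bounds` are hypothetical and not part of Sci-ModoM, and the lengths in the usage note are toy values, not real chromosome sizes.

```python
from typing import Dict, Iterable, List, Tuple

# (chrom, start, end), with BED-style 0-based, half-open coordinates
Interval = Tuple[str, int, int]


def parse_chrom_sizes(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'name<whitespace>length' lines into a dict (hypothetical helper)."""
    sizes: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, length = line.split()[:2]
        sizes[name] = int(length)
    return sizes


def find_out_of_bounds(
    intervals: Iterable[Interval], sizes: Dict[str, int]
) -> List[Interval]:
    """Return intervals on unknown chromosomes or extending past their end."""
    bad: List[Interval] = []
    for chrom, start, end in intervals:
        if chrom not in sizes or start < 0 or end > sizes[chrom]:
            bad.append((chrom, start, end))
    return bad
```

With toy sizes, a record that fits the older assembly is flagged once chrom.sizes comes from a newer, shorter one: `find_out_of_bounds([("1", 950, 980)], parse_chrom_sizes(["1\t900"]))` returns the record, whereas with `["1\t1000"]` the result is empty.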
Aims/objectives.
Assemblies for different organisms are grouped into an assembly_version, which defines the assemblies used in Sci-ModoM. For bedRMod datasets that are of a different assembly version, we need some liftover service. See documentation at Assembly for a short description.
A clear and concise description of todo items.
- AssemblyService to (i) deal with assembly version in general, (ii) replace AnnotationService for downloading chrom.sizes, (iii) perform liftover of data records.
- AnnotationService is called at project creation for new organisms, and cleaning is done in EUFImporter. The chrom field is modified ad hoc ("chromosome|chrom|chr" removed, currently does not handle "M" -> "MT"). We need a more robust method, or fix a format for upload, and a fallback method.
- AssemblyService outside Flask context.
- Liftover: pip install CrossMap==0.6.6. Add to requirements.
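One possible shape for the "more robust method" with a fallback for the chrom field is sketched below. It is not the current EUFImporter code: the function name `normalize_chrom`, the case-insensitive matching, and the choice to return `None` for names missing from chrom.sizes are all assumptions made for illustration.

```python
import re
from typing import Optional, Set

# Longest alternative first, so "chromosome1" is not reduced to "omosome1".
# Case-insensitive matching is an assumption, not taken from the importer.
_PREFIX = re.compile(r"^(chromosome|chrom|chr)", flags=re.IGNORECASE)


def normalize_chrom(name: str, known: Optional[Set[str]] = None) -> Optional[str]:
    """Strip a chromosome prefix and map "M" to "MT" (hypothetical helper).

    If `known` (e.g. the names listed in chrom.sizes) is given, names that
    are still unknown after normalization yield None, so that the caller
    can reject the record or apply another fallback.
    """
    raw = name.strip()
    candidate = _PREFIX.sub("", raw) or raw  # keep the input if nothing is left
    if candidate.upper() == "M":
        candidate = "MT"
    if known is not None and candidate not in known:
        return None
    return candidate
```

For example, `normalize_chrom("chrM")` gives `"MT"` and `normalize_chrom("chromosome2")` gives `"2"`, while `normalize_chrom("chrUn_x", known={"1", "MT"})` gives `None`. Returning `None` rather than raising leaves it to the importer to skip the row, warn, or abort the upload.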