Closed eboileau closed 7 months ago
e.g.
10 7596572 7596573 m6A 1000 + 10 ensembl_havana gene 7559270 7666998 . - . gene_id "ENSG00000123243"; gene_version "15"; gene_name "ITIH5"; gene_source "ensembl_havana"; gene_biotype "protein_coding";
10 7596572 7596573 m6A 1000 + 10 ensembl_havana transcript 7559270 7666966 . - . gene_id "ENSG00000123243"; gene_version "15"; transcript_id "ENST00000397146"; transcript_version "7"; gene_name "ITIH5"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "ITIH5-202"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS31139"; tag "basic"; tag "Ensembl_canonical"; tag "MANE_Select"; transcript_support_level "1 (assigned to previous version 6)";
10 7596572 7596573 m6A 1000 + 10 ensembl_havana transcript 7562424 7619660 . - . gene_id "ENSG00000123243"; gene_version "15"; transcript_id "ENST00000613909"; transcript_version "4"; gene_name "ITIH5"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "ITIH5-209"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS31140"; tag "basic"; transcript_support_level "1";
10 7596572 7596573 m6A 1000 + 10 havana transcript 7571405 7640779 . - . gene_id "ENSG00000123243"; gene_version "15"; transcript_id "ENST00000434980"; transcript_version "5"; gene_name "ITIH5"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "ITIH5-203"; transcript_source "havana"; transcript_biotype "protein_coding_CDS_not_defined"; transcript_support_level "2";
10 7596572 7596573 m6A 1000 + 10 ensembl transcript 7571405 7666998 . - . gene_id "ENSG00000123243"; gene_version "15"; transcript_id "ENST00000397145"; transcript_version "6"; gene_name "ITIH5"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "ITIH5-201"; transcript_source "ensembl"; transcript_biotype "protein_coding"; tag "basic"; transcript_support_level "2";
10 7596572 7596573 m6A 1000 + 10 havana transcript 7572772 7622477 . - . gene_id "ENSG00000123243"; gene_version "15"; transcript_id "ENST00000476417"; transcript_version "5"; gene_name "ITIH5"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "ITIH5-207"; transcript_source "havana"; transcript_biotype "retained_intron"; transcript_support_level "2";
10 7596572 7596573 m6A 1000 + 10 havana transcript 7576892 7617266 . - . gene_id "ENSG00000123243"; gene_version "15"; transcript_id "ENST00000461751"; transcript_version "1"; gene_name "ITIH5"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "ITIH5-204"; transcript_source "havana"; transcript_biotype "nonsense_mediated_decay"; tag "cds_start_NF"; tag "mRNA_start_NF"; transcript_support_level "5";
If a record does not intersect any annotation on the "correct" strand (intergenic records excepted), there is not much we can do.
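To make the strand issue concrete, here is a minimal sketch of a strand-aware overlap check using the record and the ITIH5 gene line from the example above. The `Interval` type and the `overlaps` helper are hypothetical and only for illustration; this is not how Sci-ModoM performs the annotation.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive
    strand: str


def overlaps(a: Interval, b: Interval, same_strand: bool = True) -> bool:
    """Return True if a and b overlap, optionally requiring the same strand."""
    if a.chrom != b.chrom:
        return False
    if same_strand and a.strand != b.strand:
        return False
    return a.start < b.end and b.start < a.end


# bedRMod record from the example (reported on the + strand)
record = Interval("10", 7596572, 7596573, "+")
# ITIH5 gene from the GTF line (1-based inclusive, so start - 1), on the - strand
gene = Interval("10", 7559270 - 1, 7666998, "-")

print(overlaps(record, gene, same_strand=False))  # True: coordinates overlap
print(overlaps(record, gene, same_strand=True))   # False: opposite strands
```

The record falls inside the gene only if strand is ignored, which is why a strand-aware annotation finds nothing for it.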
Records located in intronic regions of a gene on the opposite strand are not annotated. I suspect that such cases are simply misannotated, i.e. the bedRMod record is wrong, e.g. 10:7596572-7596573 or 10:10426859-10426860.
These are most likely part of some contig, but we do not include contigs in Sci-ModoM.
With the current search query, these records are lost in the (INNER) join with GenomicAnnotation, because their data.id values do not exist in GenomicAnnotation. We'd need a LEFT OUTER JOIN to recover them, but first we need to sort out performance bottlenecks. Note that such records cannot be entered in GenomicAnnotation, because the feature column is not nullable:
feature: Mapped[str] = mapped_column(String(32), nullable=False)
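The difference between the two joins can be sketched with an in-memory SQLite database. The table layout, column names, second record and `exon` feature value below are simplified assumptions made up for illustration; they are not the actual Sci-ModoM schema or search query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified stand-ins for the Data and GenomicAnnotation tables
    CREATE TABLE data (
        id INTEGER PRIMARY KEY,
        chrom TEXT NOT NULL,
        start_pos INTEGER NOT NULL,
        end_pos INTEGER NOT NULL
    );
    CREATE TABLE genomic_annotation (
        data_id INTEGER NOT NULL REFERENCES data (id),
        feature TEXT NOT NULL  -- non-nullable, as in the mapped column above
    );
    -- Record 1 has no annotation row; record 2 (made-up coordinates) has one
    INSERT INTO data VALUES (1, '10', 7596572, 7596573), (2, '10', 100, 101);
    INSERT INTO genomic_annotation VALUES (2, 'exon');
    """
)

# INNER JOIN: record 1 is silently dropped
inner = conn.execute(
    "SELECT d.id, a.feature FROM data AS d "
    "JOIN genomic_annotation AS a ON a.data_id = d.id ORDER BY d.id"
).fetchall()

# LEFT OUTER JOIN: record 1 is kept, with a NULL feature
left = conn.execute(
    "SELECT d.id, a.feature FROM data AS d "
    "LEFT OUTER JOIN genomic_annotation AS a ON a.data_id = d.id ORDER BY d.id"
).fetchall()

print(inner)  # [(2, 'exon')]
print(left)   # [(1, None), (2, 'exon')]
```

With the outer join, the NULL feature appears only in the query result, so the non-nullable constraint on the annotation table is not violated.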
Output or error messages
No response
Additional context
No response
What browser were you using?
Firefox
What version of Sci-ModoM were you using?
dev