BreakerLab / dimpl

DIMPL: Discovery of Intergenic Motifs PipeLine
MIT License
Genome_context.py issues #21

Closed slyyy312 closed 3 years ago

slyyy312 commented 3 years ago

I have done two different motif updates and found three separate runtime errors during the genome_context.py step. Attached is a screenshot of the error I received.

First and second errors: the genome context step fails partway through (see the screenshot). I tried to attach the .tar.gz files, but the upload doesn't seem to work. Email seth.lyon@yale.edu for them if you need to reproduce the error. I think this is likely to happen with many motifs.

Third error (no screenshot): clicking "HIT" or any gene in any genome context image always links to https://www.ncbi.nlm.nih.gov/protein/YP_034512.1 regardless of what was clicked. This happened with both of the motifs I've attached here. This might be a systemic issue?

[Screenshots: Screen Shot 2021-05-28 at 11 39 33 PM; Screen Shot 2021-05-28 at 11 32 51 PM]

ggaffield commented 3 years ago

The ENV12 modification assumed that an accession number would not exist in both RefSeq and ENV12. That has turned out NOT to be the case. When DIMPL does find an ENV12 accession number in RefSeq, it doesn't find an associated RefSeq assembly (because RefSeq typically excludes metagenomic assemblies). This error essentially says "assembly not found on RefSeq".

So a slight adjustment needs to be made to the data fetch logic, and you should be back in business.
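
For illustration, here is a minimal sketch of the kind of adjustment described above. It is not DIMPL's actual code: the function name `resolve_accession`, the dictionary of RefSeq assemblies, and the set of ENV12 accessions are all hypothetical stand-ins. The idea is to stop treating "accession found in RefSeq" as proof that a RefSeq assembly exists, and to fall back to ENV12 when no assembly is found.

```python
from typing import Mapping, Optional, Set, Tuple


def resolve_accession(
    accession: str,
    refseq_assemblies: Mapping[str, Optional[str]],
    env12_accessions: Set[str],
) -> Tuple[str, Optional[str]]:
    """Decide which source to use for an accession.

    Hypothetical helper; names and data structures are assumptions
    made for this sketch, not taken from DIMPL.

    refseq_assemblies maps an accession to its RefSeq assembly ID, or to
    None when the accession is in RefSeq but has no associated assembly.
    """
    # Use RefSeq only when an assembly is actually available.
    assembly = refseq_assemblies.get(accession)
    if assembly is not None:
        return ("refseq", assembly)

    # The accession may exist in both databases, so fall back to ENV12
    # instead of failing with "assembly not found on RefSeq".
    if accession in env12_accessions:
        return ("env12", None)

    raise LookupError(f"{accession}: no RefSeq assembly and not in ENV12")
```

With made-up accessions, `resolve_accession("E1", {"E1": None}, {"E1"})` would take the ENV12 branch rather than raising, which is the case that triggered the error in this issue.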