BIMSBbioinfo / ciRcus

An R package for annotation of circular RNAs

New genome assembly #54

Open marvel479 opened 4 years ago

marvel479 commented 4 years ago

Hi! I am new to coding and have been trying to annotate the splice sites I identified with find_circ.py. My reads are from Danio rerio (assembly 11), which is not currently supported. Is there a workaround or alternative code for loading ENSEMBL annotations so that I can still use ciRcus?

mschilli87 commented 4 years ago

@marvel479: Have a look at https://github.com/BIMSBbioinfo/ciRcus/pull/51/commits for an example of how to add a species. As long as your assembly is on Ensembl, it's straightforward. Otherwise, you could create the database yourself instead of relying on ciRcus. If this is too complicated for you, but you are willing to test for me, I could support you by attempting a PR.

marvel479 commented 4 years ago

@mschilli87, I am still not sure how to do this bit. It would be great if you could add another genome. I am more than willing to test this for you and assist in any other way possible; unfortunately, my skills are limited at the moment. Your help is really appreciated.

mschilli87 commented 4 years ago

@marvel479:

Could you please test the following branch and report back so we can add this to the development version if it works?

BiocManager::install("BIMSBbioinfo/circus@dr11")

marvel479 commented 4 years ago

Absolutely. I will test it out and get back to you.

Regards, Aayushi

marvel479 commented 4 years ago

Hi Marcel, I know this error has been seen before, but it was not clearly resolved. When I just try to load the human database for tests, I get the following error:

`> gtf2sqlite( assembly = "hg19", db.file = system.file("extdata/db/human_hg19_ens75_txdb.sqlite", package="ciRcus"))
snapshotDate(): 2019-10-29
downloading 1 resources
retrieving 1 resource
|=================================================================================================================| 100%
loading from cache
TxDb object:
Db type: TxDb
Supporting package: GenomicFeatures
Genome: hg19
transcript_nrow: 196354
exon_nrow: 674156
cds_nrow: 269141
Db created by: GenomicFeatures package from Bioconductor
Creation time: 2020-03-30 01:24:15 -0700 (Mon, 30 Mar 2020)
GenomicFeatures version at creation time: 1.38.2
RSQLite version at creation time: 2.2.0
DBSCHEMAVERSION: 1.2
Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
> annot.list <- loadAnnotation(system.file("extdata/db/human_hg19_ens75_txdb.sqlite",
+ package="ciRcus"))
loading TxDb annotation from SQLite database file...
Error in dbFileConnect(file) : DB file '' not found`

Any idea how I can get this taken care of? If system.file is not required, how can I load the databases included in the package?

mschilli87 commented 4 years ago

@marvel479: I think there is a misconception. AFAIK, system.file(..., package = "ciRcus") just returns a standard path and you can completely ignore it. I'm not even sure if ciRcus actually does ship any annotation. gtf2sqlite uses the AnnotationHub package to query it from ENSEMBL's servers and stores a local copy in a SQLite3 database at the path you tell it. So just make sure to pass an existing, writable path (forget about system.file) and you should be able to load it with loadAnnotation.
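A minimal sketch of the point above, in case it helps: base R's system.file() returns an empty string when the requested file is not found in the package, which would explain the '' in the error message. Only base R is executed here. The use of tempdir() as the target directory and the package = "base" lookup are stand-ins chosen for illustration, not part of ciRcus; the gtf2sqlite and loadAnnotation calls are copied from the commands quoted earlier in this thread and are left commented out because they need ciRcus and network access.

```r
# system.file() returns "" when the file is not shipped with the package.
# "base" is used here only so the example runs without ciRcus installed.
missing.path <- system.file("extdata/db/no_such_file.sqlite", package = "base")
print(missing.path)          # [1] ""
print(nzchar(missing.path))  # [1] FALSE

# Use a directory you control instead (tempdir() is only a stand-in).
db.dir  <- tempdir()
db.file <- file.path(db.dir, "human_hg19_ens75_txdb.sqlite")

# Make sure the directory exists and is writable before building the database.
stopifnot(dir.exists(db.dir), file.access(db.dir, mode = 2) == 0)

# With ciRcus installed, the calls from this thread would then become
# (not run here):
# gtf2sqlite(assembly = "hg19", db.file = db.file)
# annot.list <- loadAnnotation(db.file)
```

For real use, a persistent directory would be a better choice than tempdir(), since the temporary directory is removed when the R session ends.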

@retaj: Maybe we should update the README and/or fix system.file to actually return something useful rather than '' for ciRcus? :wink:

mschilli87 commented 4 years ago

@marvel479: Did my changes in #55 work for you? With some feedback from your side we'd be able to include this in the next ciRcus version so other zebrafish researchers benefit from my work as well. But I'd be reluctant to share untested code. So please get back to us.

marvel479 commented 4 years ago

Hi Marcel, I am still working on it; I have some read manipulation to do before I can get to it. I really appreciate your commit and will get back to you soon.

On Mon, Apr 13, 2020 at 10:32 PM Marcel Schilling notifications@github.com wrote:

@marvel479: Did my changes in #55 work for you? With some feedback from your side we'd be able to include this in the next ciRcus version so other zebrafish researchers benefit from my work as well. But I'd be reluctant to share untested code. So please get back to us.


Regards, Aayushi

marvel479 commented 4 years ago

@marvel479: Did my changes in #55 work for you? With some feedback from your side we'd be able to include this in the next ciRcus version so other zebrafish researchers benefit from my work as well. But I'd be reluctant to share untested code. So please get back to us.

Hi Marcel, I have verified that the new assembly works. I would like to ask you to add it as dr11 rather than GRCz11, as dr11 is more intuitive, but apart from that, it works perfectly. I would also like to mention that system.file does not work for creating annot.list; I used a local path instead.

Thanks for all your help through these issues.