FRED-2 / Fred2

Python-based framework for computational immunomics
http://fred-2.github.io/

Connect to a local MySQL Server #206

Closed adefelicibus closed 7 years ago

adefelicibus commented 7 years ago

Hi, I'm trying to use a local Ensembl database, but I couldn't connect to the core database. Is there any documentation about it?

It is impossible to use the remote BioMart as the identification system in my case, because I have more than 400 samples to analyze.

Thanks.

b-schubert commented 7 years ago

Hi @adefelicibus,

The EnsemblDB connector is file-based, so you need to download a FASTA file of the part of Ensembl you need and then use the EnsemblDB adapter as described in the tutorial:

https://github.com/FRED-2/Fred2/blob/master/Fred2/tutorials/DBAdapterUsage.ipynb

ed = EnsemblDB()
ed.read_seqs("data/Homo_sapiens.GRCh38.cds.test_stub.fa")
ed.read_seqs("data/Homo_sapiens.GRCh38.pep.test_stub.fa")

Every other DB-dependent function should work with the EnsemblDB object as well, since it implements our ADBAdapter interface.

Let me know if you run into further problems, Benni

adefelicibus commented 7 years ago

Hi @b-schubert, thank you for your quick response. I used the EnsemblDB as you said and I could read the FASTA file.

So, I'm trying to generate transcripts or peptides from variants. I read the tutorial about it, but I'm not getting it. I'm trying this:

var = read_annovar_exonic(args.input_exonic_vars)
ens = EnsemblAdapter.EnsemblDB()
ens.read_seqs(args.input_fasta)
trans = list(generate_transcripts_from_variants(var, ens, EIdentifierTypes.ENSEMBL))

But I got an empty list.

Could you please help me generate transcripts from variants using the Ensembl annotation?

Thank you.

b-schubert commented 7 years ago

The variants in args.input_exonic_vars have to be called against the same reference system that you feed into the EnsemblDB (i.e., if you used Ensembl GRCh38 for variant calling, then you should also initialize the EnsemblDB with a FASTA file from Ensembl GRCh38); otherwise the identifiers won't match and you won't get any annotated transcripts.
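
One way to check for such a mismatch before running the pipeline is to compare the transcript identifiers in the variant annotation with those in the FASTA headers. The following is only a minimal sketch using the Python standard library; it is not part of Fred2. The helper names (`transcript_ids_in_fasta_lines`, `missing_transcript_ids`) are hypothetical, and it assumes that the FASTA headers contain Ensembl transcript IDs of the form `ENST...`, optionally followed by a version suffix such as `.1`, which is ignored for the comparison.

```python
import re

# Assumption: transcript IDs look like ENST00000631435 or ENST00000631435.1.
# The optional version suffix is matched but not captured.
ENST_PATTERN = re.compile(r"(ENST\d+)(?:\.\d+)?")


def transcript_ids_in_fasta_lines(lines):
    """Collect unversioned ENST IDs that appear on FASTA header lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):  # only header lines carry identifiers
            ids.update(m.group(1) for m in ENST_PATTERN.finditer(line))
    return ids


def missing_transcript_ids(variant_transcript_ids, fasta_lines):
    """Return the variant transcript IDs that have no FASTA header entry."""
    wanted = set()
    for raw_id in variant_transcript_ids:
        match = ENST_PATTERN.search(raw_id)
        if match:
            wanted.add(match.group(1))
    return wanted - transcript_ids_in_fasta_lines(fasta_lines)


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    fasta_lines = [
        ">ENST00000000001.1 cds chromosome:GRCh38:1:100:200:1",
        "ATGGCC",
        ">ENSP00000000009.2 pep transcript:ENST00000000002.3",
        "MA",
    ]
    variant_ids = ["ENST00000000001", "ENST00000000003.2"]
    print(sorted(missing_transcript_ids(variant_ids, fasta_lines)))
```

If this returns a large share of the transcript IDs from the variant file, the variant calls and the FASTA file most likely come from different reference releases. To run it on a real file, pass an open file handle as `fasta_lines`, e.g. `missing_transcript_ids(ids, open(path))`.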