RamsinghLab / TxDbLite

This is TxDbLite with respect to the arkas arcolombo branch (staging area)

Final TxDbLite Checking; rebuilding HSapiens, Mouse Packages as final #28

Closed arcolombo closed 8 years ago

arcolombo commented 8 years ago

So I've successfully rebuilt the H. sapiens ENSEMBL 81 packages; the only fault is that the DB is not loaded when the package is loaded (to be fixed soon). Still need to check the Mouse and human repeat package builds.

[1] "EnsDbLite.Hsapiens.81.sqlite"

EnsDbLite(ens)
EnsDbLite :
|package_name: EnsDbLite.Hsapiens.81
|db_type: EnsDbLite
|type_of_gene_id: Ensembl Gene ID
|created_by: TxDbLite 1.9.113
|creation_time: Fri Aug 19 14:18:18 2016
|organism: Homo sapiens
|genome_build: GRCh38
|source_file: Homo_sapiens.GRCh38.81.cdna_mergedWith_ncrna.fa.gz
| 37197 transcripts from 23911 bundles (genes).

arcolombo commented 8 years ago

Successful rebuild from a FASTA file given with its full path name, testing the basename addition to TxDbLite calls. N.B. look at the source_file field.

EnsDbLite(ens)
EnsDbLite :
|package_name: EnsDbLite.Hsapiens.81
|db_type: EnsDbLite
|type_of_gene_id: Ensembl Gene ID
|created_by: TxDbLite 1.9.113
|creation_time: Fri Aug 19 14:26:41 2016
|organism: Homo sapiens
|genome_build: GRCh38
|source_file: ~/Documents/github_repos/arkasData/inst/extdata/fasta/Homo_sapiens.GRCh38.81.cdna.all.fa.gz
| 175372 transcripts from 38530 bundles (genes).
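For reference, here is a minimal base-R sketch of what `basename()` does to the path shown in source_file. It only illustrates the base R function; where (or whether) TxDbLite applies `basename()` internally is not shown in this thread, and the idea that the package name is derived from the dot-separated fields of the file name is an assumption based on the names printed in the output.

```r
# Path copied from the source_file field in the output above.
fasta <- "~/Documents/github_repos/arkasData/inst/extdata/fasta/Homo_sapiens.GRCh38.81.cdna.all.fa.gz"

# basename() drops the directory part and keeps only the file name.
fasta_name <- basename(fasta)
print(fasta_name)

# Assumption for illustration: the organism, build and version are the
# first three dot-separated fields of the file name.
parts    <- strsplit(fasta_name, ".", fixed = TRUE)[[1]]
organism <- parts[1]
build    <- parts[2]
version  <- parts[3]
print(c(organism = organism, build = build, version = version))
```

With the file name used here, the three fields line up with the organism, genome_build and version number reported for EnsDbLite.Hsapiens.81.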

arcolombo commented 8 years ago

Successful rebuild of H. sapiens RepBase 2005.

rep
[1] "RepDbLite.Hsapiens.2005.sqlite"
RepDbLite(rep)
RepDbLite :
|package_name: RepDbLite.Hsapiens.2005
|db_type: RepDbLite
|type_of_gene_id: RepBase identifiers
|created_by: TxDbLite 1.9.113
|creation_time: Fri Aug 19 14:30:55 2016
|organism: Homo sapiens
|genome_build: RepBase20_05
|source_file: Homo_sapiens.RepBase.20_05.merged.fa.gz
| 1116 repeat exemplars from 68 repeat families (no known genes).
makeRepDbLitePkg(rep)
Creating package in ./RepDbLite.Hsapiens.2005
[1] "RepDbLite.Hsapiens.2005"

arcolombo commented 8 years ago

library(RepDbLite.Hsapiens.2005)
RepDbLite.Hsapiens.2005
RepDbLite :
|package_name: RepDbLite.Hsapiens.2005
|db_type: RepDbLite
|type_of_gene_id: RepBase identifiers
|created_by: TxDbLite 1.9.113
|creation_time: Fri Aug 19 14:30:55 2016
|organism: Homo sapiens
|genome_build: RepBase20_05
|source_file: Homo_sapiens.RepBase.20_05.merged.fa.gz
| 1116 repeat exemplars from 68 repeat families (no known genes).

package loading is working fine (I am an idiot)

arcolombo commented 8 years ago

Still need to check Mouse, and perhaps Drosophila melanogaster.

arcolombo commented 8 years ago

Successfully rebuilt a Mouse TxDbLite ENSEMBL package.

mus<-ensDbLiteFromFasta("Mus_musculus.GRCm38.cdna.all.fa.gz")
Loading required package: org.Mm.eg.db

Extracting transcript lengths...done.
Extracting transcript descriptions...done.
Extracting genomic coordinates...done.
Extracting gene and biotype associations...done.
Tabulating GC content...done.
Tabulating transcript biotypes...done.
Tabulating genes......done.
Creating the database...done.
Writing the gene table...done.
Tabulating gene biotypes...done.
Writing the gene_biotype table...done.
Writing the tx table...done.
Tabulating transcript biotypes...done.
Writing the tx_biotype table...done.
Writing the biotype_class table...done.

mus
[1] "EnsDbLite.Mmusculus.cdna.sqlite"
EnsDbLite(mus)
EnsDbLite :
|package_name: EnsDbLite.Mmusculus.cdna
|db_type: EnsDbLite
|type_of_gene_id: Ensembl Gene ID
|created_by: TxDbLite 1.9.113
|creation_time: Sun Aug 21 09:16:21 2016
|organism: Mus musculus
|genome_build: GRCm38
|source_file: Mus_musculus.GRCm38.cdna.all.fa.gz
| 98492 transcripts from 32737 bundles (genes).
makeDbLitePkg(mus)
Error: could not find function "makeDbLitePkg"
makeEnsDbLitePkg(mus)
Creating package in ./EnsDbLite.Mmusculus.cdna
[1] "EnsDbLite.Mmusculus.cdna"
library(EnsDbLite.Mmusculus.cdna)
EnsDbLite.Mmusculus.cdna
EnsDbLite :
|package_name: EnsDbLite.Mmusculus.cdna
|db_type: EnsDbLite
|type_of_gene_id: Ensembl Gene ID
|created_by: TxDbLite 1.9.113
|creation_time: Sun Aug 21 09:16:21 2016
|organism: Mus musculus
|genome_build: GRCm38
|source_file: Mus_musculus.GRCm38.cdna.all.fa.gz
| 98492 transcripts from 32737 bundles (genes).
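One way to double-check a build like this, independent of TxDbLite, is to open the resulting .sqlite file directly and list its tables. The sketch below uses the DBI and RSQLite packages. The helper name `list_sqlite_tables` is hypothetical, and the expected table names are an assumption read off the "Writing the ... table" progress messages above, not confirmed against the schema.

```r
# Hypothetical helper: list the tables in a TxDbLite-built SQLite file.
# Returns character(0) if the file is not present in the working directory.
list_sqlite_tables <- function(sqlite_file) {
  if (!file.exists(sqlite_file)) {
    message("No such file: ", sqlite_file)
    return(character(0))
  }
  con <- DBI::dbConnect(RSQLite::SQLite(), sqlite_file)
  on.exit(DBI::dbDisconnect(con), add = TRUE)  # always close the connection
  DBI::dbListTables(con)
}

tables <- list_sqlite_tables("EnsDbLite.Mmusculus.cdna.sqlite")

# Assumed table names, taken from the progress messages in the build log.
expected <- c("gene", "gene_biotype", "tx", "tx_biotype", "biotype_class")

# Any expected table that is absent from the file is reported here.
missing_tables <- setdiff(expected, tables)
print(missing_tables)
```

If the file is missing, every expected table is reported as missing, so run this from the directory where the .sqlite file was written.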

arcolombo commented 8 years ago

Mouse RepBase Library creation and loading works

rr<-repDbLiteFromMouseFasta("Mus_musculus.RepBase.mousub_merged_rodrep.fa")
Extracting repeat lengths...done.
Extracting repeat descriptions...1020 uncataloged repeat biotypes, fix case...
1020 uncataloged repeat biotypes, fix Tiggers...
1019 uncataloged mouse repeat biotypes, fix Alus...
Alus were not found in uncataloged mouse repeats, skipping ...
1019 uncataloged mouse repeat biotypes, fix LINE1...
937 uncataloged mouse repeat biotypes, fix MERs...
931 uncataloged mouse repeat biotypes, fix LTRs...
788 uncataloged mouse repeat biotypes, fix SVAs...
SVAs were not found in the uncataloged mouse ...
788 uncataloged mouse repeat biotypes, fix SINEs...
SINEs were not found in the repeat fasta for mouse ...
788 uncataloged mouse repeat biotypes, fix Mariners...
788 uncataloged mouse repeat biotypes... hinting...
0 uncataloged repeat biotypes after hinting.
done.
Creating the database...done.
Warning message:
In .Call2("fasta_index", filexp_list, nrec, skip, seek.first.rec, :
  reading FASTA file Mus_musculus.RepBase.mousub_merged_rodrep.fa: ignored 327 invalid one-letter sequence codes
RepDbLite(rr)
RepDbLite :
|package_name: RepDbLite.Mmusculus.RepBase
|db_type: RepDbLite
|type_of_gene_id: RepBase identifiers
|created_by: TxDbLite 1.9.113
|creation_time: Sun Aug 21 09:34:28 2016
|organism: Mus musculus
|genome_build: RepBasemousub_merged_rodrep
|source_file: Mus_musculus.RepBase.mousub_merged_rodrep.fa
| 1563 repeat exemplars from 72 repeat families (no known genes).
makeRepDbLitePkg(rr)
Creating package in ./RepDbLite.Mmusculus.RepBase
[1] "RepDbLite.Mmusculus.RepBase"
library(RepDbLite.Mmusculus.RepBase)
RepDbLite.Mmusculus.RepBase
RepDbLite :
|package_name: RepDbLite.Mmusculus.RepBase
|db_type: RepDbLite
|type_of_gene_id: RepBase identifiers
|created_by: TxDbLite 1.9.113
|creation_time: Sun Aug 21 09:34:28 2016
|organism: Mus musculus
|genome_build: RepBasemousub_merged_rodrep
|source_file: Mus_musculus.RepBase.mousub_merged_rodrep.fa
| 1563 repeat exemplars from 72 repeat families (no known genes).

arcolombo commented 8 years ago

Closing. I'm okay with my master branch. I am going to merge my master (PR ) into head. The mouse / human ENSEMBL and RepBase stuff works.
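A quick way to re-run the "does it load" check for all four packages at once is sketched below in base R. The package names are the ones printed in the outputs in this thread. The helper name `check_loadable` is hypothetical, and the sketch only reports whether each package's namespace can be loaded, not whether its contents are correct.

```r
# Package names as reported in the build outputs in this thread.
pkgs <- c("EnsDbLite.Hsapiens.81",
          "RepDbLite.Hsapiens.2005",
          "EnsDbLite.Mmusculus.cdna",
          "RepDbLite.Mmusculus.RepBase")

# Hypothetical helper: TRUE for each package whose namespace loads,
# FALSE for packages that are not installed or fail to load.
check_loadable <- function(pkgs) {
  vapply(pkgs, function(p) requireNamespace(p, quietly = TRUE), logical(1))
}

status <- check_loadable(pkgs)
print(status)
```

The result is a named logical vector, so any FALSE entry points directly at the package that still needs to be installed from its "Creating package in ./..." directory.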

arcolombo commented 8 years ago

I could test Drosophila melanogaster, but the Bioc package is only going to support Mus and Homo initially.