imran-aifi closed this 1 year ago
Extracting the genome from the MOCHA object is non-trivial. Could we set it up so that, if the genome is NULL, it checks the directory for a saved genome file and returns a useful error if none is found?
Changes:
$Genome
$TxDb & $Org
All:
We now save the package names of the genome, TxDb, and OrgDb. We use MOCHA:::getAnnotationDbFromInstalledPkgname and BSgenome::getBSgenome to check that these packages are installed and to load them by name. Saved metadata now looks like this:
> metadata(SampleTileMatrix)
$CellCounts
           cellTypeLabelList
             C2  C5
  PBMCSmall 152 201
$FragmentCounts
              C2     C5
PBMCSmall 117146 171033
$Genome
[1] "hg19"
$TxDb
$TxDb$pkgname
[1] "TxDb.Hsapiens.UCSC.hg38.refGene"
$TxDb$metadata
    name                                      value
 1  Db type                                   TxDb
 2  Supporting package                        GenomicFeatures
 3  Data source                               UCSC
 4  Genome                                    hg38
 5  Organism                                  Homo sapiens
 6  Taxonomy ID                               9606
 7  UCSC Table                                refGene
 8  UCSC Track                                NCBI RefSeq
 9  Resource URL                              http://genome.ucsc.edu/
10  Type of Gene ID                           Entrez Gene ID
11  Full dataset                              yes
12  miRBase build ID                          <NA>
13  Nb of transcripts                         88816
14  Db created by                             GenomicFeatures package from Bioconductor
15  Creation time                             2021-04-28 16:30:46 +0000 (Wed, 28 Apr 2021)
16  GenomicFeatures version at creation time  1.41.3
17  RSQLite version at creation time          2.2.6
18  DBSCHEMAVERSION                           1.2
$OrgDb
$OrgDb$pkgname
[1] "org.Hs.eg.db"
$OrgDb$metadata
    name                value
 1  DBSCHEMAVERSION     2.1
 2  Db type             OrgDb
 3  Supporting package  AnnotationDbi
 4  DBSCHEMA            HUMAN_DB
 5  ORGANISM            Homo sapiens
 6  SPECIES             Human
 7  EGSOURCEDATE        2021-Sep13
 8  EGSOURCENAME        Entrez Gene
 9  EGSOURCEURL         ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
10  CENTRALID           EG
11  TAXID               9606
12  GOSOURCENAME        Gene Ontology
13  GOSOURCEURL         http://current.geneontology.org/ontology/go-basic.obo
14  GOSOURCEDATE        2021-09-01
15  GOEGSOURCEDATE      2021-Sep13
16  GOEGSOURCENAME      Entrez Gene
17  GOEGSOURCEURL       ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
18  KEGGSOURCENAME      KEGG GENOME
19  KEGGSOURCEURL       ftp://ftp.genome.jp/pub/kegg/genomes
20  KEGGSOURCEDATE      2011-Mar15
21  GPSOURCENAME        UCSC Genome Bioinformatics (Homo sapiens)
22  GPSOURCEURL
23  GPSOURCEDATE        2021-Jul20
24  ENSOURCEDATE        2021-Apr13
25  ENSOURCENAME        Ensembl
26  ENSOURCEURL         ftp://ftp.ensembl.org/pub/current_fasta
27  UPSOURCENAME        Uniprot
28  UPSOURCEURL         http://www.UniProt.org/
29  UPSOURCEDATE        Wed Sep 15 18:21:59 2021
$Directory
[1] "/Users/imran.mcgrath/Documents/projects/PBMCSmall/MOCHA"
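For readers unfamiliar with the load-by-name approach used for the TxDb and OrgDb entries above, here is a minimal sketch in base R. It is not MOCHA's internal getAnnotationDbFromInstalledPkgname, whose signature is not shown in this thread; `loadDbFromPkgname` and its `object` argument are hypothetical names introduced only for illustration. The sketch relies on the convention that TxDb and OrgDb packages export an object named after the package itself.

```r
# Hypothetical helper (not part of MOCHA): load an annotation object from an
# installed package, given only the package name stored in the metadata.
loadDbFromPkgname <- function(pkgname, object = pkgname) {
  # Fail early with an actionable message if the package is missing.
  if (!requireNamespace(pkgname, quietly = TRUE)) {
    stop(
      "Package '", pkgname, "' is not installed. ",
      "Install it (e.g. with BiocManager::install) and try again.",
      call. = FALSE
    )
  }
  # By convention, TxDb and OrgDb packages export an object that shares the
  # package's name, so `object` defaults to `pkgname`.
  getExportedValue(pkgname, object)
}

# Usage, assuming the metadata layout shown above:
# txdb  <- loadDbFromPkgname(metadata(SampleTileMatrix)$TxDb$pkgname)
# orgdb <- loadDbFromPkgname(metadata(SampleTileMatrix)$OrgDb$pkgname)
```

Storing only the package name keeps the saved object portable: nothing in the metadata depends on where the annotation database lives on disk.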
Merging this since the changes are tested and do not affect parallelization strategies
A MOCHA object created on one filesystem contains a genome in its metadata. This is not the full BSgenome database, but an in-memory object that points to it. A problem arises when the MOCHA object is moved to another computer or filesystem where the BSgenome database that the genome object points to does not exist, or does not exist at the location it points to.
We can work around this by catching the error and exposing genome as a parameter, so that the user can supply the correct genome, which they are required to have installed.
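A rough sketch of that workaround follows. It is an illustration under assumptions, not the code merged here: `resolveGenome`, `storedName`, and the wording of the error are hypothetical, and the only real library call is BSgenome::getBSgenome, which looks up an installed genome by name.

```r
# Hypothetical sketch (not MOCHA's actual code): resolve a genome, preferring
# a user-supplied value over the name stored in the object's metadata.
resolveGenome <- function(storedName, genome = NULL) {
  # A caller-supplied genome (object or name) takes precedence.
  target <- if (is.null(genome)) storedName else genome
  if (!is.character(target)) {
    # Assumed to be an already-loaded BSgenome object; return it unchanged.
    return(target)
  }
  tryCatch(
    BSgenome::getBSgenome(target),
    error = function(e) {
      # Re-raise with guidance instead of the raw lookup failure.
      stop(
        "Could not load genome '", target, "': ", conditionMessage(e),
        "\nInstall the matching BSgenome package or pass `genome` explicitly.",
        call. = FALSE
      )
    }
  )
}

# Usage, assuming the genome name is stored under $Genome as shown earlier:
# genome <- resolveGenome(metadata(SampleTileMatrix)$Genome)
```

Catching the error at this one point means every downstream function can report the same actionable message rather than failing deep inside a BSgenome call.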