justaddcoffee closed this issue 4 years ago
I can confirm this bug. Everything works for --ucsc and --refseq but not for --ensembl. I suspect it is because we are not correctly getting the EntrezGene ids from Jannovar with Ensembl.
JannovarData jannovarData = JannovarDataSourceLoader.loadJannovarData(Paths.get("/users/jules/exomiser-data/1902_hg19/1902_hg19_transcripts_ensembl.ser"));
VariantAnnotator variantAnnotator = new JannovarVariantAnnotator(GenomeAssembly.HG19, jannovarData, ChromosomalRegionIndex.empty());
System.out.println(variantAnnotator.annotate("1", 43296195, "C", "T"));
GeneFactory geneFactory = new GeneFactory(jannovarData);
Set<GeneIdentifier> geneIdentifiers = geneFactory.getGeneIdentifiers().stream().filter(geneIdentifier -> geneIdentifier.getGeneSymbol().equals("ERMAP")).collect(Collectors.toSet());
System.out.println(geneIdentifiers);
I think we should probably just disallow Ensembl for now, since fixing this properly would require a good amount of refactoring, and if LIRICAL is later integrated into Exomiser we would need to do it again. I will remove the Ensembl option, and if users turn out to want it we can reopen.
Per discussion with Peter, this is likely an issue with `gene2genotypeMap` not being populated correctly, possibly because the NCBI gene prefix is hardcoded.

Example command:
java -jar target/LIRICAL.jar phenopacket -p simple.txt --transcriptdb ensembl -e data/1909_hg19
using the attached simple.txt phenopacket produces mostly blank HTML:

![Screen Shot 2019-11-07 at 4 48 51 PM](https://user-images.githubusercontent.com/150311/68439998-85c4dd80-017e-11ea-8a31-c1dbbd0f8400.png)
`--transcriptdb refseq` and `--transcriptdb ucsc` seem to work correctly.
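
To make the suspected failure mode concrete, here is a minimal, self-contained sketch of how a hardcoded NCBI gene prefix could leave a gene-to-genotype map empty when the transcript database supplies Ensembl-style IDs. This is not LIRICAL's actual code: the class `GenePrefixSketch`, the method `buildWithHardcodedPrefix`, the `NCBIGene:` prefix string, and both gene IDs are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class GenePrefixSketch {
    // Assumption: the map-building code only accepts IDs with this prefix.
    static final String NCBI_PREFIX = "NCBIGene:";

    /**
     * Copies gene ID -> genotype entries, keeping only IDs that start with
     * the hardcoded NCBI prefix. IDs in any other namespace are dropped.
     */
    static Map<String, String> buildWithHardcodedPrefix(Map<String, String> annotated) {
        Map<String, String> gene2genotype = new HashMap<>();
        for (Map.Entry<String, String> entry : annotated.entrySet()) {
            if (entry.getKey().startsWith(NCBI_PREFIX)) {
                gene2genotype.put(entry.getKey(), entry.getValue());
            }
        }
        return gene2genotype;
    }

    public static void main(String[] args) {
        // Placeholder IDs, not real gene identifiers.
        Map<String, String> refseqStyle = Map.of("NCBIGene:1234", "0/1");
        Map<String, String> ensemblStyle = Map.of("ENSG00000000001", "0/1");

        System.out.println(buildWithHardcodedPrefix(refseqStyle).size());  // prints 1
        System.out.println(buildWithHardcodedPrefix(ensemblStyle).size()); // prints 0
    }
}
```

If something like this is happening, every Ensembl-annotated variant would be silently dropped from the map, which would be consistent with the mostly blank HTML output and with refseq and ucsc working.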