--transcriptdb ensembl produces no output

TheJacksonLaboratory / LIRICAL

LIkelihood Ratio Interpretation of Clinical AbnormaLities

https://thejacksonlaboratory.github.io/LIRICAL/stable

--transcriptdb ensembl produces no output #462

Closed justaddcoffee closed 4 years ago

justaddcoffee commented 4 years ago

Per discussion with Peter, this is likely an issue with gene2genotypeMap not being populated correctly, possibly because the NCBI Gene prefix is hardcoded.

Example command:

java -jar target/LIRICAL.jar phenopacket -p simple.txt --transcriptdb ensembl -e data/1909_hg19

With the attached simple.txt phenopacket, this produces mostly blank HTML output (screenshot: Screen Shot 2019-11-07 at 4 48 51 PM).

--transcriptdb refseq and --transcriptdb ucsc seem to work correctly.

pnrobinson commented 4 years ago

I can confirm this bug. Everything works for --transcriptdb ucsc and --transcriptdb refseq but not for --transcriptdb ensembl. I suspect it is because we are not correctly getting the Entrez Gene ids from Jannovar with Ensembl.
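To illustrate the hypothesis above, here is a minimal sketch of how a hardcoded gene-id prefix could leave a gene-to-genotype style map empty when the transcript database reports ids in a different namespace. This is not LIRICAL's actual code: the class name, the method name, the "NCBIGene:" prefix string, and the gene ids are all placeholders assumed for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GenePrefixSketch {
    // Assumption: a hardcoded prefix of this form; the real constant may differ.
    static final String NCBI_PREFIX = "NCBIGene:";

    /**
     * Counts annotated variants per gene id, keeping only ids that carry
     * the hardcoded prefix. Ids in any other namespace are silently dropped.
     */
    static Map<String, Integer> countVariantsPerGene(List<String> annotatedGeneIds) {
        Map<String, Integer> gene2count = new HashMap<>();
        for (String geneId : annotatedGeneIds) {
            if (!geneId.startsWith(NCBI_PREFIX)) {
                continue; // an Ensembl-style id never reaches the map
            }
            gene2count.merge(geneId, 1, Integer::sum);
        }
        return gene2count;
    }

    public static void main(String[] args) {
        // Placeholder ids, not real gene identifiers.
        List<String> refseqStyle = List.of("NCBIGene:1234", "NCBIGene:1234");
        List<String> ensemblStyle = List.of("ENSG00000000001", "ENSG00000000001");

        System.out.println(countVariantsPerGene(refseqStyle));  // {NCBIGene:1234=2}
        System.out.println(countVariantsPerGene(ensemblStyle)); // {}
    }
}
```

If something like this is what happens, the Ensembl run would fail without any error: the map is simply empty, which would be consistent with the mostly blank HTML output.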

julesjacobsen commented 4 years ago

// Load the Ensembl transcript data and annotate a test variant.
JannovarData jannovarData = JannovarDataSourceLoader.loadJannovarData(Paths.get("/users/jules/exomiser-data/1902_hg19/1902_hg19_transcripts_ensembl.ser"));
VariantAnnotator variantAnnotator = new JannovarVariantAnnotator(GenomeAssembly.HG19, jannovarData, ChromosomalRegionIndex.empty());
System.out.println(variantAnnotator.annotate("1", 43296195, "C", "T"));
// Print the gene identifiers found for the gene symbol ERMAP.
GeneFactory geneFactory = new GeneFactory(jannovarData);
Set<GeneIdentifier> geneIdentifiers = geneFactory.getGeneIdentifiers().stream()
        .filter(geneIdentifier -> geneIdentifier.getGeneSymbol().equals("ERMAP"))
        .collect(Collectors.toSet());
System.out.println(geneIdentifiers);

pnrobinson commented 4 years ago

I think we should probably just disallow Ensembl for now, since supporting it properly would require a good amount of refactoring, and if LIRICAL is later integrated into Exomiser we would need to do it again. I will remove the Ensembl option; if users turn out to want it, we can reopen.