bcgsc / mavis

Merging, Annotation, Validation, and Illustration of Structural variants
http://mavis.bcgsc.ca
GNU General Public License v3.0

Updating Gene Annotations #188

Closed alexjfortuna closed 5 years ago

alexjfortuna commented 5 years ago

MAVIS version: <2.2.1>

Python version: <3.6.4>

OS: <ubuntu 18.04>

Expected Behaviour

I would like to receive annotated fusions. I can see the fusion I am expecting; however, instead of gene1 = C11orf95, gene1 is reported as None. Can I update the gene annotations?

Actual Behaviour

Genes annotated in a similar manner to star-fusion, which is the input file in this case.

Steps to Reproduce the Behaviour

creisle commented 5 years ago

Hi @alexjfortuna ,

Which annotation set are you using for star-fusion? Can you give me more details on the fusion? What are the breakpoints and strands for it?

Also which annotation files did you run mavis with?

alexjfortuna commented 5 years ago

Hello @creisle I am using the annotations for hg19 which came from MAVIS. These are ensembl69 IDs. When I click on the file I do not see C11orf95 as an annotation.
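Rather than scanning the file by eye, the lookup can be scripted. The following is a minimal sketch, assuming the annotations JSON has a top-level `genes` list whose entries carry a `name` (the Ensembl ID) and an `aliases` list (Hugo symbols); those key names, the sample record, and the `find_genes` helper are assumptions for illustration, not taken from this thread.

```python
import json

# Hypothetical, trimmed-down annotations content. The "genes", "name" and
# "aliases" keys are assumptions about the JSON layout; the record itself
# is a made-up placeholder, not a real gene.
SAMPLE = json.loads("""
{"genes": [
  {"chr": "1", "start": 100, "end": 200, "name": "ENSG_EXAMPLE",
   "strand": "+", "aliases": ["GENE_A"]}
]}
""")


def find_genes(annotations, symbol):
    """Return every gene record whose name or aliases contain `symbol`."""
    hits = []
    for gene in annotations.get("genes", []):
        names = {gene.get("name")} | set(gene.get("aliases", []))
        if symbol in names:
            hits.append(gene)
    return hits


print(find_genes(SAMPLE, "GENE_A"))    # the placeholder record is found
print(find_genes(SAMPLE, "C11orf95"))  # empty list: symbol is not in the file
```

To run this against a real file, replace `SAMPLE` with the result of `json.load` on the annotations file. An empty result means the symbol is absent from that reference set.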

For Star-Fusion I am using the annotations that come with the package. They are Hugo IDs.

I have attached the MAVIS summary file here which includes the breakpoints for the RELA Fusion, which is the one I am concerned about. mavis_summary_XXX.txt

creisle commented 5 years ago

@alexjfortuna the default annotations file includes only protein-coding genes/transcripts. Ensembl doesn't host the archive for 69 anymore, but the closest related archive, 67, lists this gene with no protein-coding transcripts: http://may2012.archive.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000188070;r=11:63527364-63536113
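To picture why a gene field can come back as None, here is a rough sketch of a breakpoint-to-gene overlap check. It assumes a gene is reported only when the breakpoint falls inside a gene present in the reference set; this illustrates the idea and is not MAVIS's actual annotation code. The gene coordinates are the region from the archive URL above; the breakpoint position and the `overlapping_genes` helper are hypothetical.

```python
def overlapping_genes(genes, chrom, pos):
    """Return names of genes on `chrom` whose [start, end] span contains `pos`."""
    return [
        g["name"]
        for g in genes
        if g["chr"] == chrom and g["start"] <= pos <= g["end"]
    ]


# Reference set restricted to protein-coding genes: the gene is filtered out.
coding_only = []

# Reference set that also keeps non-coding genes (region from the URL above).
with_noncoding = [
    {"chr": "11", "start": 63527364, "end": 63536113, "name": "ENSG00000188070"},
]

breakpoint_pos = 63530000  # hypothetical breakpoint inside that region

print(overlapping_genes(coding_only, "11", breakpoint_pos) or None)
print(overlapping_genes(with_noncoding, "11", breakpoint_pos) or None)
```

With the restricted set the lookup finds nothing, so the result is None. With the non-coding gene included, the same breakpoint resolves to ENSG00000188070.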

It should be pretty straightforward to build a reference file including non-coding genes. We still have the copy of the 69 db locally. Would that be helpful?

alexjfortuna commented 5 years ago

Hi @creisle if you could build a reference for us which includes non-coding genes that would be most helpful!

alexjfortuna commented 5 years ago

Hi @creisle is this ready to test?

creisle commented 5 years ago

@alexjfortuna we have generated the new annotations file. We are running it with some test libraries on our end to profile whether it changes the memory/time requirements and, if so, by how much. We should have something ready for you to test in the next couple of days. Thanks for being so patient.

creisle commented 5 years ago

Hi @alexjfortuna we have generated the annotations file; see http://mavis.bcgsc.ca/docs/latest/mavis_input.html#annotations for the link to the new file.

This file is quite a bit larger, so the annotation step will require more memory by default (see the warning next to the link).

We also added a fix so that loading no longer breaks when the JSON gives the strand as a number rather than a string.
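As an illustration of the number-versus-string issue, a loader could coerce both encodings to one form before use. This is a minimal sketch under assumed conventions (`1`/`-1` for numeric strands, `"+"`/`"-"` for string strands); `normalize_strand` is a hypothetical helper, not the fix shipped in the release.

```python
def normalize_strand(value):
    """Map numeric or string strand encodings to "+" or "-".

    Accepts 1, -1, "1", "+1", "-1", "+", "-"; anything else raises ValueError.
    """
    mapping = {"+": "+", "-": "-", "1": "+", "+1": "+", "-1": "-"}
    key = str(value).strip()
    if key not in mapping:
        raise ValueError(f"unrecognised strand value: {value!r}")
    return mapping[key]


# The same gene record could arrive with either encoding.
for raw in (1, -1, "+", "-", "-1"):
    print(repr(raw), "->", normalize_strand(raw))
```

Raising on unknown values, instead of silently defaulting, keeps a malformed annotations file from producing results on the wrong strand.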

Please test with the latest release 2.2.5

alexjfortuna commented 5 years ago

I have installed the latest release 2.2.5 and am running with the larger annotation file. Will report back. Thank you so much!

alexjfortuna commented 5 years ago

Working well! Thank you for all the help