Open dnil opened 9 months ago
Seems no longer blocked by gene names / gene id for the clinical track at least! Ignore research for now, or run it like fusion with unlinked gene symbols (they are presumably from the same annotation source). One of the files does not have hgnc_id, but does have symbols that appear to match the ENSG ids that are also available. The format is still tsv, but it is very close to a BED. Possibly they can convert, but let's prep for tsv parsing.
Hi, what's the plan for this? Could we have the tsv tables from fraser and outrider in scout?
Yes, with hgnc_id or ok gene symbols this should be possible, albeit a little reluctantly - a BED or other standard format would have been preferred.
If it's still open when I'm done with Poorpipe I will grab it!
that's great! There are hgnc_ids. Thanks
Examples or outfiles in this folder: /home/proj/stage/analysis/cases/legalpeacock/analyse_transcripts/drop
I mean on this @dnil, I can start if you are not working on it already
I do have a partial PR. So far it's been testing different ideas for frameworking this, but I think it starts to converge. If you want part of it maybe I can try and break out sub tasks.
What I have outlined is the addition of a new collection, with `omics_variants`, which can pertain to `wts` analysis, be of category `outlier` and sub_category `fraser` or `outrider`. I think this is an ok structure so far, but would be happy for opinions on the naming/structure here. We could skip the omicsvariant level and name the new variant kind `outliers` instead, if we feel outliers is what we will get for the foreseeable future with transcriptomics and proteomics. I still don't have a good feel for where we will end up with methylation: it could be on the aggregate outlier level, like these, with like "hypermethylation" for a gene or promoter region. This would work well with "outlier". But we could also end short of that and have more like base modification found at chr11, pos 1111111, for 33% of bases on the position. In which case the omics variants is better. I also don't know exactly where we will end up with the difficult gene regions, as called with e.g. paraphase. Clearly a lot more regions than the current SMN and FSHD only, so a general mechanism would be good. Maybe we can fit some of them as regular SNVs or SVs but given the experience from SMN, probably not. In which case `omics_variants` could serve.
The `omics_variants` can be `clinical` or `research`. Files can be passed in the case config, parsed from tsv, variants loaded and connected to genes. I'm working on the display page.
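A minimal sketch of the tsv load step described above, assuming a tab-separated file with a header row. The function name `parse_omics_tsv`, the key names added to each row, and the example columns and values are placeholders for illustration, not the actual Scout parser or the real DROP output columns.

```python
import csv
import io


def parse_omics_tsv(handle, sub_category, variant_type="clinical"):
    """Read tsv rows and tag each one along the lines of the structure outlined above."""
    if sub_category not in ("fraser", "outrider"):
        raise ValueError(f"unknown sub_category: {sub_category}")
    if variant_type not in ("clinical", "research"):
        raise ValueError(f"unknown variant_type: {variant_type}")

    reader = csv.DictReader(handle, delimiter="\t")
    variants = []
    for row in reader:
        variant = dict(row)  # keep every original column, values stay strings
        variant.update(
            analysis_type="wts",  # assumed key name
            category="outlier",
            sub_category=sub_category,
            variant_type=variant_type,
        )
        variants.append(variant)
    return variants


# Made-up two-column file, only to show the call.
example = io.StringIO("hgnc_id\tgene_symbol\n1\tGENE_A\n")
for variant in parse_omics_tsv(example, "outrider"):
    print(variant)
```

Keeping all columns untouched and only adding the tagging keys means the choice of which columns to show in the table view can be made later.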
It still lacks quite a few things:
A couple of notes about the latest input (tsv) files:
`hgnc_id` (and much like RNAfusion keep some gene symbols not easily found on other dbs). But all seem to have ENSG geneID. Remains to be seen if we have enough of those ids populated: we are good on them for genes and transcripts that we parse for other reasons, but not necessarily for everything in ENSEMBL.
@Lucpen @jemten can you have a look at the comment above? I think it might be good to have a meeting with all of us to make sure we're on the same page and we know which changes have to be done both at the pipeline and scout level
Thanks for condensing the remaining issues and questions @dnil!
I agree with @fevac that it would be good to have an alignment meeting so that we know we're solving the right issues.
I agree, it would be easier to have a meeting and discuss all of these.
Hi folks! Feedback on if this is good enough for a minimum viable product, especially with ideas about what else we should really show from the start would be very welcome. See https://scout-stage.scilifelab.se/cust000/17161-RNA-5M_downsampled/omics_variants/outliers?variant_type=clinical for a demo!
Looks very nice Daniel! I think that it is good enough to start with and then we can see what kind of requests we get. Small comment would be to more clearly convey the type of expression outlier. Instead of having to read the snake case outlier name the expression direction could perhaps be added to the type?
Good point! Added a direction icon, or a delta-psi symbol for splicers, and threw in links to the case SNV and SV views as well.
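One way such a direction label could be derived, as a rough sketch only. It assumes an expression outlier carries some signed numeric score (passed as `score` here, a hypothetical parameter) and that splicing outliers simply get a delta-psi label; the function name and the label strings are not taken from the actual change.

```python
def outlier_direction(sub_category, score=None):
    """Return a short display label for an outlier row."""
    if sub_category == "fraser":
        # Splicing outliers: show the delta-psi symbol instead of a direction.
        return "Δψ"
    if sub_category == "outrider":
        if score is None:
            return "unknown"
        if score > 0:
            return "up"
        if score < 0:
            return "down"
        return "none"
    raise ValueError(f"unknown sub_category: {sub_category}")


print(outlier_direction("outrider", -3.2))
print(outlier_direction("fraser"))
```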
tomte and other pipelines including DROP components will produce shortlists of aberrant expression (`OUTRIDER`) and splicing (`FRASER`). These are currently given in tsv files with a bunch of columns, with a somewhat shaky `hgnc_symbol` as the common descriptor. `FRASER` files additionally include genomic coordinates, whereas `OUTRIDER` have an ENSEMBL `geneID`.
We would like to display these results to the user, allow navigation to the region of interest in an RNA cram / splice junction track IGV.js view, and connect gene knowledge, gene panels, phenotype and other scout features (case, comments, events etc). If a wgs case exists for the same individual, linking them is perfect, and filtering the DNA changes for the RNA events would be nice.
It is not completely obvious if it would be best to require VCF input instead, or have an intermediary process for good HGNC ID and/or known transcript identifiers as anchor points to feed to Scout, but one could tentatively start sketching views. Only the most important of the info from the rather extensive tsv tables will likely fit in a quickly browsable "variantS" table; the rest will likely have to go to a separate "variant" view.
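The anchoring question could be sketched as a fallback lookup: prefer an HGNC id when the row has one, then an ENSEMBL gene id, then the gene symbol. Everything named below (`resolve_hgnc_id`, the record keys, the two lookup dicts) is a hypothetical stand-in for whatever gene data Scout would actually query, and the ids in the example are made up.

```python
def resolve_hgnc_id(record, ensembl_to_hgnc, symbol_to_hgnc):
    """Return an integer HGNC id for a parsed row, or None if nothing matches."""
    hgnc_id = record.get("hgnc_id")
    if hgnc_id:
        # Accept both "123" and "HGNC:123" style values.
        return int(str(hgnc_id).split(":")[-1])

    ensembl_id = record.get("ensembl_gene_id")
    if ensembl_id:
        ensembl_id = ensembl_id.split(".")[0]  # drop a version suffix if present
        if ensembl_id in ensembl_to_hgnc:
            return ensembl_to_hgnc[ensembl_id]

    symbol = record.get("hgnc_symbol")
    if symbol and symbol in symbol_to_hgnc:
        return symbol_to_hgnc[symbol]

    return None  # unlinked: keep the row, but without a gene connection


# Made-up lookup tables, only to show the call.
ensembl_to_hgnc = {"ENSG00000000001": 1}
symbol_to_hgnc = {"GENE_B": 2}
print(resolve_hgnc_id({"ensembl_gene_id": "ENSG00000000001.5"}, ensembl_to_hgnc, symbol_to_hgnc))
```

Returning `None` rather than raising lets rows with unlinked symbols still be loaded, similar to how the thread describes fusion being handled.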