Clinical-Genomics / scout

VCF visualization interface
https://clinical-genomics.github.io/scout
BSD 3-Clause "New" or "Revised" License

Filtering and sorting in WTS module #4989

Open Jakob37 opened 4 hours ago

Jakob37 commented 4 hours ago

Is your feature request related to a problem in the current program to new available technology or software? Please describe and add links/citations if appropriate.

Our CLGs have started using Tomte and the WTS module in Scout to interpret RNA-seq samples. It seems to work well for them, but some additions could make it even more useful!

Describe the solution you'd like

In particular, when a list contains more than a handful of outliers, it would help to be able to:

This is particularly useful if you load data with a more liberal cutoff, but still want to be able to focus on what is most relevant.

A related question: would it make sense to optionally also show the FDR? In omics datasets I usually find FDR more informative than the p-value. (This is not something requested by the CLGs here, just an open question from me.)

Additional context

A screenshot from the demo to have something to look at:

wts_screenshot

dnil commented 3 hours ago

I think we could manage that. Especially filter on type 😉 Screenshot 2024-11-01 at 11 43 57.

We have not seen a need for these, as the variants within panels have so far been very few. But I suppose that is decided with cutoffs on the pipeline side. Sorting is always a bit contentious, but we have no rank here so why not! 🎉

As for FDR adjusted, I am in principle all for any kind of multiple testing correction, but not quite sure how to apply it here. 😊 I was impressed and slightly confused by Benjamini & Hochberg and have not really tried to read up on Benjamini & Yekutieli.
How do you think we should apply it? For all genes, for the default panel, or for all clinical panels? I suppose we will not be able to feed any hypothesis about candidate genes to the pipeline, which would otherwise have been cool. We do parse both padjust and p_adjust_gene, so it should be simple enough to show either, perhaps just instead of the unadjusted P-value. 🤷

Jakob37 commented 3 hours ago

> I think we could manage that. Especially filter on type 😉

Aha, nice, didn't spot that 😅

> We have not seen a need for these, as the variants within panels have so far been very few. But I suppose that is decided with cutoffs on the pipeline side. Sorting is always a bit contentious, but we have no rank here so why not! 🎉

Sounds good, thanks! Yes, when running with a looser, more exploratory cutoff we sometimes ended up with a larger bunch of hits, also within a panel. Among those, hits with lower p-values tended to seem more "real" (as one would hope). Having some false positives in the list was deemed OK here by the CLG, as she would verify things with her own eyes anyway.
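
To make the filter-and-sort idea concrete, here is a minimal Python sketch. The `Outlier` record, its field names, and the `filter_and_sort` helper are hypothetical and only for illustration; they are not Scout's actual data model or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Outlier:
    """Hypothetical outlier record; field names are assumptions, not Scout's schema."""
    gene: str
    kind: str        # e.g. "expression" or "splicing"
    p_value: float


def filter_and_sort(
    outliers: List[Outlier],
    kind: Optional[str] = None,
    max_p: Optional[float] = None,
) -> List[Outlier]:
    """Keep outliers matching the optional type and p-value cutoff, lowest p-value first."""
    kept = [
        o for o in outliers
        if (kind is None or o.kind == kind)
        and (max_p is None or o.p_value <= max_p)
    ]
    # Sorting ascending puts the hits that looked most "real" at the top.
    return sorted(kept, key=lambda o: o.p_value)


# Made-up example data, purely illustrative.
hits = [
    Outlier("GENE_A", "expression", 0.02),
    Outlier("GENE_B", "splicing", 0.001),
    Outlier("GENE_C", "expression", 0.0005),
]
top_expression = filter_and_sort(hits, kind="expression")
```

With both arguments left as `None` the list is only sorted, so data loaded with a liberal cutoff stays fully visible while the strongest hits come first.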

> As for FDR adjusted, I am in principle all for any kind of multiple testing correction, but not quite sure how to apply it here. 😊 I was impressed and slightly confused by Benjamini & Hochberg and have not really tried to read up on Benjamini & Yekutieli. How do you think we should apply it? For all genes, for the default panel, or for all clinical panels? I suppose we will not be able to feed any hypothesis about candidate genes to the pipeline, which would otherwise have been cool. We do parse both padjust and p_adjust_gene, so it should be simple enough to show either, perhaps just instead of the unadjusted P-value. 🤷

Hmm. My understanding is that the point of the BH correction is to go from p-values ("how unlikely is it that this is a fluke, if only looking at this transcript") to FDR-adjusted values ("if looking across this dataset at FDR < 0.1, at most 10% of what I am looking at should be noise"). That holds if the p-values are nicely distributed, which they seemed to be for OUTRIDER but not FRASER when I looked ...
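
For reference, a minimal sketch of the textbook Benjamini-Hochberg adjustment in plain Python. This is not taken from OUTRIDER, FRASER, or Scout; the function name `bh_adjust` is made up here, and the set of p-values passed in (all genes, one panel, ...) is exactly the open question above.

```python
from typing import List


def bh_adjust(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to the smallest (rank 1),
    # scaling each by m / rank and enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values, purely illustrative.
adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

For these four made-up values the adjusted values come out at roughly 0.02, 0.04, 0.04 and 0.02, so keeping everything with an adjusted value below 0.1 would keep all four. Note that the result depends on how many tests `m` go into the correction, which is why the choice of gene set matters.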

Anyway, I was just thinking to allow the user to check the padjust value sent through from FRASER and OUTRIDER, which seems to be the FDR-corrected p-value. Not sure if it is preferable to show it instead of the regular p-value though; maybe Scout users are more used to the meaning of regular p-values 🤔