broadinstitute / lrma-aou1-panel-creation

Pipelines and evaluations covering integration, phasing, and imputation of short and structural variants for the AoU Phase 1 long-reads callset.

Add code for bipartite-graph checking. #8

Open samuelklee opened 1 month ago

samuelklee commented 1 month ago

To support #1. Not sure if this is a notebook/script or a WDL, but happy to take a look once it's merged in any form and WDLize it, if needed. I will also take responsibility for integrating it into the megaWDL once it is added. Might be nice to see an example run and an explanation of the inputs/outputs; please feel free to start a conversation in the Discussions section of the repo.

samuelklee commented 1 month ago

This can just be a script that operates on a single sample and outputs machine-readable metrics, say in a tabular or JSON file, perhaps along with any per-sample plots that might be useful.

Please use a self-documenting schema that allows a downstream script to easily gather files and summarize metrics across samples. E.g., per-sample tables should include the sample name in one of the columns.

I will take this single-sample script, WDLize it, and add code/task for the downstream multisample summarizing.
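For illustration only, the per-sample output described above could look like the following minimal Python sketch. The long-format columns (`sample`, `metric`, `value`), the function names, and the example metric name are assumptions made for this example, not an agreed schema from this repo.

```python
import csv
import io


def write_sample_metrics(sample, metrics, handle):
    """Write one row per metric, repeating the sample name in every row.

    The column names are hypothetical; any self-documenting header works.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample", "metric", "value"])
    for name, value in metrics.items():
        writer.writerow([sample, name, value])


def gather(handles):
    """Concatenate per-sample tables into a single list of row dicts."""
    rows = []
    for handle in handles:
        # DictReader uses the header line, so no per-file bookkeeping is needed.
        rows.extend(csv.DictReader(handle, delimiter="\t"))
    return rows
```

Because every row carries the sample name, a downstream summarizing step can concatenate the files without parsing file names. Values come back as strings and would need to be cast by the consumer.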

samuelklee commented 1 month ago

@fabio-cunial any updates or plans you'd like to record here?

fabio-cunial commented 1 month ago

Bipartite graph checking is useful only for unphased genotypes (e.g. it's a way to compare SV genotypers like snifflesGT, cutesvGT, and svjedi, which output only unphased genotypes: one would like to know how far from phased those assignments are).
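The thread does not spell out how the graph is built, so the following is only a sketch of the generic bipartiteness test. It assumes (hypothetically) that nodes are calls from one sample and that an edge joins two calls that could not sit on the same haplotype; under that assumption, a two-coloring corresponds to a consistent assignment of calls to two haplotypes. The function name and the integer node encoding are illustrative.

```python
from collections import deque


def two_color(n_nodes, edges):
    """Try to 2-color an undirected graph with BFS.

    Returns (is_bipartite, colors), where colors[i] is 0 or 1.
    When the graph is not bipartite, colors is the partial or
    inconsistent assignment reached when the conflict was found.
    """
    adjacency = [[] for _ in range(n_nodes)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    colors = [None] * n_nodes
    for start in range(n_nodes):
        if colors[start] is not None:
            continue  # already reached from an earlier component
        colors[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if colors[v] is None:
                    colors[v] = 1 - colors[u]
                    queue.append(v)
                elif colors[v] == colors[u]:
                    # Two adjacent nodes share a color: an odd cycle exists.
                    return False, colors
    return True, colors
```

A path or an even cycle passes the check, while a triangle fails it. Measuring how far a non-bipartite graph is from bipartite is a harder problem and is not attempted here.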

Is this still something useful to add, or has it been superseded by the collision removal tool?

samuelklee commented 3 weeks ago

Sorry for the delayed response. Sure, we can deprioritize this for now. Let's leave the issue open in case we expand upstream.