broadinstitute / seqr-loading-pipelines

hail-based pipelines for annotating variant callsets and exporting them to elasticsearch
MIT License

Load GREGoR WES gCNV callset - Round 4 #505

Closed: lynnpais closed this issue 1 year ago

lynnpais commented 1 year ago

Alba shared the updated CMG gCNV callset. It is deposited in the CMG GREGoR Structural Variation Google Drive folder under the name gCNV_callset_20230610. I've shared Alba's email with you for additional context, including some changes she made to the most recent file.

bpblanken commented 1 year ago

@hanars The r000_test_case__wes__grch38__gcnv__v02__vcfv01__20230626 index seems sane after another glance, if you have a chance to take a look.

A few things I noticed:

cat CMG_20230610_annotated.seqr.bed | grep FLE_FAM176_GBMID00994_2_D1 | awk 'BEGIN {FS="\t"} {print $44}' | sort | uniq -c
 128 FALSE

hanars commented 1 year ago

Did we properly port over the logic for is_new_joint_call? This file is not a new joint call, so it should be run without that argument. That means prev_overlap and new_call should both be False, and prev_call should be pulled from the is_latest column.
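
To make the expectation concrete, here is a minimal sketch of the rule as described in this thread, not the pipeline's actual code. The names `ConcordanceFlags` and `concordance_flags` are hypothetical. Treating prev_call as the negation of is_latest is an inference from the observations here (the one previously loaded sample has is_latest FALSE on every row and is expected to have prev_call true), so it should be checked against the ported logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConcordanceFlags:
    """Hypothetical container for the three per-variant flags discussed here."""
    new_call: bool
    prev_call: bool
    prev_overlap: bool


def concordance_flags(is_latest: bool, is_new_joint_call: bool = False) -> ConcordanceFlags:
    """Sketch of the flags for a callset that is NOT a new joint call.

    Assumption: prev_call is the negation of is_latest. The new-joint-call
    branch is not described in this thread, so it is deliberately left out.
    """
    if is_new_joint_call:
        raise NotImplementedError("new-joint-call logic is not covered by this sketch")
    return ConcordanceFlags(
        new_call=False,           # always False without is_new_joint_call
        prev_call=not is_latest,  # assumed: derived from the is_latest column
        prev_overlap=False,       # always False without is_new_joint_call
    )
```

Under that assumption, a row with is_latest FALSE yields prev_call true, and a row with is_latest TRUE yields prev_call false.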

bpblanken commented 1 year ago

Yep! This is what we ported over. prev_call is expected to be true on every variant for the one sample that's been loaded.

hanars commented 1 year ago

True, but we expect to be loading some samples that have not been previously loaded. I guess this might be an artifact of the test project you chose, but can you confirm that there are some samples where is_latest is True?

bpblanken commented 1 year ago

Yep, there definitely are:

bblanken@wmbbe-67c Desktop % cat CMG_20230610_annotated.seqr.bed | awk 'BEGIN {FS="\t"} {print $44}' | sort | uniq -c
2586522 FALSE
146808 TRUE
   1 is_latest

Here's the list of samples with at least one row where is_latest is TRUE, if we'd like to pick a project to load.
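
For anyone who wants to reproduce that list without hard-coding column 44, a rough Python sketch follows. It looks the columns up by header name instead. The function names are hypothetical, and the name of the sample column is an assumption that has to be passed in, since it is not shown in this thread. Only the is_latest header and the TRUE/FALSE values come from the output above.

```python
import csv
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def is_latest_counts_by_sample(
    lines: Iterable[str],
    sample_column: str,
    flag_column: str = "is_latest",
) -> Dict[str, Counter]:
    """Tally the values of the flag column per sample in a tab-separated file with a header row."""
    reader = csv.DictReader(lines, delimiter="\t")
    counts: Dict[str, Counter] = defaultdict(Counter)
    for row in reader:
        counts[row[sample_column]][row[flag_column]] += 1
    return dict(counts)


def samples_with_latest(counts: Dict[str, Counter]) -> List[str]:
    """Return the sorted samples that have at least one row flagged TRUE."""
    return sorted(sample for sample, tally in counts.items() if tally["TRUE"] > 0)
```

Usage would look roughly like opening CMG_20230610_annotated.seqr.bed with `open(path, newline="")`, passing the file object as `lines`, and supplying the real sample column name.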

hanars commented 1 year ago

Yes, let's test with R0476_cmg_wilkins_haug_exomes

larrybabb commented 1 year ago

@lynnpais heads up.

bpblanken commented 1 year ago

A new index has been created from the updated callset: r0476_cmg_wilkins_haug_exomes__wes__grch38__gcnv__v02__vcfv01__20230816. It has a mix of "prev_call": true and "prev_call": false, as expected.

bpblanken commented 1 year ago

// Loaded with no issues
R000_test_case
R0211_guptill_exomes
R0276_inmr_neuromuscular_disea
R0294_myoseq_v20
R0315_coppens_exomes
R0317_cmg_topf_bonn_wes
R0330_cmg_laing_wes
R0344_cmg_beggs_exomes
R0351_cmg_bonnemann_exomes
R0358_cmg_vcgs_exomes
R0364_cmg_bodamer_manton_wes
R0365_cmg_scott_exomes
R0367_cmg_seidman_wes
R0369_cmg_ware_wes
R0375_cmg_topf_tubitak_wes
R0379_ogrady_wes
R0382_cmg_myoseq_exomes
R0383_gazda_exomes
R0394_cmg_kang_lgmd_wes
R0393_cmg_sankaran_exomes
R0401_cmg_scott_exomes_retrosp
R0404_cmg_estonia_wes
R0406_cmg_topf_panel_wes
R0408_cmg_perth_wes
R0433_cmg_hirschhorn_exomes
R0450_rare_genomes_project_exo
R0455_cmg_walsh_exomes_lispac
R0468_project_l
R0474_cmg_sherr_exomes
R0482_cmg_thaker_wes
R0483_cmg_pollak_wes
R0484_cmg_sinclair_wes
R0489_cmg_sweetser_wes
R0492_cmg_lerner_ellis_wes
R0495_cmg_topf_ea_cms_wes
R0497_cmg_muntoni_exomes
R0504_greka_wes
R0523_cmg_manton_doose_wes
R0524_cmg_wendy_chung_exomes
R0525_thaker_wes
R0528_cmg_engle_wes
R0529_cmg_southampton_wes
R0531_cmg_fajgenbaum
R0532_cmg_walsh_exomes_pmg
R0540_mgh_pathways_probands_on
R0560_mgrc_thaker_wes
R0563_mgrc_sherr_wes
R0566_tgg_ravenscroft_wes
R0576_amel_sims_40s_v2
R0592_tgg_thaker_geco_wes
R0624_neurodev_wave1_kenya
R0640_tgg_shimamura_sankaran_w

// Loaded with a few missing samples.
R0208_kang_v11
R0253_newcastle_v9
R0280_cmg_hildebrandt_exomes
R0284_pierce_retinal_degenerat
R0285_cmg_gleeson_exomes
R0303_cmg_manton_exomes
R0352_inmr_neuromuscular_disea
R0416_cmg_topf_cms_wes
R0449_pathways_mgh
R0451_cmg_walsh_exomes_microce
R0456_cmg_walsh_exomes_ch
R0478_cmg_walsh_exomes_id
R0496_cmg_fleming_wes
R0526_cmg_neurodev_wes
R0542_aicardi_wes
R0583_gregor_manton_wes
R0623_neurodev_wave1_sa
R0626_tgg_minikel_we

// Failed to load
R0234_mody_250s
R0244_dowling_v10
R0261_bonnemann_v9
R0381_manzini_exomes
R0548_bwh_pina_aguilar_dgap
R0639_tgg_bonnemann_turkey_wes
R0644_gregor_beggs_wes
R0692_gregor_muntoni_we