Closed by lynnpais 1 year ago
@hanars The r000_test_case__wes__grch38__gcnv__v02__vcfv01__20230626 index seems sane after another glance, if you have a chance to take a look.
A few things I noticed:
"prev_call": true is set on every variant, which is unexpected across the whole callset but does appear to be consistent with the raw bed file:
cat CMG_20230610_annotated.seqr.bed | grep FLE_FAM176_GBMID00994_2_D1 | awk 'BEGIN {FS="\t"} {print $44}' | sort | uniq -c
128 FALSE
The variantId field has the date in it, which means we're going to have different variantIds from previous callsets. Is that correct?
Did we properly port over the logic for is_new_joint_call? This file is not a new joint call, so it should be run without that argument, and so prev_overlap and new_call should both be False and prev_call should be pulled from the is_latest column.
Yep! This is what we ported over. prev_call is expected to be true on every variant for the one sample that's been loaded.
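For reference, the behavior described above for a callset that is not a new joint call could be sketched roughly as follows. This is an illustrative sketch, not the pipeline's actual code: `CallFlags` and `flags_for_existing_callset` are hypothetical names, and the mapping of `prev_call` to the negation of `is_latest` is an assumption inferred from the observation in this thread (`is_latest` FALSE rows showing `"prev_call": true`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallFlags:
    """Hypothetical container for the three flags discussed in this thread."""
    prev_call: bool
    prev_overlap: bool
    new_call: bool


def flags_for_existing_callset(is_latest: str) -> CallFlags:
    """Derive flags for a callset loaded WITHOUT is_new_joint_call.

    Assumption: prev_call is the negation of the is_latest column, which
    matches what was observed here (is_latest FALSE -> prev_call true).
    prev_overlap and new_call are always False in this mode.
    """
    latest = is_latest.strip().upper() == "TRUE"
    return CallFlags(prev_call=not latest, prev_overlap=False, new_call=False)
```

Under these assumptions, a row with `is_latest` equal to `FALSE` yields `prev_call=True`, and a row with `TRUE` yields `prev_call=False`. The new-joint-call branch is deliberately left out, since its logic is not described in this thread.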
True, but we expect to be loading some samples that have not been previously loaded. I guess this might be an artifact of the test project you chose, but can you confirm that there are some samples where is_latest is True?
Yep, there definitely are:
bblanken@wmbbe-67c Desktop % cat CMG_20230610_annotated.seqr.bed | awk 'BEGIN {FS="\t"} {print $44}' | sort | uniq -c
2586522 FALSE
146808 TRUE
1 is_latest
Here's the list of samples with at least one is_latest value of True, if we'd like to pick a project to load.
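The same checks can be reproduced in Python instead of the awk pipeline. The sketch below assumes the bed file is tab-separated with a header row containing an `is_latest` column (the header value appears in the `uniq -c` output above); the sample column name `sample` is a placeholder, since the real column name is not given in this thread, and both function names are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable, List


def count_is_latest(lines: Iterable[str], column: str = "is_latest") -> Counter:
    """Count each distinct value in the given column of a tab-separated file."""
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row[column] for row in reader)


def samples_with_latest(
    lines: Iterable[str],
    sample_column: str = "sample",  # placeholder name, adjust to the real header
    latest_column: str = "is_latest",
) -> List[str]:
    """Return the sorted sample IDs that have at least one is_latest == TRUE row."""
    reader = csv.DictReader(lines, delimiter="\t")
    return sorted(
        {row[sample_column] for row in reader if row[latest_column] == "TRUE"}
    )
```

Both functions accept any iterable of lines, so an open file handle for CMG_20230610_annotated.seqr.bed can be passed directly. Unlike the awk version, the header row is consumed by `csv.DictReader` and is not counted as a value.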
Yes, let's test with R0476_cmg_wilkins_haug_exomes
@lynnpais heads up.
A new index has been created from the updated callset: r0476_cmg_wilkins_haug_exomes__wes__grch38__gcnv__v02__vcfv01__20230816. Looks like there's a mix of "prev_call": true and "prev_call": false, as expected.
// Loaded with no issues
R000_test_case
R0211_guptill_exomes
R0276_inmr_neuromuscular_disea
R0294_myoseq_v20
R0315_coppens_exomes
R0317_cmg_topf_bonn_wes
R0330_cmg_laing_wes
R0344_cmg_beggs_exomes
R0351_cmg_bonnemann_exomes
R0358_cmg_vcgs_exomes
R0364_cmg_bodamer_manton_wes
R0365_cmg_scott_exomes
R0367_cmg_seidman_wes
R0369_cmg_ware_wes
R0375_cmg_topf_tubitak_wes
R0379_ogrady_wes
R0382_cmg_myoseq_exomes
R0383_gazda_exomes
R0394_cmg_kang_lgmd_wes
R0393_cmg_sankaran_exomes
R0401_cmg_scott_exomes_retrosp
R0404_cmg_estonia_wes
R0406_cmg_topf_panel_wes
R0408_cmg_perth_wes
R0433_cmg_hirschhorn_exomes
R0450_rare_genomes_project_exo
R0455_cmg_walsh_exomes_lispac
R0468_project_l
R0474_cmg_sherr_exomes
R0482_cmg_thaker_wes
R0483_cmg_pollak_wes
R0484_cmg_sinclair_wes
R0489_cmg_sweetser_wes
R0492_cmg_lerner_ellis_wes
R0495_cmg_topf_ea_cms_wes
R0497_cmg_muntoni_exomes
R0504_greka_wes
R0523_cmg_manton_doose_wes
R0524_cmg_wendy_chung_exomes
R0525_thaker_wes
R0528_cmg_engle_wes
R0529_cmg_southampton_wes
R0531_cmg_fajgenbaum
R0532_cmg_walsh_exomes_pmg
R0540_mgh_pathways_probands_on
R0560_mgrc_thaker_wes
R0563_mgrc_sherr_wes
R0566_tgg_ravenscroft_wes
R0576_amel_sims_40s_v2
R0592_tgg_thaker_geco_wes
R0624_neurodev_wave1_kenya
R0640_tgg_shimamura_sankaran_w
// Loaded with a few missing samples.
R0208_kang_v11
R0253_newcastle_v9
R0280_cmg_hildebrandt_exomes
R0284_pierce_retinal_degenerat
R0285_cmg_gleeson_exomes
R0303_cmg_manton_exomes
R0352_inmr_neuromuscular_disea
R0416_cmg_topf_cms_wes
R0449_pathways_mgh
R0451_cmg_walsh_exomes_microce
R0456_cmg_walsh_exomes_ch
R0478_cmg_walsh_exomes_id
R0496_cmg_fleming_wes
R0526_cmg_neurodev_wes
R0542_aicardi_wes
R0583_gregor_manton_wes
R0623_neurodev_wave1_sa
R0626_tgg_minikel_we
// Failed to load
R0234_mody_250s
R0244_dowling_v10
R0261_bonnemann_v9
R0381_manzini_exomes
R0548_bwh_pina_aguilar_dgap
R0639_tgg_bonnemann_turkey_wes
R0644_gregor_beggs_wes
R0692_gregor_muntoni_we
Alba shared the updated CMG gCNV call set - deposited in the CMG GREGoR Structural Variation Google Drive folder under the name gCNV_callset_20230610. Shared Alba's email with you for additional context and some changes she made to the most recent file.