Closed KangchengHou closed 1 year ago
From these results, beagle and eagle2 genetic map are quite similar. We can just leave eagle2 genetic map (as eagle2 genetic map also provides local COMBINED_rate.
Use following code to reproduce the figure.
from scipy import stats
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# download beagle
!wget https://bochet.gcc.biostat.washington.edu/beagle/genetic_maps/plink.GRCh38.map.zip
!unzip plink.GRCh38.map.zip
# download eagle map
!wget https://storage.googleapis.com/broad-alkesgroup-public/Eagle/downloads/tables/genetic_map_hg38_withX.txt.gz
eagle_map = pd.read_csv("genetic_map_hg38_withX.txt.gz", delim_whitespace=True)
nrows, ncols = 4, 6
fig, axes = plt.subplots(
figsize=(3 * ncols, 3 * nrows), dpi=150, nrows=nrows, ncols=ncols
)
for chrom in range(1, 23):
ax = axes.flatten()[chrom - 1]
chrom_beagle_map = pd.read_csv(
f"plink.chr{chrom}.GRCh38.map",
delim_whitespace=True,
header=None,
)
chrom_eagle_map = eagle_map[eagle_map["chr"] == chrom]
ax.scatter(
chrom_beagle_map[3], chrom_beagle_map[2], s=0.5, alpha=0.01, label="Beagle"
)
ax.scatter(
chrom_eagle_map["position"],
chrom_eagle_map["Genetic_Map(cM)"],
s=0.5,
alpha=0.01,
label="Eagle",
)
ax.set_xlabel("Physical position")
ax.set_ylabel("Genetic position")
ax.set_title(f"Chromosome {chrom}")
legend = ax.legend()
for handle in legend.legendHandles:
handle.set_sizes([6.0])
handle.set_alpha(1.0)
fig.tight_layout()
plt.show()
currently we use beagle genetic map for rfmix, eagle2 genetic map for hapgen. Ideal to leave only 1. TODO: check consistency between these 2 types of genetic map. Have a x-y plot where we put genetic position vs. physical position and compare across 2 data source.