Hello,
As shown in Fig. 1C, UMAP can be used to project k-mer sequences along with their methylation levels. To do this, first transform each k-mer sequence into a vector by encoding each nucleotide as an integer (A = 0, C = 1, G = 2, T = 3). This gives a matrix, X, in which each row corresponds to one k-mer sequence: the first k columns hold the integer-encoded nucleotides and the last column holds the associated methylation level. This matrix can then be projected to a 2D space using UMAP. Clusters in the projection may indicate sequence preferences.
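In case it helps, here is a minimal sketch of how such a matrix could be assembled. The function name `build_kmer_matrix` and the toy input are hypothetical, made up for illustration; only the encoding (A = 0, C = 1, G = 2, T = 3) and the column layout come from the description above.

```python
import numpy as np

# Integer code for each nucleotide, as described above.
NUCLEOTIDE_CODES = {"A": 0, "C": 1, "G": 2, "T": 3}


def build_kmer_matrix(kmers, meth_levels):
    """Return X with one row per k-mer: k integer codes, then the methylation level.

    Hypothetical helper, not part of any library. Bases other than
    A/C/G/T (for example N) raise a KeyError.
    """
    if len(kmers) != len(meth_levels):
        raise ValueError("kmers and meth_levels must have the same length")
    if len({len(kmer) for kmer in kmers}) > 1:
        raise ValueError("all k-mers must have the same length k")

    # Encode every k-mer as a row of integers.
    encoded = [[NUCLEOTIDE_CODES[base] for base in kmer.upper()] for kmer in kmers]

    # Append the methylation level as the last column.
    return np.column_stack([
        np.asarray(encoded, dtype=float),
        np.asarray(meth_levels, dtype=float),
    ])


# Toy input, made up for illustration only.
kmers = ["ACGT", "TTCG", "GCGA"]
meth_levels = [0.9, 0.1, 0.5]

X = build_kmer_matrix(kmers, meth_levels)
print(X.shape)  # (3, 5): 3 k-mers, k = 4 encoded bases + 1 methylation column
```

With real data, `kmers` and `meth_levels` would come from your own methylation calls; how those are extracted is outside what this sketch covers.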
Below is the Python code used to generate Fig. 1C, with X being the matrix described above:
import numpy as np
import matplotlib.pyplot as plt
import umap
# Data normalization.
Xcen = X - np.mean(X, axis=0)
Xnorm = Xcen / np.std(Xcen, axis=0)
meth_levels = X[:,-1] # The last column contains the methylation level of each k-mer sequence
# Project to a 2D space
reducer = umap.UMAP()
projections = reducer.fit_transform(Xnorm)
# Visualization
fig, axs = plt.subplots(figsize=(3,3))
axs.scatter(
    projections[:,0],
    projections[:,1],
    s=.1,
    edgecolor='none',
    c=meth_levels,
    cmap=plt.get_cmap('magma'),
    rasterized=True
)
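One thing to watch for with the normalization step: if any column of X is constant (for instance, a k-mer position that is always the same base), its standard deviation is 0 and the division produces NaN values in Xnorm. Whether this applies depends on your data; it is an assumption here, not something specific to Fig. 1C. A possible guard, using a hypothetical helper `standardize_columns`, is to drop zero-variance columns before scaling:

```python
import numpy as np


def standardize_columns(X):
    """Center and scale each column, dropping columns with zero variance.

    Hypothetical helper. Returns the standardized matrix and a boolean
    mask marking which original columns were kept.
    """
    X = np.asarray(X, dtype=float)
    Xcen = X - X.mean(axis=0)
    std = X.std(axis=0)
    keep = std > 0  # constant columns carry no information and would give 0/0
    return Xcen[:, keep] / std[keep], keep


# Toy matrix, made up for illustration: the middle column is constant.
X_demo = np.array([
    [0.0, 1.0, 0.2],
    [3.0, 1.0, 0.8],
    [2.0, 1.0, 0.5],
])

Xnorm_demo, kept = standardize_columns(X_demo)
print(Xnorm_demo.shape)  # (3, 2): the constant column was dropped
```

The returned array can be passed to `reducer.fit_transform` in place of Xnorm. The methylation levels used for coloring are still taken from the last column of the original X, so they are unaffected.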
Best,
Dear author,
Thank you very much!