avilella opened this issue 1 year ago
The code behind the visualization has not been released yet, but generating the UMAP embedding is straightforward.
The first step is to create the N×D matrix of per-protein embeddings for your N = 200k proteins, with D = 1280: average the per-residue ESM-2 embeddings over each sequence (in our case we even used ESM-1b, for no particular reason).
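A minimal sketch of that first step follows. It assumes you already have one per-residue embedding array of shape (L, 1280) per protein, for example extracted with the fair-esm package, and that the BOS/EOS special tokens ESM adds to each sequence have already been stripped. `mean_pool` and `build_embedding_matrix` are hypothetical helpers, and the toy data is random, not real embeddings.

```python
import numpy as np

def mean_pool(per_residue: np.ndarray) -> np.ndarray:
    """Average an (L, D) array of per-residue embeddings into a single D-dim vector."""
    return per_residue.mean(axis=0)

def build_embedding_matrix(per_residue_list) -> np.ndarray:
    """Stack one mean-pooled vector per protein into an (N, D) float32 matrix."""
    return np.stack([mean_pool(r) for r in per_residue_list]).astype(np.float32)

# Toy example: three "proteins" of different lengths, D = 1280 (random values).
rng = np.random.default_rng(0)
residues = [rng.normal(size=(length, 1280)) for length in (50, 120, 87)]
X = build_embedding_matrix(residues)
print(X.shape)  # (3, 1280)
```

Storing the matrix as float32 keeps it manageable: 200,000 × 1280 × 4 bytes is about 1 GB.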
Then, using the anndata and scanpy libraries, you can do something like:
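import scanpy as sc
from anndata import AnnData

adata = AnnData(X)  # X: N x D NumPy array of per-protein embeddings
adata.obs_names = mgnifyIDs  # N protein IDs (strings), in the same order as the rows of X
sc.pp.neighbors(adata, n_neighbors=15, use_rep='X')  # kNN graph built directly on X, no PCA
sc.tl.umap(adata)  # default args gave good results; we experimented very little with other settings
assert 'X_umap' in adata.obsm
umap_df = adata.obsm.to_df()  # index = protein IDs; columns X_umap1 / X_umap2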
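For plotting, scanpy's own `sc.pl.umap(adata)` works directly on the AnnData object. If you prefer to plot `umap_df` yourself, here is a minimal matplotlib sketch. `plot_umap`, the output path and the marker settings are illustrative assumptions, not the code behind the visualization, and the random toy DataFrame only stands in for the real `umap_df`.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs on a headless server
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_umap(umap_df: pd.DataFrame, out_path: str):
    """Scatter-plot the X_umap1 / X_umap2 columns of umap_df and save the figure to out_path."""
    fig, ax = plt.subplots(figsize=(8, 8))
    # Tiny, semi-transparent markers keep ~200k points legible.
    # rasterized=True keeps PDF/SVG output small; it has no effect on PNG.
    points = ax.scatter(
        umap_df["X_umap1"], umap_df["X_umap2"],
        s=0.5, alpha=0.3, linewidths=0, rasterized=True,
    )
    ax.set_xlabel("UMAP 1")
    ax.set_ylabel("UMAP 2")
    ax.set_xticks([])  # UMAP axis values carry no meaning, so hide the ticks
    ax.set_yticks([])
    fig.savefig(out_path, dpi=300, bbox_inches="tight")
    plt.close(fig)
    return points

# Toy stand-in for the real umap_df: random 2-D coordinates indexed by placeholder IDs.
rng = np.random.default_rng(0)
toy_df = pd.DataFrame(
    rng.normal(size=(1000, 2)),
    columns=["X_umap1", "X_umap2"],
    index=[f"protein_{i}" for i in range(1000)],
)
out_path = os.path.join(tempfile.gettempdir(), "umap_toy.png")
points = plot_umap(toy_df, out_path)
```

With the real data you would call `plot_umap(umap_df, "umap.png")`; to colour points by a per-protein attribute, pass `c=` and a colormap to `ax.scatter`.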
Hi! Any updates on the release of the code for the visualization tools? Thanks in advance.
Is the code used to plot the UMAP of the metagenomics dataset available?
I would like to generate a similar UMAP representation for ~200,000 protein structures.
Any ideas where to start? Thanks.