KrishnaswamyLab / MAGIC

MAGIC (Markov Affinity-based Graph Imputation of Cells) is a method for imputing missing values and restoring the structure of large biological datasets.
GNU General Public License v2.0

MAGIC on RNA + ATAC data #210

Open. yamajackr opened this issue 2 years ago

yamajackr commented 2 years ago

Hi @scottgigante, thank you for the great tool. I want to impute a 10x Genomics scMultiome dataset. Applying MAGIC to ATAC-seq data has been benchmarked here (https://doi.org/10.1093/bib/bbab442). I'm considering passing the distance matrix from Seurat's weighted nearest neighbor (WNN) analysis to MAGIC. Is that reasonable?

scottgigante commented 2 years ago

@yamamotoryo this is a totally valid use case, yes. You can pass the distance matrix to a graphtools.Graph with precomputed='distance', and then pass this graph to MAGIC.fit(X, graph=graph).
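For intuition about what graph construction does with a precomputed distance matrix (for example one exported from Seurat's WNN analysis), here is a NumPy-only sketch that converts distances into affinities. It assumes an alpha-decaying kernel whose per-cell bandwidth is the distance to the k-th nearest neighbor, similar in spirit to the kernel graphtools describes, but it is a simplification and not graphtools' implementation. The helper name distance_to_affinity and the knn/decay values are illustrative. In practice you would pass the matrix to graphtools.Graph(..., precomputed='distance') as described above.

```python
import numpy as np


def distance_to_affinity(dist, knn=5, decay=40.0):
    """Hypothetical helper: turn a symmetric cell x cell distance matrix into a
    symmetric affinity matrix with an alpha-decaying kernel.

    Each cell's bandwidth is its distance to its knn-th nearest other cell.
    Simplified sketch, not graphtools' actual implementation.
    """
    dist = np.asarray(dist, dtype=float)
    # Column 0 of each sorted row is the cell itself (distance 0),
    # so column `knn` holds the distance to the knn-th nearest other cell.
    bandwidth = np.sort(dist, axis=1)[:, knn]
    # Guard against zero bandwidth (e.g. duplicated cells)
    bandwidth = np.maximum(bandwidth, np.finfo(float).eps)
    # Alpha-decay kernel: exp(-(d_ij / sigma_i) ** decay), one bandwidth per row
    kernel = np.exp(-((dist / bandwidth[:, None]) ** decay))
    # Symmetrize by averaging with the transpose
    return (kernel + kernel.T) / 2


# Usage on toy data: Euclidean distances between 30 random points
rng = np.random.default_rng(0)
points = rng.normal(size=(30, 4))
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
A = distance_to_affinity(dist, knn=5, decay=40)
print(A.shape)  # (30, 30)
```

With a large decay the kernel is close to a hard cutoff at each cell's bandwidth, so every cell keeps strong affinity to roughly its knn nearest neighbors and near-zero affinity beyond them.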

yamajackr commented 2 years ago

@scottgigante Thank you so much! I will try it!

yamajackr commented 2 years ago

Hi @scottgigante, I tried that approach, using the affinity matrix to generate the graph.

I was able to create a MAGIC operator from the graph, but the transform step failed.

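import pandas as pd
import graphtools
import magic

# Load the precomputed cell x cell affinity matrix
affi = pd.read_csv('affinity_mat.tsv', header=0, sep='\t', index_col=0)
graph = graphtools.Graph(affi.to_numpy(), precomputed='affinity')
magic_op_g = magic.MAGIC()
# X: cells x genes expression DataFrame (called df in the log below)
magic_op_g = magic_op_g.fit(X=X, graph=graph)
X_magic = magic_op_g.transform()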

Error

magic_op_g = magic_op_g.fit(X=df, graph=graph)
Running MAGIC on 1729 cells and 21470 genes.
Using precomputed graph and diffusion operator...

X_magic = magic_op_g.transform()
Calculating imputation...
Calculated imputation in 0.26 seconds.
Traceback (most recent call last):
  File "/var/folders/59/cxr2yt4926jc95n5w2mtz32r0000gn/T/ipykernel_34361/2648698257.py", line 1, in <module>
    X_magic = magic_op_g.transform()
  File "/Users/jack/opt/anaconda3/envs/py310/lib/python3.10/site-packages/magic/magic.py", line 607, in transform
    X_magic = utils.convert_to_same_format(
  File "/Users/jack/opt/anaconda3/envs/py310/lib/python3.10/site-packages/magic/utils.py", line 167, in convert_to_same_format
    data.columns = target_columns
  File "/Users/jack/opt/anaconda3/envs/py310/lib/python3.10/site-packages/pandas/core/generic.py", line 5588, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 70, in pandas._libs.properties.AxisProperty.__set__
  File "/Users/jack/opt/anaconda3/envs/py310/lib/python3.10/site-packages/pandas/core/generic.py", line 769, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/Users/jack/opt/anaconda3/envs/py310/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 214, in set_axis
    self._validate_set_axis(axis, new_labels)
  File "/Users/jack/opt/anaconda3/envs/py310/lib/python3.10/site-packages/pandas/core/internals/base.py", line 69, in _validate_set_axis
    raise ValueError(
ValueError: Length mismatch: Expected axis has 1729 elements, new values have 21470 elements

Running MAGIC without a precomputed graph (the default) worked:

import magic

magic_op = magic.MAGIC()
# Without a graph argument, MAGIC builds its own graph from X
X_magic = magic_op.fit_transform(X=X)

Any help would be appreciated.

Thank you, Ryosuke
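For background on the affinity-based route used above (precomputed='affinity'): diffusion-based imputation turns an affinity matrix into a Markov transition matrix by normalizing each row to sum to 1. The NumPy sketch below shows that step on a small hypothetical affinity matrix. The helper name diffusion_operator is illustrative, and graphtools/MAGIC may apply extra processing that is not reproduced here.

```python
import numpy as np


def diffusion_operator(affinity):
    """Hypothetical helper: row-normalize a nonnegative cell x cell affinity
    matrix so each row sums to 1, giving a Markov transition matrix."""
    affinity = np.asarray(affinity, dtype=float)
    row_sums = affinity.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every cell needs at least one nonzero affinity")
    return affinity / row_sums


# Hypothetical 3-cell affinity matrix (symmetric, self-affinity 1)
A = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
P = diffusion_operator(A)
print(P.sum(axis=1))  # each row sums to 1 (up to floating point)
```

Row i of P is the probability of stepping from cell i to each other cell, so P @ X replaces each cell's values with a weighted average over its neighbors.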

scottgigante commented 2 years ago

This looks like a bug, but you can work around it by passing a NumPy array instead of a DataFrame:

magic_op_g = magic_op_g.fit(X=df.to_numpy(), graph=graph)

yamajackr commented 2 years ago

Thanks, @scottgigante. Your code ran, but the resulting array had the same shape as the graph: with 21,470 genes and 1,729 cells, the output was 1,729 x 1,729 instead of 1,729 x 21,470.
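These shapes are consistent with the earlier ValueError: a result with 1,729 columns (one per cell) was being labeled with 21,470 gene names. The toy pandas example below reproduces that class of error with small, made-up shapes; it only illustrates the message and does not show why MAGIC returned a cells x cells result in this case.

```python
import numpy as np
import pandas as pd

# Toy sizes standing in for 1,729 cells and 21,470 genes
n_cells, n_genes = 4, 6
gene_names = [f"gene{i}" for i in range(n_genes)]

# A cells x cells result (shaped like the graph) instead of cells x genes
result = pd.DataFrame(np.zeros((n_cells, n_cells)))

message = ""
try:
    # Relabeling columns with one name per gene fails: 4 columns vs 6 names
    result.columns = gene_names
except ValueError as err:
    message = str(err)

print(message)
```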

Instead, I tried this.

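import numpy as np
import pandas as pd
import magic

magic_op_g = magic.MAGIC()
magic_op_g = magic_op_g.fit(X=df, graph=graph)
# Diffusion operator raised to the power t = 3
diff_op_3 = np.linalg.matrix_power(magic_op_g.diff_op, 3)
# Diffuse the cells x genes data over the graph, keeping cell and gene labels
data_new = np.dot(diff_op_3, df.to_numpy())
df_new = pd.DataFrame(data=data_new, index=df.index, columns=df.columns)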

I think it works.
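The manual approach above computes P^t X by first forming the dense matrix power. An equivalent way, sketched below under the assumption that the diffusion operator is available as a SciPy sparse matrix (toy random data here), is to apply the operator t times to the data, which avoids materializing P^t. The helper name diffuse is hypothetical.

```python
import numpy as np
from scipy import sparse


def diffuse(P, X, t=3):
    """Hypothetical helper: return P^t @ X by applying P to X t times.

    P: (n_cells, n_cells) row-stochastic operator, dense or scipy.sparse.
    X: (n_cells, n_genes) expression matrix.
    """
    out = np.asarray(X, dtype=float)
    for _ in range(t):
        out = P @ out
    return np.asarray(out)


rng = np.random.default_rng(0)
n_cells, n_genes = 20, 8
# Toy sparse affinity: keep ~20% of random entries, add self-loops
dense = rng.random((n_cells, n_cells))
dense[dense < 0.8] = 0.0
K = sparse.csr_matrix(dense + np.eye(n_cells))
# Row-normalize into a Markov operator
row_sums = np.asarray(K.sum(axis=1)).ravel()
P = sparse.diags(1.0 / row_sums) @ K
X = rng.poisson(2.0, size=(n_cells, n_genes)).astype(float)

X_diffused = diffuse(P, X, t=3)
# Compare with the dense matrix-power approach
X_dense = np.linalg.matrix_power(P.toarray(), 3) @ X
print(np.allclose(X_diffused, X_dense))
```

On 1,729 cells either form should be practical; the repeated-product form mainly pays off as the number of cells grows and the operator stays sparse, since P^t tends to fill in as t increases.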