Closed SamuelAMiller1 closed 3 years ago
Hey @SamuelAMiller1, I am currently unable to reproduce it with the example that I paste below. Could there be an issue with the .obs_names not being unique?
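A quick way to check this is sketched below. It relies on .obs_names being a pandas Index, so the usual Index methods apply. The helper name report_duplicates and the cell names are made up for illustration. In practice you would pass mdata.obs_names or mdata.mod['rna'].obs_names instead.

```python
import pandas as pd


def report_duplicates(names):
    """Return the distinct names that occur more than once."""
    index = pd.Index(names)
    # keep=False marks every occurrence of a repeated name, not just the later ones
    return list(index[index.duplicated(keep=False)].unique())


# Hypothetical sample names standing in for mdata.obs_names
obs_names = pd.Index(["cell_1", "cell_2", "cell_2", "cell_3"])

print(obs_names.is_unique)           # False
print(report_duplicates(obs_names))  # ['cell_2']
```

If the list comes back empty, duplicated names can be ruled out as the cause.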
This is the code I've written to try to reproduce the issue:
import muon as mu
import mofax as mfx
### Create a MuData object to work with ###
import jax
import numpy as np
import pandas as pd
n = 1000
aj, bj = 100, 200
a = mu.AnnData(np.array(jax.random.normal(key=jax.random.PRNGKey(1), shape=(n, aj))))
b = mu.AnnData(np.array(jax.random.normal(key=jax.random.PRNGKey(2), shape=(n, bj))))
a.var_names = [f"var_{j+1}" for j in range(aj)]
b.var_names = [f"var_{j+1}" for j in range(bj)]
a.obs['Batch'] = np.array(jax.random.bernoulli(key=jax.random.PRNGKey(3), shape=(n,)).astype(int))
mdata = mu.MuData({"a": a, "b": b})
### Copy the metadata column as in the issue description ###
mdata.obs['Batch'] = a.obs['Batch'].copy() # also see a note below
### Run MOFA ###
mu.tl.mofa(mdata, outfile="issue3.hdf5")
### Inspect the model ###
model = mfx.mofa_model("issue3.hdf5")
model.metadata.head()
#          group  Batch  a:Batch
# sample
# 0       group1      1        1
# 1       group1      0        0
# 2       group1      0        0
# 3       group1      1        1
# 4       group1      1        1
model.close()
Is there anything I could add to this code to match your case closer?
While it is probably only mildly relevant to the issue, you should be able to use the batch column from the modality without manually copying it to mdata.obs. If the mdata object was created before mdata.mod['rna'] was modified, you might need to run mdata.update() to sync the content of mdata with the modalities enclosed. This would actually copy this column for you while taking care of possible missing or extra samples in each modality. Then you should be able to just write mu.tl.mofa(mdata, groups_label = 'rna:Batch', ...).
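To illustrate what "taking care of possible missing or extra samples" means, and where a prefixed column name such as rna:Batch comes from, here is a pandas-only sketch. It is not muon's actual implementation. The function prefixed_union, the second modality name "atac", and the cell names are all hypothetical. It only mimics the idea of joining per-modality metadata on sample names and prefixing each column with its modality.

```python
import pandas as pd

# Hypothetical per-modality metadata: "c4" is absent from "rna", "c1" from "atac"
rna_obs = pd.DataFrame({"Batch": [0, 1, 1]}, index=["c1", "c2", "c3"])
atac_obs = pd.DataFrame({"Batch": [1, 1, 0]}, index=["c2", "c3", "c4"])


def prefixed_union(mod_obs):
    """Outer-join per-modality tables on sample name, prefixing columns with '<modality>:'."""
    parts = [obs.add_prefix(f"{name}:") for name, obs in mod_obs.items()]
    # join="outer" keeps every sample seen in any modality; gaps become NaN
    return pd.concat(parts, axis=1, join="outer")


global_obs = prefixed_union({"rna": rna_obs, "atac": atac_obs})
print(global_obs)
```

Samples missing from a modality end up with NaN in that modality's column rather than raising an error, which is what a plain manual copy does not give you.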
Also, I am not sure this is the latest master of mofax. Could you try that version as well? I'm not sure it is going to help with this particular issue, but we've improved handling of quite a few cases in the most recent version.
It should be installable with pip in your conda environment:
pip install git+https://github.com/bioFAM/mofax@master
After further digging, the issue was related to reading in a model where the groups_label had been specified. Per your suggestion, using the latest master seems to have resolved the issue. Thanks!
I am trying to specify the batches while running MOFA:
mu.tl.mofa(mdata, groups_label = 'Batch', outfile = '/models/batch_seq.hdf5', n_factors=30)
Copying the Batch column from the rna modality (same in both modalities):
mdata.obs['Batch'] = mdata.mod['rna'].obs['Batch'].copy()
Doing so results in this error while trying to read in the model:
If I do not copy obs from a modality into the mdata obs, I am able to read in the model, but in this case I am then unable to specify a groups_label.