Invalid expression matrix in `.X`. error using sct.train.Trainer()

pandaqiuqiu commented 3 weeks ago

Hi, Qian,

My input `adata` is an h5ad file converted from a Seurat object: the RNA assay's `data` slot went into `.X`, its `counts` slot into `.raw`, and `meta.data` into `.obs`.

During model training, I hit the following error at the `sct.train.Trainer()` step. Even after adding the step `adata.X <- adata.raw.X`, the issue persists. Could you help me solve this problem? Thanks so much!

Related code is as follows:

```
adata.X
<Compressed Sparse Row sparse matrix of dtype 'float64' with 486880 stored elements and shape (5000, 1000)>
adata.raw.X
<Compressed Sparse Row sparse matrix of dtype 'float64' with 17799636 stored elements and shape (5000, 33694)>
```

```python
sc.pp.calculate_qc_metrics(adata, percent_top=None, log1p=False, inplace=True)
sc.pp.highly_variable_genes(adata, flavor='seurat_v3', n_top_genes=1000, subset=True)
```

```
/.../python3.10/site-packages/scanpy/preprocessing/_highly_variable_genes.py:75: UserWarning: flavor='seurat_v3' expects raw count data, but non-integers were found.
  warnings.warn(
```

```python
tnode = sct.train.Trainer(adata, loss_mode='nb', alpha_recon_lec=0.5, alpha_recon_lode=0.5)
tnode.train()
```

```
ValueError                                Traceback (most recent call last)
Cell In[36], line 1
----> 1 tnode = sct.train.Trainer(adata, loss_mode='nb', alpha_recon_lec=0.5, alpha_recon_lode=0.5)
      2 tnode.train()

File /.../python3.10/site-packages/sctour/train.py:168, in Trainer.__init__(self, adata, percent, n_latent, n_ode_hidden, n_vae_hidden, batch_norm, ode_method, step_size, alpha_recon_lec, alpha_recon_lode, alpha_kl, loss_mode, nepoch, batch_size, drop_last, lr, wt_decay, eps, random_state, val_frac, use_gpu)
    166 X = self.adata.X.data if sparse.issparse(self.adata.X) else self.adata.X
    167 if (X.min() < 0) or np.any(~np.equal(np.mod(X, 1), 0)):
--> 168     raise ValueError(
    169         f"Invalid expression matrix in `.X`. {self.loss_mode} mode expects raw UMI counts in `.X` of the AnnData."
    170     )
    172 self.n_cells = adata.n_obs
    173 self.batch_size = batch_size

ValueError: Invalid expression matrix in `.X`. nb mode expects raw UMI counts in `.X` of the AnnData.
```

LiQian-XC commented 3 weeks ago

Hi, please run `adata = adata.raw.to_adata()` before all other steps to get the raw counts into `adata.X`. Both `sc.pp.highly_variable_genes` (when using `flavor='seurat_v3'`) and scTour in its default mode expect raw UMI counts. Please let me know if you have any further questions.
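Before constructing the `Trainer`, it can help to confirm that `.X` really holds counts. The sketch below re-implements, as a standalone helper, the condition visible in the `train.py` traceback (no negative values, no fractional values). `looks_like_raw_counts` is a hypothetical name, not part of scTour.

```python
import numpy as np
from scipy import sparse


def looks_like_raw_counts(X):
    """Return True if X holds only non-negative whole numbers.

    Hypothetical helper: mirrors the check scTour's Trainer performs
    (lines 166-167 of train.py in the traceback) so it can run up front.
    """
    # For sparse input, only the stored (non-zero) values need checking.
    values = X.data if sparse.issparse(X) else np.asarray(X)
    if values.size == 0:
        return True
    return bool(values.min() >= 0 and np.all(np.mod(values, 1) == 0))
```

Running it on `adata.X` right after `adata = adata.raw.to_adata()` should return `True` for UMI counts, while a log-normalized Seurat `data` slot should make it return `False`.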

pandaqiuqiu commented 3 weeks ago

@LiQian-XC

Thanks for your fast response. When I run `adata = adata.raw.to_adata()` at the beginning, I get a new error:

```python
tnode = sct.train.Trainer(adata, loss_mode='nb', alpha_recon_lec=0.5, alpha_recon_lode=0.5)
tnode.train()
```

```
Running using CPU.

AttributeError                            Traceback (most recent call last)
Cell In[16], line 2
      1 tnode = sct.train.Trainer(adata, loss_mode='nb', alpha_recon_lec=0.5, alpha_recon_lode=0.5)
----> 2 tnode.train()

File ~/.../sctour/lib/python3.10/site-packages/sctour/train.py:258, in Trainer.train(self)
    254 def train(self):
    255     """
    256     Model training.
    257     """
--> 258     self._get_data_loaders()
    260     params = filter(lambda p: p.requires_grad, self.model.parameters())
    261     self.optimizer = torch.optim.Adam(params, lr = self.lr, weight_decay = self.wt_decay, eps = self.eps)

File ~/.../sctour/lib/python3.10/site-packages/sctour/train.py:245, in Trainer._get_data_loaders(self)
    240 """
    241 Generate Data Loaders for training and validation datasets.
    242 """
    244 train_data, val_data = split_data(self.adata, self.percent, self.val_frac)
--> 245 self.train_dataset = MakeDataset(train_data, self.loss_mode)
    246 self.val_dataset = MakeDataset(val_data, self.loss_mode)
    248 # sampler = BatchSampler(train_data.n_obs, self.batch_size, self.drop_last)
    249 # self.train_dl = DataLoader(self.train_dataset, batch_sampler = sampler)

File ~/miniconda3/envs/sctour/lib/python3.10/site-packages/sctour/data.py:99, in MakeDataset.__init__(self, adata, loss_mode)
     97 X = np.log1p(X)
     98 if sparse.issparse(X):
---> 99     X = X.A
    100 self.data = torch.tensor(X)
    101 self.library_size = self.data.sum(-1)

AttributeError: 'SparseCSRView' object has no attribute 'A'
```

LiQian-XC commented 3 weeks ago

Hi,

Can you try the following steps to see whether it works?

```python
from scipy.sparse import csr_matrix
adata.X = csr_matrix(adata.X)
```

Please let me know if you have any other questions.

pandaqiuqiu commented 3 weeks ago

@LiQian-XC After running `from scipy.sparse import csr_matrix` and `adata.X = csr_matrix(adata.X)`, the same issue persisted. However, I then tried `adata.X = adata.X.toarray()`, and that solved the problem. Thank you for your prompt response.
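The `toarray()` workaround likely helps because, once `.X` is dense, `sparse.issparse(X)` is `False` in `MakeDataset`, so the `X = X.A` line from the traceback is skipped. A minimal sketch of the same idea as a reusable helper; `to_dense` is a hypothetical name, and it relies only on the SciPy sparse API (`issparse`, `toarray`):

```python
import numpy as np
from scipy import sparse


def to_dense(X):
    """Return X as a dense NumPy array, whatever matrix type it arrives as."""
    if sparse.issparse(X):
        # .toarray() is the documented way to densify SciPy sparse objects;
        # the .A shorthand used in the traceback is not available on every
        # sparse or view class.
        return X.toarray()
    return np.asarray(X)
```

One trade-off: dense float64 storage for the full 5000 × 33694 matrix in this thread would take about 1.35 GB (5000 × 33694 × 8 bytes), so it is worth subsetting genes before densifying.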

bbimber commented 2 weeks ago

@LiQian-XC: we are having what I assume is an analogous issue. I was getting the `AttributeError: 'SparseCSRView' object has no attribute 'A'` error, just like above, when calling `train()`. The `csr_matrix()` solution did not work. FWIW, our code subsets the adata object right before training, and I suspect this converts it into this View class:

```python
adataObj = adataObj[:, list(set(adataObj.var_names) - set(exclusionList))]
tnode = sct.train.Trainer(adataObj)
```

My guess is that subsetting turns the AnnData object into a view of the data, and that view isn't interacting well with scTour. Do you have any debugging suggestions or tests on the AnnData object to verify that theory?
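One way to test the view theory is to inspect the object just before creating the `Trainer`. A minimal sketch, assuming only AnnData's `is_view` property and the SciPy sparse API; `describe_matrix_state` is a hypothetical helper, and it is duck-typed so it does not import anndata itself:

```python
from scipy import sparse


def describe_matrix_state(adata):
    """Summarise the properties of an AnnData-like object relevant here."""
    X = adata.X
    return {
        # AnnData reports is_view=True for objects created by subsetting
        "is_view": bool(getattr(adata, "is_view", False)),
        # class of .X, e.g. a plain csr_matrix or an anndata view class
        "x_type": type(X).__name__,
        "x_is_sparse": sparse.issparse(X),
        # checked on the class so the .A property (which would densify
        # the whole matrix) is never actually evaluated
        "x_class_has_A": hasattr(type(X), "A"),
    }
```

Comparing `describe_matrix_state(adataObj)` with `describe_matrix_state(adataObj.copy())` should show whether the subset is a view and which class `.X` has in each case.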

LiQian-XC commented 2 weeks ago

Hi,

Can you try copying the data when subsetting, as in your example below?

```python
adataObj = adataObj[:, list(set(adataObj.var_names) - set(exclusionList))].copy()
```

I think this may address the issue; please let me know if it does not work.

bbimber commented 2 weeks ago

Thanks for the idea. Yes, after posting I came to the same conclusion. Tests are running on the code here: https://github.com/bimberlabinternal/CellMembrane/blob/407adf4f1d998af41c1de79f257e94bfe256d0ee/inst/scripts/run_sctour.py#L31

If this is a solution, would you consider adding this kind of test directly to scTour?
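If such a check were added, one possible shape is a guard that materialises views before training and then applies the same counts test as `Trainer`. This is only an illustration: `prepare_counts_adata` is a hypothetical name, not scTour API, and it duck-types on `is_view`/`copy()` so it runs without anndata installed. Whether copying alone avoids the `X.A` error is not confirmed in this thread.

```python
import numpy as np
from scipy import sparse


def prepare_counts_adata(adata):
    """Hypothetical pre-training guard (not part of scTour's API).

    1. If adata is an AnnData view (is_view is True), replace it with an
       independent copy, as suggested in the thread.
    2. Check that .X holds non-negative whole numbers, as nb mode expects.
    """
    if getattr(adata, "is_view", False):
        adata = adata.copy()
    X = adata.X
    # For sparse input, only the stored (non-zero) values need checking.
    values = X.data if sparse.issparse(X) else np.asarray(X)
    if values.size and (values.min() < 0 or not np.all(np.mod(values, 1) == 0)):
        raise ValueError("expected raw, non-negative integer counts in .X")
    return adata
```

Usage would be `adataObj = prepare_counts_adata(adataObj)` immediately before `sct.train.Trainer(adataObj)`.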

LiQian-XC commented 2 weeks ago

Thanks for sharing your code. I will consider adding this in a new version of scTour.

LiQian-XC / sctour #10