AttributeError: flatten not found

auesro commented 2 years ago

Dear UNIFAN team,

I ran into an error while analyzing my own dataset (23925 cells, 4000 variable genes) with the example notebook provided. The error occurs in cell 10; the example dataset runs without any issues:

Traceback (most recent call last):

  File "/home/auesro/mambaforge/envs/Unifan/lib/python3.7/site-packages/spyder_kernels/py3compat.py", line 356, in compat_exec
    exec(code, globals, locals)

  File "/home/auesro/Desktop/Cell_Ranger/1_Merge_SCRAN_Pool/UNIFAN.py", line 328, in <module>
    trainer.train(alpha=alpha, beta=beta, beta_list=beta_list, gene_covered_matrix=gene_covered_matrix)

  File "/home/auesro/mambaforge/envs/Unifan/lib/python3.7/site-packages/unifan/trainer.py", line 172, in train
    self.train_functions[self.model_name](**kwargs)

  File "/home/auesro/mambaforge/envs/Unifan/lib/python3.7/site-packages/unifan/trainer.py", line 244, in train_epoch_r
    for batch_idx, (X_batch) in enumerate(tqdm(self.dataloader_train)):

  File "/home/auesro/mambaforge/envs/Unifan/lib/python3.7/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:

  File "/home/auesro/mambaforge/envs/Unifan/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()

  File "/home/auesro/mambaforge/envs/Unifan/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)

  File "/home/auesro/mambaforge/envs/Unifan/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()

  File "/home/auesro/mambaforge/envs/Unifan/lib/python3.7/site-packages/torch/_utils.py", line 434, in reraise
    raise exception

AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/auesro/mambaforge/envs/Unifan/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/auesro/mambaforge/envs/Unifan/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/auesro/mambaforge/envs/Unifan/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/auesro/mambaforge/envs/Unifan/lib/python3.7/site-packages/torch/utils/data/dataset.py", line 363, in __getitem__
    return self.dataset[self.indices[idx]]
  File "/home/auesro/mambaforge/envs/Unifan/lib/python3.7/site-packages/unifan/datasets.py", line 51, in __getitem__
    main = self.data[idx].X.flatten()
  File "/home/auesro/mambaforge/envs/Unifan/lib/python3.7/site-packages/scipy/sparse/base.py", line 687, in __getattr__
    raise AttributeError(attr + " not found")
AttributeError: flatten not found

One thing I noticed: exp_variable_genes is an array of float32 with the test dataset, while with my data it is a sparse.csr.csr_matrix.

Any ideas?

Thanks!

doraadong commented 2 years ago

Hi, we assume the expression data (data.X) is a numpy array. Your expression data (data.X) seems to be a sparse CSR matrix, which does not support flatten(). I guess you did not preprocess the expression data and used the raw counts directly (so it is saved as a sparse matrix)? If you follow the preprocessing procedure given in getExample.py (filtering, normalization, log transformation and scaling), you should end up with a numpy array saved as data.X.
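For readers who hit the same traceback, here is a minimal sketch of the type mismatch and one possible workaround: converting the sparse matrix to a dense numpy array before it reaches the dataset. The helper name to_dense_row and the toy matrix are hypothetical and are not part of UNIFAN. Densifying does not replace the preprocessing recommended above; it only removes the AttributeError.

```python
import numpy as np
from scipy import sparse


def to_dense_row(x):
    """Return one cell's expression values as a 1-D float32 numpy array.

    Hypothetical helper (not part of UNIFAN): accepts either a scipy
    sparse row or a numpy array.
    """
    if sparse.issparse(x):
        # Sparse matrices expose toarray() for conversion to a dense ndarray.
        x = x.toarray()
    return np.asarray(x, dtype=np.float32).ravel()


# Toy stand-in for raw counts stored as a CSR matrix (2 cells x 3 genes).
counts = sparse.csr_matrix(np.array([[0, 2, 0], [1, 0, 3]], dtype=np.float32))

# Row-wise conversion, mirroring what a __getitem__ would need.
row = to_dense_row(counts[0])

# Whole-matrix conversion; with an AnnData object this would correspond to
# assigning the dense array back to data.X (assumption: it fits in memory).
dense_X = counts.toarray()
```

A dense float32 matrix of 23925 cells by 4000 genes takes roughly 380 MB, so whole-matrix conversion is usually feasible at this scale.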

auesro commented 2 years ago

Thanks, solved it following a modified version of your script.

doraadong commented 2 years ago

Thanks for the feedback. We recommend also using preprocessed data. You can follow preprocessing steps similar to those in the getExample.py script (i.e. filtering, normalization, log transformation and scaling).
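As a rough illustration of the four steps named above, the sketch below applies them to a small dense count matrix with plain numpy. It is not the getExample.py script. The function name preprocess, the parameters min_cells and target_sum, and their default values are assumptions chosen for the example.

```python
import numpy as np


def preprocess(counts, min_cells=1, target_sum=1e4):
    """Filter, normalize, log-transform and scale a cells x genes matrix.

    Illustrative only: parameter names and defaults are assumptions,
    not values taken from UNIFAN.
    """
    counts = np.asarray(counts, dtype=np.float32)

    # Filtering: keep genes detected in at least min_cells cells.
    keep = (counts > 0).sum(axis=0) >= min_cells
    counts = counts[:, keep]

    # Normalization: rescale every cell to the same total count.
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    normalized = counts / totals * target_sum

    # Log transformation: log(1 + x) keeps zeros at zero.
    logged = np.log1p(normalized)

    # Scaling: zero mean and unit variance per gene.
    mean = logged.mean(axis=0)
    std = logged.std(axis=0)
    std[std == 0] = 1.0  # leave constant genes unscaled
    return ((logged - mean) / std).astype(np.float32)


# Toy raw counts: 3 cells x 4 genes; the third gene is never detected.
raw = np.array([[0, 2, 0, 1],
                [1, 0, 0, 3],
                [4, 1, 0, 0]])
X = preprocess(raw)
```

The result is a dense float32 numpy array, which is the type the thread says data.X is expected to have.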
