KevinMenden / scaden

Deep Learning based cell composition analysis with Scaden.
https://scaden.readthedocs.io
MIT License

Scaden simulate ValueError and choosing input parameters #124

Open elise-smith opened 1 year ago

elise-smith commented 1 year ago

Hi Kevin,

Thank you for the great package.

I am trying to run scaden simulate on a .h5ad object with ~21,000 cells and 25 cell types. I previously ran this on another .h5ad object successfully.

I am using the following command:

scaden simulate --out /data/Deconvolution/Scaden/Output/ --cells 200 --n_samples 1000 --data /data/Deconvolution/Scaden/Input/ --data-format h5ad --pattern *.h5ad

However, I receive the following error:

INFO     Datasets: ['data']                            bulk_simulator.py:84
INFO     Simulating data from data                     bulk_simulator.py:89
INFO     Loading data dataset ...                     bulk_simulator.py:141
INFO     Merging unknown cell types: ['unknown']           bulk_simulator.py:107
INFO     Subsampling data ...                         bulk_simulator.py:110
Traceback (most recent call last):
  File "/data/anaconda/envs/scaden/bin/scaden", line 8, in <module>
    sys.exit(main())
  File "/data/anaconda/envs/scaden/lib/python3.8/site-packages/scaden/__main__.py", line 48, in main
    cli()
  File "/data/anaconda/envs/scaden/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/data/anaconda/envs/scaden/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/data/anaconda/envs/scaden/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/anaconda/envs/scaden/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data/anaconda/envs/scaden/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/data/anaconda/envs/scaden/lib/python3.8/site-packages/scaden/__main__.py", line 207, in simulate
    simulation(
  File "/data/anaconda/envs/scaden/lib/python3.8/site-packages/scaden/simulate.py", line 22, in simulation
    bulk_simulator.simulate()
  File "/data/anaconda/envs/scaden/lib/python3.8/site-packages/scaden/simulation/bulk_simulator.py", line 90, in simulate
    self.simulate_dataset(dataset)
  File "/data/anaconda/envs/scaden/lib/python3.8/site-packages/scaden/simulation/bulk_simulator.py", line 114, in simulate_dataset
    tmp_x, tmp_y = self.create_subsample_dataset(
  File "/data/anaconda/envs/scaden/lib/python3.8/site-packages/scaden/simulation/bulk_simulator.py", line 253, in create_subsample_dataset
    sample, label = self.create_subsample(x, y, celltypes)
  File "/data/anaconda/envs/scaden/lib/python3.8/site-packages/scaden/simulation/bulk_simulator.py", line 305, in create_subsample
    cells_fraction = np.random.randint(0, cells_sub.shape[0], samp_fracs[i])
  File "mtrand.pyx", line 748, in numpy.random.mtrand.RandomState.randint
  File "_bounded_integers.pyx", line 1247, in numpy.random._bounded_integers._rand_int64
ValueError: high <= 0
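
A note on the final frame of the traceback: `np.random.randint(0, cells_sub.shape[0], samp_fracs[i])` raises a `ValueError` when its upper bound is 0 while a positive number of draws is requested. This suggests, though nothing in this thread confirms it, that at least one cell type had no matching cells when the subsample was drawn. The sketch below reproduces that behaviour with NumPy alone. The helper `sample_cell_indices` and the `labels` and `expected` values are hypothetical stand-ins for illustration, not Scaden code or data from this issue.

```python
import numpy as np


def sample_cell_indices(n_available, n_requested):
    """Draw n_requested row indices (with replacement) from n_available cells.

    Hypothetical helper that mirrors the randint call in the traceback.
    """
    return np.random.randint(0, n_available, n_requested)


# A cell type that has cells can be sampled without problems.
idx = sample_cell_indices(n_available=50, n_requested=10)
print(idx.shape)

# A cell type with zero matching cells triggers a ValueError.
# The exact message wording can differ between NumPy versions.
try:
    sample_cell_indices(n_available=0, n_requested=10)
except ValueError as err:
    print("ValueError:", err)

# Hypothetical diagnostic: count cells per expected cell type and
# list the types that have none (made-up labels, not the user's data).
labels = np.array(["T cell", "B cell", "T cell", "unknown"])
expected = ["T cell", "B cell", "NK cell"]
counts = {ct: int(np.sum(labels == ct)) for ct in expected}
empty = [ct for ct, n in counts.items() if n == 0]
print(empty)
```

If this reading is right, comparing the cell type labels stored in the .h5ad file against the set of types the simulator iterates over would show which type comes up empty.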

My matrix of the input .h5ad looks like this:

adata[0:5,0:5].X.todense()
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0.9539254, 1.9078507, 0. ],
[0. , 0. , 0. , 4.070004 , 0. ]

Could you let me know how I might be able to fix this?

Additionally, do you have any advice on how to select the --cells and --n_samples parameters, or can these generally be kept at their default values?

Many thanks, Elise