KevinMenden / scaden

Deep Learning based cell composition analysis with Scaden.
https://scaden.readthedocs.io
MIT License

ValueError: Input contains infinity or a value too large for dtype('float32'). #118

Open chenjy327 opened 2 years ago

chenjy327 commented 2 years ago

Hi Kevin

I ran into an error when I tried to use 'scaden process':

INFO Scaling using log_min_max functions.py:65
Traceback (most recent call last):
  File "/data/software/Python-3.8.5/bin/scaden", line 8, in <module>
    sys.exit(main())
  File "/data/software/Python-3.8.5/lib/python3.8/site-packages/scaden/main.py", line 48, in main
    cli()
  File "/data/software/Python-3.8.5/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/data/software/Python-3.8.5/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/data/software/Python-3.8.5/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/software/Python-3.8.5/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data/software/Python-3.8.5/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/data/software/Python-3.8.5/lib/python3.8/site-packages/scaden/main.py", line 155, in process
    processing(
  File "/data/software/Python-3.8.5/lib/python3.8/site-packages/scaden/process.py", line 35, in processing
    preprocess_h5ad_data(raw_input_path=training_data,
  File "/data/software/Python-3.8.5/lib/python3.8/site-packages/scaden/model/functions.py", line 67, in preprocess_h5ad_data
    raw_input.X = sample_scaling(raw_input.X, scaling_option)
  File "/data/software/Python-3.8.5/lib/python3.8/site-packages/scaden/model/functions.py", line 42, in sample_scaling
    x = mms.fit_transform(x.T).T
  File "/data/software/Python-3.8.5/lib/python3.8/site-packages/sklearn/base.py", line 699, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/data/software/Python-3.8.5/lib/python3.8/site-packages/sklearn/preprocessing/_data.py", line 363, in fit
    return self.partial_fit(X, y)
  File "/data/software/Python-3.8.5/lib/python3.8/site-packages/sklearn/preprocessing/_data.py", line 396, in partial_fit
    X = self._validate_data(X, reset=first_pass,
  File "/data/software/Python-3.8.5/lib/python3.8/site-packages/sklearn/base.py", line 421, in _validate_data
    X = check_array(X, **check_params)
  File "/data/software/Python-3.8.5/lib/python3.8/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/data/software/Python-3.8.5/lib/python3.8/site-packages/sklearn/utils/validation.py", line 663, in check_array
    _assert_all_finite(array,
  File "/data/software/Python-3.8.5/lib/python3.8/site-packages/sklearn/utils/validation.py", line 103, in _assert_all_finite
    raise ValueError(
ValueError: Input contains infinity or a value too large for dtype('float32').

After that I checked the simulated data (data.h5ad) and found 'inf' values in it. (Screenshot attached: 微信图片_20211210105547)
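
For anyone who wants to repeat this check, here is a minimal sketch of scanning an expression matrix for non-finite entries. It is dependency-free and works on any iterable of rows, so it assumes the matrix is dense (for example, the `.X` array of the object loaded from data.h5ad). `find_non_finite` is a hypothetical helper name and is not part of Scaden; the toy matrix only stands in for real simulated data.

```python
import math


def find_non_finite(matrix):
    """Return (row, column, value) for every inf or NaN entry.

    `matrix` is any iterable of rows, e.g. a list of lists or a dense
    2-D NumPy array. Hypothetical helper, not part of Scaden.
    """
    hits = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if not math.isfinite(value):
                hits.append((i, j, float(value)))
    return hits


# Toy matrix standing in for the simulated expression values.
toy = [
    [0.0, 12.5, 3.0],
    [7.0, float("inf"), 1.0],
    [2.0, 4.0, float("nan")],
]

bad = find_non_finite(toy)
bad_rows = sorted({i for i, _, _ in bad})
print(f"{len(bad)} non-finite entries in rows {bad_rows}")
```

If NumPy is available, `np.isfinite(X).all()` answers the same yes/no question in one call on a dense array; the loop above is only useful when you also want the positions.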

May I have your suggestion on this issue? Thanks a lot!

Jianyu

KevinMenden commented 2 years ago

Hi @chenjy327 ,

are you using the latest version of Scaden? This sounds like a bug that was already solved. If you're using the latest version, what steps have you been following? Is this reproducible when simulating again?

Sorry for the late reply by the way! Cheers, Kevin

jadonWong commented 1 year ago

Hi Kevin, I have the same problem. I'm using Scaden version 1.1.1, and my steps were as follows:

  1. Generate three data sets (sclc_counts.txt, sclc_bulk_data.txt, sclc_celltypes.txt) from the raw counts matrix. I didn't apply any transformation or normalization; I only intersected the genes between the bulk and scRNA data.
  2. Run scaden simulate to generate data.h5ad.
  3. Run scaden process, which fails with this error.

Any suggestions? Thank you very much!

jadonWong commented 1 year ago

I solved this problem by updating Scaden to version 1.1.2!
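
For readers hitting the same error, a quick way to see whether an environment still has an older release is to query the installed package metadata. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). `installed_version` and `version_tuple` are hypothetical helper names, and the comparison assumes plain dotted version strings such as 1.1.1; anything after the first non-numeric component is ignored.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(text):
    """Turn a dotted version such as '1.1.2' into (1, 1, 2).

    Parsing stops at the first non-numeric component, so this is only a
    rough comparison for simple release numbers.
    """
    parts = []
    for part in text.split("."):
        if not part.isdigit():
            break
        parts.append(int(part))
    return tuple(parts)


found = installed_version("scaden")
if found is None:
    print("scaden is not installed in this environment")
elif version_tuple(found) < version_tuple("1.1.2"):
    print(f"scaden {found} found; this thread reports the error going away after upgrading to 1.1.2")
else:
    print(f"scaden {found} found")
```

If the check reports a version older than 1.1.2, upgrading (for example with `pip install --upgrade scaden`, assuming a pip-based install) and re-running the simulation step is what worked in this thread.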