eiriniar / CellCnn

Representation Learning for detection of phenotype-associated cell subsets
http://www.imsb.ethz.ch/research/claassen/Software/cellcnn.html
GNU General Public License v3.0
65 stars, 28 forks

ValueError: Input contains NaN, infinity or a value too large for dtype('float64'). #7

Closed shawjes closed 3 years ago

shawjes commented 3 years ago

Hi, I'm writing from the University of Colorado, where we're hoping to use CellCnn to understand immune dysregulation in individuals with Down syndrome (http://www.trisome.org/). I was able to run CellCnn successfully on a small subset of samples (8 with Trisomy 21, 8 without), with each FCS file subsampled down to only 1000 events. However, I get the error below when I try to run the analysis on our full cohort (n=292 with Trisomy 21, n=96 without Trisomy 21). Our panel includes 34 markers relevant to CD45+/CD66lo cells. Can you help me understand what could trigger this NaN/infinity error, or recommend some things I should try to fix it? Thanks so much.

(CellCnn) bash-3.2$ python /usr/local/bin/CellCnn/cellCnn/run_analysis.py \
> --seed 1234 \
> -f '/Users/shawjes/Dropbox/EspinosaGroup/ANALYSIS/CyTOF/P4C/Unsupervised_Analysis/CellCNN/P4C_CellCNN_InputFiles/P4C_CyTOF_051121_Samples_with_Labels_for_CellCnn_CD45posCD66lo_Subsample45k_v0.1_JRS.csv' \
> -m '/Users/shawjes/Dropbox/EspinosaGroup/ANALYSIS/CyTOF/P4C/Unsupervised_Analysis/CellCNN/P4C_CellCNN_InputFiles/P4C_CyTOF_051121_Markers_for_CellCnn_among_CD45posCD66lo_Subsample45k_v0.1_JRS.csv' \
> -i '/Users/shawjes/Dropbox/EspinosaGroup/P4C_CyTOF/CellCNN/CyTOF_P4C_P95batch_normalized_FSC_files (PA gates modified flowJo-New Bcells gate) - Gated Populations_CD45+CD66lo/Subsample45k/' \
> -o '/Users/shawjes/Dropbox/EspinosaGroup/ANALYSIS/CyTOF/P4C/Unsupervised_Analysis/CellCNN/051121_Out_AllSamples_CD45posCD66lo_Subsample45kEvents_noarcsinh' \
> --no_arcsinh \
> --export_csv \
> --group_a D21 --group_b T21 \
> --export_csv \
> --stat_test mannwhitneyu \
> --verbose 0
Traceback (most recent call last):
  File "/usr/local/bin/CellCnn/cellCnn/run_analysis.py", line 242, in <module>
    main()
  File "/usr/local/bin/CellCnn/cellCnn/run_analysis.py", line 149, in main
    train, val = next(skf.split(np.zeros((len(phenotypes), 1)), phenotypes))
  File "/Users/shawjes/.local/share/virtualenvs/CellCnn-gWcf5gBq/lib/python3.7/site-packages/sklearn/model_selection/_split.py", line 735, in split
    y = check_array(y, ensure_2d=False, dtype=None)
  File "/Users/shawjes/.local/share/virtualenvs/CellCnn-gWcf5gBq/lib/python3.7/site-packages/sklearn/utils/validation.py", line 73, in inner_f
    return f(**kwargs)
  File "/Users/shawjes/.local/share/virtualenvs/CellCnn-gWcf5gBq/lib/python3.7/site-packages/sklearn/utils/validation.py", line 646, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "/Users/shawjes/.local/share/virtualenvs/CellCnn-gWcf5gBq/lib/python3.7/site-packages/sklearn/utils/validation.py", line 100, in _assert_all_finite
    msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
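
The traceback points at `skf.split(..., phenotypes)` in `run_analysis.py`, where scikit-learn validates the label array, so the NaN appears to be in the phenotype labels read from the `-f` file rather than in the FCS event data. The `float64` in the message would be consistent with integer labels being upcast because of a blank cell, though that is an inference. Below is a minimal sketch for spotting such rows. It assumes the labels CSV has a header row, the FCS file name in the first column, and a numeric class label in the second; the function name and the example rows are hypothetical and not part of CellCnn.

```python
import csv
import io
import math


def find_bad_label_rows(lines, label_col=1, has_header=True):
    """Return (row_number, row) pairs whose label is missing or not a finite number.

    `lines` is any iterable of CSV text lines (an open file works).
    Assumes labels are numeric class codes such as 0 and 1.
    """
    bad = []
    for row_number, row in enumerate(csv.reader(lines), start=1):
        if has_header and row_number == 1:
            continue  # skip the header row
        # A short or empty row has no label cell at all.
        label = row[label_col].strip() if len(row) > label_col else ""
        try:
            ok = math.isfinite(float(label))
        except ValueError:
            ok = False  # empty string or non-numeric text
        if not ok:
            bad.append((row_number, row))
    return bad


# Hypothetical example content, not the real cohort file.
example = io.StringIO(
    "fcs_filename,label\n"
    "sample_01.fcs,0\n"
    "sample_02.fcs,\n"
    "sample_03.fcs,1\n"
    ",\n"
)
for row_number, row in find_bad_label_rows(example):
    print(f"row {row_number}: {row}")
```

To check a real file, pass an open handle instead of the example, e.g. `with open(path, newline="") as fh: find_bad_label_rows(fh)`. Trailing empty rows (which spreadsheet exports sometimes add) and samples with a blank label would both be reported.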