aertslab / scenicplus

SCENIC+ is a Python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

ValueError: y should be a 1d array, got an array of shape (56842, 2) instead. #251

Closed. alexlenail closed this issue 8 months ago.

alexlenail commented 8 months ago

I've followed the PBMC tutorial with my multiome data and am now trying to run the last step, run_scenicplus, which crashes with this error:

2023-10-26 23:37:42,920 SCENIC+_wrapper INFO     Inferring region to gene relationships
2023-10-26 23:37:44,569 R2G          INFO     Calculating region to gene importances, using GBM method
ray::_score_regions_to_single_gene_ray() (pid=237387, ip=172.31.9.168)
  File "/home/ec2-user/scenicplus/src/scenicplus/enhancer_to_gene.py", line 452, in _score_regions_to_single_gene_ray
    return _score_regions_to_single_gene(X, y, gene_name, region_names, regressor_type, regressor_kwargs)
  File "/home/ec2-user/scenicplus/src/scenicplus/enhancer_to_gene.py", line 469, in _score_regions_to_single_gene
    fitted_model = arboreto_core.fit_model(regressor_type=regressor_type,
  File "/home/ec2-user/miniforge3/envs/py39/lib/python3.9/site-packages/arboreto/core.py", line 143, in fit_model
    return do_sklearn_regression()
  File "/home/ec2-user/miniforge3/envs/py39/lib/python3.9/site-packages/arboreto/core.py", line 138, in do_sklearn_regression
    regressor.fit(tf_matrix, target_gene_expression)
  File "/home/ec2-user/miniforge3/envs/py39/lib/python3.9/site-packages/sklearn/base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/ec2-user/miniforge3/envs/py39/lib/python3.9/site-packages/sklearn/ensemble/_gb.py", line 424, in fit
    y = column_or_1d(y, warn=True)
  File "/home/ec2-user/miniforge3/envs/py39/lib/python3.9/site-packages/sklearn/utils/validation.py", line 1244, in column_or_1d
    raise ValueError(
ValueError: y should be a 1d array, got an array of shape (56842, 2) instead.
2023-10-27 04:23:49,897 R2G          INFO     Took 17165.327521324158 seconds
2023-10-27 04:23:49,897 R2G          INFO     Calculating region to gene correlation, using SR method
ray::_score_regions_to_single_gene_ray() (pid=368816, ip=172.31.9.168)
  File "/home/ec2-user/scenicplus/src/scenicplus/enhancer_to_gene.py", line 452, in _score_regions_to_single_gene_ray
    return _score_regions_to_single_gene(X, y, gene_name, region_names, regressor_type, regressor_kwargs)
  File "/home/ec2-user/scenicplus/src/scenicplus/enhancer_to_gene.py", line 488, in _score_regions_to_single_gene
    return pd.Series(correlation_coef, index=region_names), gene_name
  File "/home/ec2-user/miniforge3/envs/py39/lib/python3.9/site-packages/pandas/core/series.py", line 471, in __init__
    data = sanitize_array(data, index, dtype, copy)
  File "/home/ec2-user/miniforge3/envs/py39/lib/python3.9/site-packages/pandas/core/construction.py", line 648, in sanitize_array
    subarr = _sanitize_ndim(subarr, data, dtype, index, allow_2d=allow_2d)
  File "/home/ec2-user/miniforge3/envs/py39/lib/python3.9/site-packages/pandas/core/construction.py", line 699, in _sanitize_ndim
    raise ValueError("Data must be 1-dimensional")
ValueError: Data must be 1-dimensional
2023-10-27 05:52:02,189 R2G          INFO     Took 5292.290391921997 seconds
An error occured!
2023-10-27 05:52:02,514 SCENIC+_wrapper INFO     Inferring TF to gene relationships
2023-10-27 05:53:31,231 TF2G         INFO     Calculating TF to gene correlation, using GBM method
(_run_infer_partial_network_ray pid=489068) 2023-10-27 14:00:47,589 arboreto.core WARNING  'WARNING: infer_data failed for target T' Retry (1/10). Failure caused by ValueError('Cleaned TF matrix is empty, skipping inference of target T.').
(_run_infer_partial_network_ray pid=489068) 2023-10-27 14:00:47,589 arboreto.core WARNING  'WARNING: infer_data failed for target T' Retry (2/10). Failure caused by ValueError('Cleaned TF matrix is empty, skipping inference of target T.').
(_run_infer_partial_network_ray pid=489068) 2023-10-27 14:00:47,589 arboreto.core WARNING  'WARNING: infer_data failed for target T' Retry (3/10). Failure caused by ValueError('Cleaned TF matrix is empty, skipping inference of target T.').
(_run_infer_partial_network_ray pid=489068) 2023-10-27 14:00:47,589 arboreto.core WARNING  'WARNING: infer_data failed for target T' Retry (4/10). Failure caused by ValueError('Cleaned TF matrix is empty, skipping inference of target T.').
(_run_infer_partial_network_ray pid=489068) 2023-10-27 14:00:47,589 arboreto.core WARNING  'WARNING: infer_data failed for target T' Retry (5/10). Failure caused by ValueError('Cleaned TF matrix is empty, skipping inference of target T.').
(_run_infer_partial_network_ray pid=489068) 2023-10-27 14:00:47,589 arboreto.core WARNING  'WARNING: infer_data failed for target T' Retry (6/10). Failure caused by ValueError('Cleaned TF matrix is empty, skipping inference of target T.').
(_run_infer_partial_network_ray pid=489068) 2023-10-27 14:00:47,589 arboreto.core WARNING  'WARNING: infer_data failed for target T' Retry (7/10). Failure caused by ValueError('Cleaned TF matrix is empty, skipping inference of target T.').
(_run_infer_partial_network_ray pid=489068) 2023-10-27 14:00:47,589 arboreto.core WARNING  'WARNING: infer_data failed for target T' Retry (8/10). Failure caused by ValueError('Cleaned TF matrix is empty, skipping inference of target T.').
(_run_infer_partial_network_ray pid=489068) 2023-10-27 14:00:47,589 arboreto.core WARNING  'WARNING: infer_data failed for target T' Retry (9/10). Failure caused by ValueError('Cleaned TF matrix is empty, skipping inference of target T.').
(_run_infer_partial_network_ray pid=489068) 2023-10-27 14:00:47,589 arboreto.core WARNING  'WARNING: infer_data failed for target T' Retry (10/10). Failure caused by ValueError('Cleaned TF matrix is empty, skipping inference of target T.').
2023-10-27 15:21:11,129 TF2G         INFO     Took 34059.89709830284 seconds
2023-10-27 15:21:11,130 TF2G         INFO     Adding correlation coefficients to adjacencies.
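The shape (56842, 2) in the first traceback means the regressor received a two-column target where it expected a single gene's expression vector. One plausible way this happens is a name-based lookup on a pandas table in which a gene label occurs twice. The sketch below assumes such a pandas lookup; it is not necessarily how scenicplus indexes its matrices internally, and the gene names and random values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical toy expression matrix (cells x genes) in which "GENE_A" occurs twice.
rng = np.random.default_rng(0)
expr = pd.DataFrame(rng.random((5, 3)), columns=["GENE_A", "GENE_B", "GENE_A"])

# A unique label selects a single column as a 1-D Series.
unique_shape = expr["GENE_B"].shape      # (5,)

# A duplicated label selects every matching column as a 2-D DataFrame.
y = expr["GENE_A"]
duplicated_shape = y.shape               # (5, 2)

# Hypothetical predictor matrix (e.g. region accessibility): 5 cells x 4 regions.
X = rng.random((5, 4))

# The regressor expects a 1-D target, so the 2-D y is rejected before any training.
try:
    GradientBoostingRegressor().fit(X, y)
    error_message = ""
except ValueError as err:
    error_message = str(err)

print(unique_shape, duplicated_shape)
print(error_message)
```

The same 2-D target would plausibly also explain the second traceback: a correlation computed against two columns yields a 2-D result, and `pd.Series` rejects 2-D input.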

SeppeDeWinter commented 8 months ago

Hi @alexlenail

Can you check whether all the region/gene names in your region accessibility/gene expression matrices are unique?

All the best,

Seppe
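A quick way to run the suggested check is to look for repeated labels with pandas. This is a minimal sketch, assuming the gene and region names can be pulled out as plain lists; the helper `find_duplicates` and the example names are hypothetical.

```python
import pandas as pd

def find_duplicates(names):
    """Return the labels that appear more than once, in order of their first repeat."""
    index = pd.Index(names)
    return list(index[index.duplicated()].unique())

# Hypothetical names standing in for the gene expression and accessibility matrices.
gene_names = ["CD3E", "MS4A1", "CD3E", "NKG7", "MS4A1"]
region_names = ["chr1:1000-1500", "chr1:2000-2500", "chr2:300-800"]

for label, names in [("genes", gene_names), ("regions", region_names)]:
    dupes = find_duplicates(names)
    if dupes:
        print(f"{label}: {len(dupes)} duplicated name(s): {dupes}")
    else:
        print(f"{label}: all names are unique")
```

If the data happen to be held in AnnData objects (an assumption), `adata.var_names.is_unique` gives the same yes/no answer directly, since `var_names` is a pandas `Index`.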

alexlenail commented 8 months ago

Hi @SeppeDeWinter, great catch. Yes, there are a couple of duplicated gene names. I've removed them and am trying again.
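Removing the duplicates can be sketched in pandas as below, assuming a cells x genes DataFrame; the helper `drop_duplicate_columns` and the toy matrix are hypothetical. Whether to keep the first copy of each name or drop every copy is a judgment call that the thread does not settle, so both options are shown.

```python
import pandas as pd

def drop_duplicate_columns(expr, keep="first"):
    """Drop repeated column labels.

    keep="first" retains the first copy of each name;
    keep=False removes every copy of a duplicated name.
    """
    mask = ~expr.columns.duplicated(keep=keep)
    return expr.loc[:, mask]

# Hypothetical cells x genes matrix with "CD3E" repeated.
expr = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    index=["cell1", "cell2"],
    columns=["CD3E", "MS4A1", "CD3E"],
)

first_only = drop_duplicate_columns(expr, keep="first")
none_kept = drop_duplicate_columns(expr, keep=False)

print(list(first_only.columns))    # ['CD3E', 'MS4A1']
print(list(none_kept.columns))     # ['MS4A1']
print(first_only["CD3E"].ndim)     # 1: the label now selects a single column
```

For AnnData objects, `var_names_make_unique()` is an alternative that renames repeated labels by appending a suffix instead of removing them.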

alexlenail commented 8 months ago

That worked, but now I have a new issue: https://github.com/aertslab/scenicplus/issues/253