Bug while running calculate_regions_to_genes_relationships

CYorick commented 1 year ago

Describe the bug A clear and concise description of what the bug is.

To Reproduce

if 'region_to_gene' not in scplus_obj.uns.keys():
    log.info('Inferring region to gene relationships')
    calculate_regions_to_genes_relationships(scplus_obj, 
                    ray_n_cpu = None, 
                    _temp_dir = _temp_dir,
                    importance_scoring_method = 'GBM'
                )
    log.info('Saving object')
    save_path = os.path.join(work_dir, 'scenicplus')

Error output Paste the entire output of the command, including log information prior to the error.

  def dna_to_twobit(dna: str) -> int:
/Users/cyz/Library/r-miniconda/envs/scenicplus/lib/python3.8/site-packages/loompy/bus_file.py:102: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def twobit_1hamming(twobit: int, size: int) -> List[int]:
2023-11-01 20:24:45,938 TF2G         INFO     Inferring region to gene relationships
2023-11-01 20:24:45,989 R2G          INFO     Calculating region to gene importances, using GBM method
Running using a single core:   6%|▌         | 876/15226 [02:42<44:24,  5.39it/s]
Traceback (most recent call last):
  File "/Users/cyz/U/single cell.py", line 33, in <module>
    log.info('Inferring region to gene relationships')
  File "/Users/cyz/scenicplus/src/scenicplus/enhancer_to_gene.py", line 654, in calculate_regions_to_genes_relationships
    region_to_gene_importances = _score_regions_to_genes(SCENICPLUS_obj,
  File "/Users/cyz/scenicplus/src/scenicplus/enhancer_to_gene.py", line 585, in _score_regions_to_genes
    regions_to_genes[gene], _ = _score_regions_to_single_gene(X=acc,
  File "/Users/cyz/scenicplus/src/scenicplus/enhancer_to_gene.py", line 469, in _score_regions_to_single_gene
    fitted_model = arboreto_core.fit_model(regressor_type=regressor_type,
  File "/Users/cyz/Library/r-miniconda/envs/scenicplus/lib/python3.8/site-packages/arboreto/core.py", line 143, in fit_model
    return do_sklearn_regression()
  File "/Users/cyz/Library/r-miniconda/envs/scenicplus/lib/python3.8/site-packages/arboreto/core.py", line 138, in do_sklearn_regression
    regressor.fit(tf_matrix, target_gene_expression)
  File "/Users/cyz/Library/r-miniconda/envs/scenicplus/lib/python3.8/site-packages/sklearn/base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/Users/cyz/Library/r-miniconda/envs/scenicplus/lib/python3.8/site-packages/sklearn/ensemble/_gb.py", line 424, in fit
    y = column_or_1d(y, warn=True)
  File "/Users/cyz/Library/r-miniconda/envs/scenicplus/lib/python3.8/site-packages/sklearn/utils/validation.py", line 1244, in column_or_1d
    raise ValueError(
ValueError: y should be a 1d array, got an array of shape (940, 2) instead.

Process finished with exit code 1

Expected behavior A clear and concise description of what you expected to happen.

Screenshots If applicable, add screenshots to help explain your problem or show the format of the input data for the command/s.

Version (please complete the following information):

Python: [e.g. Python 3.7.4]
SCENIC+: [e.g. 0.1.dev40+g4ff5053.d20210520]
If a bug is related to another module [e.g. matplotlib 3.3.9]

Additional context Add any other context about the problem here.

CYorick commented 1 year ago

That's weird, I can run get_search_space and merge_cistromes, but got stuck in calculate_regions_to_genes_relationships. Thanks for your help.

Best, Yorick

CYorick commented 1 year ago

Here is the description of my SCENIC+ object

SCENIC+ object with n_cells x n_genes = 940 x 23353 and n_cells x n_regions = 940 x 66831 metadata_regions:'Chromosome', 'Start', 'End', 'Width', 'cisTopic_nr_frag', 'cisTopic_log_nr_frag', 'cisTopic_nr_acc', 'cisTopic_log_nr_acc' metadata_genes:'gene_ids', 'n_cells', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts' metadata_cell:'GEX_n_genes', 'GEX_n_genes_by_counts', 'GEX_total_counts', 'GEX_total_counts_mt', 'GEX_pct_counts_mt', 'GEX_cell_type', 'GEX_sample_id', 'ACC_cisTopic_nr_acc', 'ACC_cisTopic_log_nr_frag', 'ACC_cisTopic_log_nr_acc', 'ACC_cisTopic_nr_frag', 'ACC_n_genes', 'ACC_n_genes_by_counts', 'ACC_total_counts', 'ACC_total_counts_mt', 'ACC_pct_counts_mt', 'ACC_cell_type', 'ACC_sample_id' menr:'CTX_topics_otsu_All', 'CTX_topics_otsu_No_promoters', 'DEM_topics_otsu_All', 'DEM_topics_otsu_No_promoters', 'CTX_topics_top_3_All', 'CTX_topics_top_3_No_promoters', 'DEM_topics_top_3_All', 'DEM_topics_top_3_No_promoters' dr_cell:'GEX_X_pca', 'GEX_X_umap'

CYorick commented 1 year ago

I finally find out that the EXP_df = scplus_obj.to_df(layer='EXP') with this gene has two columns, I am not sure why it happens, maybe we should simply choose the first column?

SeppeDeWinter commented 1 year ago

Hi @CYorick

This is indeed the issue..

I'm also not sure how this happened, you should go back and check wether you did not made a mistake while preprocessing the scRNA-seq data.

All the best,

Seppe

aertslab / scenicplus

Bug while running calculate_regions_to_genes_relationships #254