Emad-COMBINE-lab / GRouNdGAN

A causal implicit generative model for simulating single-cell RNA-seq data guided by a gene regulatory network 🧬
https://emad-combine-lab.github.io/GRouNdGAN/
GNU Affero General Public License v3.0

Error check_values_indices_shape_match in create_grn #8

Open bazoogis opened 2 weeks ago

bazoogis commented 2 weeks ago

Hello,

I am trying to run the tutorial for GRouNdGAN, using the PBMC example, having kept the folder structure as is.

I have successfully run the preprocess command and the train, test and validation files are created.

However, during the next step - create_grn, I get the following error:

      File "/home/vzogop/GRouNdGAN/src/main.py", line 41, in <module>
        grn_creation.create_GRN(cfg_parser)
      File "/home/vzogop/GRouNdGAN/src/preprocessing/grn_creation.py", line 30, in create_GRN
        real_cells_df = pd.DataFrame(real_cells.X, columns=real_cells.var_names)
      File "/home/vzogop/miniconda3/envs/myenv/lib/python3.9/site-packages/pandas/core/frame.py", line 867, in __init__
        mgr = ndarray_to_mgr(
      File "/home/vzogop/miniconda3/envs/myenv/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 336, in ndarray_to_mgr
        _check_values_indices_shape_match(values, index, columns)
      File "/home/vzogop/miniconda3/envs/myenv/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 420, in _check_values_indices_shape_match
        raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")
    ValueError: Shape of passed values is (66579, 1), indices imply (66579, 1000)

As the paths show, I am running GRouNdGAN in a Python 3.9 environment created with miniconda.

Best Regards, Bill

YazdanZ commented 2 weeks ago

Hi,

Can you provide us with the .cfg file that you used? Did you use the provided one without any modifications?

The problem seems to be the number of genes in the GEX matrix (1 instead of 1000).

lryup commented 1 week ago

I ran into the same error.

You need to change the code to:

    real_cells_df = pd.DataFrame.sparse.from_spmatrix(real_cells.X, index=real_cells.obs_names, columns=real_cells.var_names)
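
For context, here is a minimal sketch of what this change does, assuming `real_cells.X` is a SciPy sparse matrix. The thread does not confirm that, but a reported shape of `(66579, 1)` is consistent with pandas not unpacking a sparse matrix into one column per gene. The toy matrix and the cell and gene names below are made up for illustration, and `to_expression_df` is a hypothetical helper, not part of GRouNdGAN.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp

# Toy stand-in for real_cells.X: 4 cells x 3 genes, stored sparse (assumption).
X = sp.csr_matrix(np.array([[0.0, 1.0, 0.0],
                            [2.0, 0.0, 0.0],
                            [0.0, 0.0, 3.0],
                            [4.0, 5.0, 0.0]]))
cell_names = [f"cell{i}" for i in range(X.shape[0])]  # hypothetical names
gene_names = [f"gene{j}" for j in range(X.shape[1])]  # hypothetical names


def to_expression_df(X, index, columns):
    """Return a cells x genes DataFrame whether X is sparse or dense."""
    if sp.issparse(X):
        # Sparse-aware constructor, as suggested in this thread.
        return pd.DataFrame.sparse.from_spmatrix(X, index=index, columns=columns)
    return pd.DataFrame(X, index=index, columns=columns)


df = to_expression_df(X, cell_names, gene_names)

# Alternative: densify first, which costs memory on large matrices.
dense_df = pd.DataFrame(X.toarray(), index=cell_names, columns=gene_names)

print(df.shape)        # (4, 3)
print(dense_df.shape)  # (4, 3)
```

Either route yields one column per gene, so the column labels match the data. Which one is preferable for a 66579 x 1000 matrix depends on available memory.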

bazoogis commented 1 week ago

> Hi,
>
> Can you provide us with the .cfg file that you used? Did you use the provided one without any modifications?
>
> The problem seems to be the number of genes in the GEX matrix (1 instead of 1000).

causal_gan.txt

It's the same .cfg file that was provided; I only made slight folder modifications. Nevertheless, I attach it here.

> I ran into the same error.
>
> You need to change the code to:
>
>     real_cells_df = pd.DataFrame.sparse.from_spmatrix(real_cells.X, index=real_cells.obs_names, columns=real_cells.var_names)

Thanks, this seems to have resolved that issue, but I am now getting a different error:

    parsing input
    creating dask graph
    shutting down client and local cluster
    finished
    Traceback (most recent call last):
      File "/home/vzogop/GRouNdGAN/src/main.py", line 41, in <module>
        grn_creation.create_GRN(cfg_parser)
      File "/home/vzogop/GRouNdGAN/src/preprocessing/grn_creation.py", line 35, in create_GRN
        real_grn = grnboost2(real_cells_df, tf_names=TFs, verbose=True, seed=1)
      File "/home/vzogop/miniconda3/envs/myenv/lib/python3.9/site-packages/arboreto/algo.py", line 39, in grnboost2
        return diy(expression_data=expression_data, regressor_type='GBM', regressor_kwargs=SGBM_KWARGS,
      File "/home/vzogop/miniconda3/envs/myenv/lib/python3.9/site-packages/arboreto/algo.py", line 120, in diy
        graph = create_graph(expression_matrix,
      File "/home/vzogop/miniconda3/envs/myenv/lib/python3.9/site-packages/arboreto/core.py", line 450, in create_graph
        all_meta_df = from_delayed(delayed_meta_dfs, meta=_META_SCHEMA)
      File "/home/vzogop/miniconda3/envs/myenv/lib/python3.9/site-packages/dask_expr/io/_delayed.py", line 101, in from_delayed
        raise TypeError("Must supply at least one delayed object")
    TypeError: Must supply at least one delayed object

Best Regards, Bill

lryup commented 1 week ago

I did not modify any folders or .cfg files. After the correction, the relevant code is:

    TFs = list(set(TFs).intersection(gene_names))
    # Error fixed by 李荣远
    # preparing GRNBoost2's input
    real_cells_df = pd.DataFrame.sparse.from_spmatrix(real_cells.X, index=real_cells.obs_names, columns=real_cells.var_names)
    # real_cells_df = pd.DataFrame(real_cells.X, index=real_cells.obs_names,columns=real_cells.var_names)
    # we can optionally pass a list of TFs to GRNBoost2
    print(f"Using {len(TFs)} TFs for GRN inference.")
    real_grn = grnboost2(real_cells_df, tf_names=TFs, verbose=True, seed=1)
    real_grn.to_csv(cfg.get("GRN Preparation", "Inferred GRN"))

Then run the following command:

    $ python src/main.py --config configs/causal_gan.cfg --create_grn

The results are the same as the authors':

    TFs                  63
    Targets              937
    Genes                1000
    Possible Edges       59031
    Imposed Edges        14055
    GRN density Edges    0.238095
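
As a quick sanity check, the figures above are internally consistent if one assumes that possible edges are counted as TFs × targets and that density is imposed edges divided by possible edges. That relationship is inferred from the numbers themselves; it is not stated anywhere in this thread.

```python
# Values copied from the summary above.
tfs, targets, genes = 63, 937, 1000
possible_edges, imposed_edges = 59031, 14055

# Assumption: genes split into TFs and targets, and every
# TF-target pair counts as one possible edge.
assert tfs + targets == genes
assert tfs * targets == possible_edges

# Assumption: density = imposed edges / possible edges.
density = imposed_edges / possible_edges
print(round(density, 6))  # 0.238095
```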

bazoogis commented 1 week ago

I have already modified grn_creation.py as per your suggestion:

    def create_GRN(cfg: ConfigParser) -> None:
        """
        Infers a GRN using GRNBoost2 and uses it to construct a causal graph to impose onto GRouNdGAN.

        Parameters
        ----------
        cfg : ConfigParser
            Parser for config file containing GRN creation params.
        """
        real_cells = sc.read_h5ad(cfg.get("Data", "train"))
        real_cells_val = sc.read_h5ad(cfg.get("Data", "validation"))
        real_cells_test = sc.read_h5ad(cfg.get("Data", "test"))

        # find TFs that are in highly variable genes
        gene_names = real_cells.var_names.tolist()
        TFs = pd.read_csv(cfg.get("GRN Preparation", "TFs"), sep="\t")["Symbol"]
        TFs = list(set(TFs).intersection(gene_names))

        # preparing GRNBoost2's input
        # real_cells_df = pd.DataFrame(real_cells.X, columns=real_cells.var_names)
        real_cells_df = pd.DataFrame.sparse.from_spmatrix(real_cells.X, index=real_cells.obs_names, columns=real_cells.var_names)

        # we can optionally pass a list of TFs to GRNBoost2
        print(f"Using {len(TFs)} TFs for GRN inference.")
        real_grn = grnboost2(real_cells_df, tf_names=TFs, verbose=True, seed=1)
        real_grn.to_csv(cfg.get("GRN Preparation", "Inferred GRN"))

However, the same error persists:

    finished
    Traceback (most recent call last):
      File "/home/vzogop/GRouNdGAN/src/main.py", line 41, in <module>
        grn_creation.create_GRN(cfg_parser)
      File "/home/vzogop/GRouNdGAN/src/preprocessing/grn_creation.py", line 35, in create_GRN
        real_grn = grnboost2(real_cells_df, tf_names=TFs, verbose=True, seed=1)
      File "/home/vzogop/miniconda3/envs/myenv/lib/python3.9/site-packages/arboreto/algo.py", line 39, in grnboost2
        return diy(expression_data=expression_data, regressor_type='GBM', regressor_kwargs=SGBM_KWARGS,
      File "/home/vzogop/miniconda3/envs/myenv/lib/python3.9/site-packages/arboreto/algo.py", line 120, in diy
        graph = create_graph(expression_matrix,
      File "/home/vzogop/miniconda3/envs/myenv/lib/python3.9/site-packages/arboreto/core.py", line 450, in create_graph
        all_meta_df = from_delayed(delayed_meta_dfs, meta=_META_SCHEMA)
      File "/home/vzogop/miniconda3/envs/myenv/lib/python3.9/site-packages/dask_expr/io/_delayed.py", line 101, in from_delayed
        raise TypeError("Must supply at least one delayed object")
    TypeError: Must supply at least one delayed object

Could the error be attributed to a mismatch in the installed package versions?
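
One way to investigate a possible version mismatch is to list the installed versions of the packages that appear in the traceback. The following is a minimal sketch using only the standard library. The package list is an assumption based on the traceback paths (`arboreto`, `dask_expr`, `pandas`) plus `dask` and `distributed`, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names are assumptions inferred from the traceback paths.
PACKAGES = ["arboreto", "dask", "dask-expr", "distributed", "pandas"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

Posting this output alongside the traceback would let others compare it with an environment where `--create_grn` completes successfully.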