gao-lab / Cell_BLAST

A BLAST-like toolkit for large-scale scRNA-seq data querying and annotation.
http://cblast.gao-lab.org
MIT License

reconcile_models() problems #12

Closed caiquanyou closed 3 years ago

caiquanyou commented 3 years ago

Hi @Jeff1995, I ran the code data_obj2_hits = data_obj2_hits.reconcile_models().filter(by="pval", cutoff=0.05) and got the error below:

IndexError                                Traceback (most recent call last)
----> 1 data_obj2_hits = data_obj2_hits.reconcile_models().filter(by="pval", cutoff=0.05)

/usr/local/lib/python3.6/dist-packages/Cell_BLAST/blast.py in reconcile_models(self, dist_method, pval_method)
    996         """
    997         dist_method = self._get_reconcile_method(dist_method)
--> 998         dist = [dist_method(item, axis=1) for item in self.dist]
    999         pval_method = self._get_reconcile_method(pval_method)
   1000         pval = [pval_method(item, axis=1) for item in self.pval]

/usr/local/lib/python3.6/dist-packages/Cell_BLAST/blast.py in (.0)
    996         """
    997         dist_method = self._get_reconcile_method(dist_method)
--> 998         dist = [dist_method(item, axis=1) for item in self.dist]
    999         pval_method = self._get_reconcile_method(pval_method)
   1000         pval = [pval_method(item, axis=1) for item in self.pval]

<__array_function__ internals> in mean(*args, **kwargs)

/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py in mean(a, axis, dtype, out, keepdims)
   3255
   3256     return _methods._mean(a, axis=axis, dtype=dtype,
-> 3257                           out=out, **kwargs)
   3258
   3259

/usr/local/lib/python3.6/dist-packages/numpy/core/_methods.py in _mean(a, axis, dtype, out, keepdims)
    136
    137     is_float16_result = False
--> 138     rcount = _count_reduce_items(arr, axis)
    139     # Make this warning show up first
    140     if rcount == 0:

/usr/local/lib/python3.6/dist-packages/numpy/core/_methods.py in _count_reduce_items(arr, axis)
     55     items = 1
     56     for ax in axis:
---> 57         items *= arr.shape[ax]
     58     return items
     59

IndexError: tuple index out of range

I have no idea how to fix this.

Jeff1995 commented 3 years ago

Thanks for the report!

It's not immediately clear to me why this happens either. Could you please run the following lines before .reconcile_models() to save these objects as pickle files, and post them here? That may help track down the problem. Thanks!

import pickle
with open("debug_hits_dist.pkl", "wb") as f:
    pickle.dump(data_obj2_hits.dist, f)
with open("debug_hits_pval.pkl", "wb") as f:
    pickle.dump(data_obj2_hits.pval, f)

caiquanyou commented 3 years ago

@Jeff1995 OK, here are the two pkl files, in debug.zip.

Jeff1995 commented 3 years ago

It seems that I cannot reproduce the error under numpy 1.14.6. I suspect it's a numpy version issue. What numpy version are you using?

caiquanyou commented 3 years ago

I use numpy 1.17.2

Jeff1995 commented 3 years ago

I think I figured it out. It was not a numpy version problem, but rather because only one DIRECTi model was used in BLAST. In that case the singleton "model" dimension (axis=1) in the hits.dist array was missing, so taking the mean over axis=1 referred to a non-existent axis.

If only one model was used, .reconcile_models() is not necessary. You can just remove .reconcile_models() and continue with downstream steps.

That said, .reconcile_models() should still work when only one model is used (it should simply do nothing). This will be fixed in a future release.
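To illustrate the failure mode described above, here is a minimal NumPy-only sketch. It does not use Cell_BLAST; the array shapes (3 hits by 4 models) and the variable names are assumptions made for illustration, standing in for one entry of the hits' dist list.

```python
import numpy as np

# Hypothetical stand-in for one entry of `dist`: 3 hits x 4 models.
with_model_axis = np.arange(12, dtype=float).reshape(3, 4)

# Averaging over the model axis (axis=1) leaves one value per hit.
reduced = np.mean(with_model_axis, axis=1)
print(reduced.shape)

# Hypothetical entry where the model axis is missing: a 1-D array of 3 hits.
without_model_axis = np.arange(3, dtype=float)

try:
    np.mean(without_model_axis, axis=1)
    missing_axis_failed = False
except IndexError as exc:
    # Newer NumPy releases raise AxisError, which subclasses IndexError.
    missing_axis_failed = True
    print(type(exc).__name__, exc)
```

The exact exception type and message depend on the NumPy version, but in both cases the reduction fails because a 1-D array has no axis 1.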

caiquanyou commented 3 years ago

But I used 4 models. See the code below:

models = []
for i in range(4):
    models.append(cb.directi.fit_DIRECTi(
        data_obj, genes=selected_genes,
        latent_dim=10, cat_dim=20, random_seed=i
    ))
blast = cb.blast.BLAST(models, data_obj)
data_obj2_hits = blast.query(data_obj2)
data_obj2_hits = data_obj2_hits.reconcile_models().filter(by="pval", cutoff=0.05)  # error here

Jeff1995 commented 3 years ago

Well, that would be strange... Can you confirm that the data_obj2_hits.reconcile_models() line was not executed more than once? If it was indeed run only once, could you provide the data_obj object (as an h5 file) and the selected_genes object (as a text file), so I can try to reproduce the error?

caiquanyou commented 3 years ago

I did not use selected_genes, axes = data_obj.find_variable_genes() to produce the gene list; I used highly variable genes that I had found beforehand. Could that cause this problem? Also, how can I save data_obj? It was created with data_obj = cb.data.ExprDataSet(exprs=adata.X, obs=adata.obs, var=adata.var, uns=adata.uns).

Jeff1995 commented 3 years ago

I think the gene list shouldn't be the cause. You can save the data_obj with data_obj.write_dataset("filename.h5").

caiquanyou commented 3 years ago

OK, the attached file contains data1.h5 and gene.csv.

Large attachment sent via QQ Mail

data.zip (221.35M, expires 2021-02-05 15:44); download page: http://mail.qq.com/cgi-bin/ftnExs_download?t=exs_ftn_download&k=0c6331328a524fc4b34d870a16385719095b0650530c060f1c5201070015510709531c0a025b501b57065754530f060705530800303065525017501c4a5115360c&code=1c1208e6

Jeff1995 commented 3 years ago

I tried on this data (using the training data data_obj as query since I do not have data_obj2), but I could not reproduce the error using the following script:

import pandas as pd
import Cell_BLAST as cb

data_obj = cb.data.ExprDataSet.read_dataset("data1.h5")
selected_genes = pd.read_csv("gene.csv", index_col=0).to_numpy().ravel().tolist()

models = []
for i in range(4):
    models.append(cb.directi.fit_DIRECTi(
        data_obj, genes=selected_genes,
        latent_dim=10, cat_dim=20, random_seed=i
    ))

blast = cb.blast.BLAST(models, data_obj)
data_obj_hits = blast.query(data_obj)
data_obj_hits = data_obj_hits.reconcile_models().filter(by="pval", cutoff=0.05)

print("Done!")

Could you please try running this as a Python script (not as a Jupyter notebook) and see if it works on your side?

If the error persists, it would most likely be an environment issue. You may need to provide your detailed environment specification (via conda env export) so I can try to reproduce it.

caiquanyou commented 3 years ago

@Jeff1995 It works as a script, but still fails in the Jupyter notebook.

Jeff1995 commented 3 years ago

Okay. I think the most likely cause is that you ran the following line more than once in the Jupyter notebook:

data_obj_hits = data_obj_hits.reconcile_models().filter(by="pval", cutoff=0.05)

It should be run only once. If you run it a second time it will produce the error.
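A likely reason, consistent with the earlier missing-axis explanation, is that the line overwrites its own input: after the first run the model axis has already been averaged away, so a second run reduces over an axis that no longer exists. The sketch below imitates this with plain NumPy. The reconcile function is a toy stand-in written for this example, not the Cell_BLAST implementation, and the shapes (2 queries, 3 hits, 4 models) are assumptions.

```python
import numpy as np

def reconcile(dist_list):
    """Toy stand-in for reconcile_models(): average each array over the model axis."""
    return [np.mean(item, axis=1) for item in dist_list]

# Hypothetical query result: 2 queries, each with 3 hits x 4 models.
raw_hits = [np.random.rand(3, 4) for _ in range(2)]

# Overwriting pattern: re-running the cell feeds reconciled output back in.
hits = reconcile(raw_hits)        # first run of the cell: fine
try:
    hits = reconcile(hits)        # second run of the same cell: model axis is gone
    rerun_ok = True
except IndexError:
    rerun_ok = False

# Re-run-safe pattern: keep the raw result and assign to a different name.
final_hits = reconcile(raw_hits)
final_hits = reconcile(raw_hits)  # running the cell again gives the same shapes
```

In a notebook, the same idea would be to keep the output of blast.query(...) in one variable and assign the result of .reconcile_models().filter(...) to a different one, so that re-executing the cell always starts from the unreconciled hits.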

caiquanyou commented 3 years ago

Thanks!