WhirlFirst / somde

Algorithm for finding gene spatial pattern based on Gaussian process accelerated by SOM
MIT License

Invalid p-values #3

Open carissaynchen opened 1 year ago

carissaynchen commented 1 year ago

Hello,

I ran into this issue with my dataset during normalisation using som.norm():

/home/ubuntu/anaconda3/envs/somde/lib/python3.8/site-packages/pandas/core/internals/blocks.py:402: RuntimeWarning: invalid value encountered in log
  result = func(self.values, **kwargs)

and later there is an error related to invalid p-values:

Traceback (most recent call last):
  File "Simulations_somde.py", line 57, in <module>
    result, SVnum = som.run()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/somde/som.py", line 115, in run
    result = self.Sparun(X, self.nres)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/somde/som.py", line 106, in Sparun
    mll_results['qval'] = qvalue(mll_results['pval'])
  File "/home/ubuntu/.local/lib/python3.8/site-packages/somde/util.py", line 107, in qvalue
    assert(pv.min() >= 0 and pv.max() <= 1), "p-values should be between 0 and 1"
AssertionError: p-values should be between 0 and 1

Is there an explanation for what could be causing this error?

Thank you!

WhirlFirst commented 1 year ago

Hi, could you please check whether your dataset contains any invalid values, such as non-positive entries? For details of the norm function, you can refer to https://github.com/WhirlFirst/somde/issues/2
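A quick way to run the check suggested above is to count, per gene, the entries that are non-positive or non-finite, since np.log returns -inf for 0 and nan (with the same RuntimeWarning) for negative input. This is only a sketch: the function name report_nonpositive and the genes x spots layout are assumptions for illustration, and whether zeros are actually a problem depends on what som.norm() does internally.

```python
import numpy as np

def report_nonpositive(counts, gene_names):
    """Return {gene: number of non-positive or non-finite entries}.

    Assumes `counts` is a genes x spots array (hypothetical layout);
    only genes with at least one offending entry are reported.
    """
    counts = np.asarray(counts, dtype=float)
    # Flag NaN/inf entries as well as zeros and negatives.
    bad = ~np.isfinite(counts) | (counts <= 0)
    per_gene = bad.sum(axis=1)
    return {g: int(n) for g, n in zip(gene_names, per_gene) if n > 0}

# Toy matrix: 3 genes (rows) x 4 spots (columns).
toy = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [0.0, 5.0, 1.0, 2.0],
    [2.0, -1.0, np.nan, 3.0],
])
problems = report_nonpositive(toy, ["geneA", "geneB", "geneC"])
print(problems)
```

Genes that show up in the report are the first candidates to inspect or filter before normalisation.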

guoguohastea commented 5 months ago

AttributeError with scipy when running som.run()

Hello, I've encountered an AttributeError when calling the run method from the somde library. The error says that scipy has no 'arange' attribute, although my code does not directly import or use scipy. Here is the error message:

result, SVnum = som.run()

/home/data/t010519/miniconda3/envs/spa/lib/python3.12/site-packages/somde/util.py:511: FutureWarning: The provided callable is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
  model_results = model_results[model_results.groupby(['g'])['max_ll'].transform(max) == model_results['max_ll']]

KeyError                                  Traceback (most recent call last)
File /home/data/t010519/miniconda3/envs/spa/lib/python3.12/site-packages/scipy/__init__.py:137, in __getattr__(name)
    136 try:
--> 137     return globals()[name]
    138 except KeyError:

KeyError: 'arange'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
Cell In[33], line 2
      1 # Run the spatial variability analysis
----> 2 result, SVnum = som.run()

File /home/data/t010519/miniconda3/envs/spa/lib/python3.12/site-packages/somde/som.py:115, in SomNode.run(self)
    113 self.norm()
    114 X=self.ninfo[['x','y']].values.astype(float)
--> 115 result = self.Sparun(X, self.nres)
    116 result.sort_values('LLR',inplace=True,ascending=False)
    117 number_q = result[result.qval<0.05].shape[0]

File /home/data/t010519/miniconda3/envs/spa/lib/python3.12/site-packages/somde/som.py:106, in SomNode.Sparun(self, X, exp_tab, kernel_space)
    104 # Perform significance test
    105 mll_results['pval'] = 1 - stats.chi2.cdf(mll_results['LLR'], df=1)
--> 106 mll_results['qval'] = qvalue(mll_results['pval'])
    108 return mll_results

File /home/data/t010519/miniconda3/envs/spa/lib/python3.12/site-packages/somde/util.py:122, in qvalue(pv, pi0)
    119 else:
    120     # evaluate pi0 for different lambdas
    121     pi0 = []
--> 122     lam = sp.arange(0, 0.90, 0.01)
    123     counts = sp.array([(pv > i).sum() for i in sp.arange(0, 0.9, 0.01)])
    124     for l in range(len(lam)):

File /home/data/t010519/miniconda3/envs/spa/lib/python3.12/site-packages/scipy/__init__.py:139, in __getattr__(name)
    137     return globals()[name]
    138 except KeyError:
--> 139     raise AttributeError(
    140         f"Module 'scipy' has no attribute '{name}'"
    141     )

AttributeError: Module 'scipy' has no attribute 'arange'

WhirlFirst commented 5 months ago

Hi, could you check your scipy and numpy versions? I use scipy==1.11.3 and numpy==1.24.4, and it works with those.
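The traceback shows util.py reaching NumPy functions through the scipy namespace (sp.arange, sp.array), which the installed SciPy no longer provides. Apart from pinning the versions reported to work, the same two lines can be expressed with NumPy directly. The sketch below is not a patch to somde; the helper name lambda_grid_counts is hypothetical and only mirrors the failing lines from the traceback.

```python
import numpy as np

def lambda_grid_counts(pv):
    """NumPy version of the two lines that fail in the traceback.

    Returns the lambda grid and, for each lambda, the number of
    p-values strictly greater than it.
    """
    pv = np.asarray(pv, dtype=float)
    lam = np.arange(0, 0.90, 0.01)                    # was: sp.arange(...)
    counts = np.array([(pv > l).sum() for l in lam])  # was: sp.array(...)
    return lam, counts

# Toy p-values, purely illustrative.
pv = np.array([0.001, 0.02, 0.3, 0.5, 0.95])
lam, counts = lambda_grid_counts(pv)
print(counts[0], counts[-1])
```

Because np.arange and np.array take the same arguments as the old scipy aliases, the substitution does not change the computed values.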

rocketeer1998 commented 4 months ago

I'm also facing this issue, and I've checked that there are no invalid scalars such as non-positive values in my matrix. I worked around it by setting SomNode(X,1), but the final result dataframe then contains fewer genes than the original data. How can I fix this in a more elegant way?

WhirlFirst commented 4 months ago

Hi, can you try adjusting the neighbor number k here? A small k may cause the statistical modeling to fail and return null p-values. I also suggest removing genes that are expressed in only a few cells when inferring SVGs.
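As an illustration of the gene filter suggested above, the sketch below keeps only genes detected in at least min_cells cells. The function name, the genes x cells layout, and the threshold value are all assumptions chosen for the example, not values recommended in this thread.

```python
import numpy as np

def filter_genes_by_expressed_cells(counts, gene_names, min_cells=10):
    """Keep genes with a count > 0 in at least `min_cells` cells.

    Assumes `counts` is a genes x cells array (hypothetical layout).
    Returns the filtered matrix and the names of the kept genes.
    """
    counts = np.asarray(counts)
    n_expressing = (counts > 0).sum(axis=1)  # cells expressing each gene
    keep = n_expressing >= min_cells
    kept_names = [g for g, k in zip(gene_names, keep) if k]
    return counts[keep], kept_names

# Toy matrix: 3 genes (rows) x 5 cells (columns).
toy = np.array([
    [1, 0, 2, 3, 1],   # expressed in 4 cells
    [0, 0, 0, 5, 0],   # expressed in 1 cell
    [2, 2, 0, 0, 1],   # expressed in 3 cells
])
filtered, kept_names = filter_genes_by_expressed_cells(
    toy, ["geneA", "geneB", "geneC"], min_cells=3
)
print(kept_names)
```

For data already held in an AnnData object, scanpy's sc.pp.filter_genes with the min_cells argument performs the same kind of filtering.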

rocketeer1998 commented 4 months ago

Thanks for your quick reply! I've already filtered out the low-expressed genes and cells using scanpy's functions. Do you have any recommendations on how to fine-tune the k value for spatial datasets with > 10000 cells?

WhirlFirst commented 4 months ago

In the tutorial, the Slide-seq data have about 10,000 cells and we set k to 20, so I recommend trying a higher k and also using a stricter threshold for low-expressed genes.

rocketeer1998 commented 4 months ago

I will test it and come back to you later today.

rocketeer1998 commented 4 months ago

I set k = 1 and now SOMDE works well. My troublesome data has 3065 cells and 119 genes. Thank you!