dfwlab / NAFLD_keystone

Keystone species of NAFLD

[run_causalinference.py] Running time is too long #2

Open · gaminn6 opened 11 months ago

gaminn6 commented 11 months ago

I was able to run your example "demo.ipynb" successfully. But after I switched to my own data (a 24x26 table) and ran "run_causalinference.py" for 4 hours, the following warnings appeared and only a blank file "AGP_control_causal_log.txt" was generated.

I hope you can give me a little guidance. Thank you very much!

Debug:

linuxljm@linuxljm-virtual-machine:~$ python3 /home/linuxljm/yonatanf-sparcc-3aff6141c3f1/run_causalinference.py

/home/linuxljm/.local/lib/python3.8/site-packages/statsmodels/regression/linear_model.py:1794: RuntimeWarning: divide by zero encountered in divide
  return 1 - (np.divide(self.nobs - self.k_constant, self.df_resid)
/home/linuxljm/.local/lib/python3.8/site-packages/statsmodels/regression/linear_model.py:1794: RuntimeWarning: invalid value encountered in scalar multiply
  return 1 - (np.divide(self.nobs - self.k_constant, self.df_resid)
/home/linuxljm/.local/lib/python3.8/site-packages/statsmodels/regression/linear_model.py:1716: RuntimeWarning: divide by zero encountered in scalar divide
  return np.dot(wresid, wresid) / self.df_resid
/home/linuxljm/.local/lib/python3.8/site-packages/statsmodels/base/model.py:1529: RuntimeWarning: invalid value encountered in multiply
  cov_p = self.normalized_cov_params * scale
/home/linuxljm/.local/lib/python3.8/site-packages/statsmodels/regression/linear_model.py:1794: RuntimeWarning: divide by zero encountered in divide
  return 1 - (np.divide(self.nobs - self.k_constant, self.df_resid)
/home/linuxljm/.local/lib/python3.8/site-packages/statsmodels/regression/linear_model.py:1794: RuntimeWarning: invalid value encountered in scalar multiply
  return 1 - (np.divide(self.nobs - self.k_constant, self.df_resid)
/home/linuxljm/.local/lib/python3.8/site-packages/statsmodels/regression/linear_model.py:1716: RuntimeWarning: divide by zero encountered in scalar divide
  return np.dot(wresid, wresid) / self.df_resid
/home/linuxljm/.local/lib/python3.8/site-packages/statsmodels/base/model.py:1529: RuntimeWarning: invalid value encountered in multiply
  cov_p = self.normalized_cov_params * scale
/home/linuxljm/.local/lib/python3.8/site-packages/statsmodels/regression/linear_model.py:1794: RuntimeWarning: divide by zero encountered in divide
  return 1 - (np.divide(self.nobs - self.k_constant, self.df_resid)
/home/linuxljm/.local/lib/python3.8/site-packages/statsmodels/regression/linear_model.py:1794: RuntimeWarning: invalid value encountered in scalar multiply
  return 1 - (np.divide(self.nobs - self.k_constant, self.df_resid)
/home/linuxljm/.local/lib/python3.8/site-packages/statsmodels/regression/linear_model.py:1716: RuntimeWarning: divide by zero encountered in scalar divide
  return np.dot(wresid, wresid) / self.df_resid
/home/linuxljm/.local/lib/python3.8/site-packages/statsmodels/base/model.py:1529: RuntimeWarning: invalid value encountered in multiply
  cov_p = self.normalized_cov_params * scale
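
These warnings appear to come from OLS fits whose residual degrees of freedom are zero: the traced line 1716 divides the residual sum of squares by self.df_resid, which is zero when a regression has at least as many regressors as observations, and with a 24x26 table that is plausible. Below is a minimal sketch, not part of the repository, that reproduces the same statsmodels warnings under that assumption; the 3-observation example is purely illustrative.

# Minimal sketch reproducing the statsmodels RuntimeWarnings seen above.
# Assumption (not confirmed by the repo): the causal-inference step fits OLS
# models with as many parameters as samples, so df_resid == 0.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
y = rng.random(3)
X = sm.add_constant(rng.random((3, 2)))   # 3 observations, 3 parameters -> df_resid == 0

res = sm.OLS(y, X).fit()
print(res.df_resid)       # 0.0
print(res.rsquared_adj)   # may emit the divide-by-zero / invalid-value warnings from the log
print(res.bse)            # scale = ssr / df_resid -> more of the same warnings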
dfwlab commented 11 months ago

Here are a few suggestions for troubleshooting:

  1. Check whether your SparCC results (sparcc_f) look normal. Specifically, verify that the correlation network contains taxon pairs with p < P_THRESHOLD (a quick check is sketched after the code in item 4).

  2. Examine your data for anomalies, such as taxa with an abundance of zero (also covered in the sketch after item 4).

  3. In the run_causalinference.py script, modify the parameters such as DATA_PATH, SPARCC_PATH, OUT_PATH, OUT_TEMP_PATH, in_f, and sparcc_f to match your file paths.

  4. You can perform a single calculation using the causal_inference_from_prior(data, pattern, times) function in run_causalinference.py, without iterative optimization, and check the output e_matrix and p_matrix. Code:

data = load_abundance(DATA_PATH + in_f)
prior = load_prior_network(SPARCC_PATH + sparcc_f, threshold=P_THRESHOLD)
e_matrix, p_matrix = causal_inference_from_prior(data, pattern=prior, times=BOOTSTRAP_TIMES)
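
For suggestions 1 and 2, here is a minimal sanity-check sketch. It is not part of the repository and makes assumptions you should adjust: that sparcc_f is a tab-separated, symmetric taxa-by-taxa p-value matrix, that in_f is a tab-separated abundance table with taxa as columns, and that the paths and file names below are placeholders for your own.

# Sanity-check sketch for suggestions 1 and 2 (not part of the repo).
import numpy as np
import pandas as pd

P_THRESHOLD = 0.05                                   # same threshold the script uses
DATA_PATH, SPARCC_PATH = "./data/", "./sparcc/"      # hypothetical paths
in_f, sparcc_f = "abundance.tsv", "pvals.tsv"        # hypothetical file names

# Suggestion 1: how many taxon pairs are significant at p < P_THRESHOLD?
pvals = pd.read_csv(SPARCC_PATH + sparcc_f, sep="\t", index_col=0)
upper = np.triu(np.ones(pvals.shape, dtype=bool), k=1)   # upper triangle, diagonal excluded
print("significant SparCC edges:", int((pvals.values[upper] < P_THRESHOLD).sum()))

# Suggestion 2: any taxa with zero abundance in every sample?
abundance = pd.read_csv(DATA_PATH + in_f, sep="\t", index_col=0)
zero_taxa = (abundance == 0).all(axis=0)
print("all-zero taxa:", list(zero_taxa[zero_taxa].index))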