bioFAM / MEFISTO_tutorials

Repository containing tutorials for MEFISTO

MEFISTO with Poisson likelihood #2

Open willtownes opened 3 years ago

willtownes commented 3 years ago

Hi, congratulations on this awesome method! I am interested in trying MEFISTO with the Poisson likelihood. I have been following the tutorial for the Visium brain data, but it seems to run into numerical problems after the first few iterations. Here is the code I have been using:

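from mofapy2.run.entry_point import entry_point

# `ne` and `M` are defined elsewhere (not shown); from their use below,
# `ne` is the iteration count and `M` the number of inducing points.
ent = entry_point()
ent.set_data_options(use_float32=True)
ad.raw = ad  # store the current matrix in .raw so that use_raw=True reads it
ent.set_data_from_anndata(ad, use_raw=True, likelihoods="poisson")
ent.set_model_options(factors=4)
ent.set_train_options(iter=ne)
# spatial coordinates are the covariates for the smooth factors
ent.set_covariates([ad.obsm["spatial"]], covariates_names=["imagerow", "imagecol"])
# sparse GP with M inducing points, expressed as a fraction of all spots
ent.set_smooth_options(sparseGP=True, frac_inducing=M/ad.n_obs,
                       start_opt=10, opt_freq=10)
ent.build()
%time ent.run()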

At iteration 12 the ELBO becomes nan, and after iteration 19 it prints "Optimising sigma node..." and then raises an exception: UnboundLocalError: local variable 'best_lidx' referenced before assignment

bv2 commented 3 years ago

Hi @willtownes,

thanks for your interest in the method.

In most cases we recommend using the Gaussian likelihood in combination with suitable preprocessing (see also some guidelines/recommendations here). The preprocessing takes the data characteristics into account, and the Gaussian likelihood offers a good tradeoff between scalability and performance. We added a small-scale simulation example as a simple illustration of the Poisson likelihood here.
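As a rough illustration of what "suitable preprocessing" could look like, here is a dependency-free sketch. It assumes library-size normalization followed by a log(1 + x) transform, which is a common choice for count data but is not spelled out in this thread. The helper name `normalize_log1p`, the `target_sum` default and the toy counts are all hypothetical.

```python
import math


def normalize_log1p(counts, target_sum=1e4):
    """Scale each row (spot) to sum to target_sum, then apply log(1 + x).

    `counts` is a list of rows of non-negative counts. Rows that sum to
    zero are mapped to all zeros.
    """
    out = []
    for row in counts:
        total = sum(row)
        scale = target_sum / total if total > 0 else 0.0
        out.append([math.log1p(c * scale) for c in row])
    return out


# Toy example: two spots, three genes (made-up numbers).
counts = [[0, 5, 5], [10, 0, 30]]
normalized = normalize_log1p(counts)
print(normalized)
```

On an AnnData object the same two steps are usually done with scanpy's `sc.pp.normalize_total` and `sc.pp.log1p`, after which the data would be passed with `likelihoods="gaussian"` instead of `"poisson"`.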

The numerical problems that result in the nan-values seem to be an issue in the underlying MOFA model on this data set. We will take a look at this and let you know once it is fixed. Thanks for reporting the bug!

bv2 commented 3 years ago

Hi @willtownes,

just a quick update: we fixed the numerical issues that you encountered with the Poisson likelihood. If you install mofapy2 from the dev branch (pip install git+https://github.com/bioFAM/mofapy2@dev), the error above should be fixed. We will merge this into the master branch and release it on PyPI in one of the coming versions. However, as mentioned above, a Gaussian likelihood with suitable preprocessing might still be a better choice for spatial transcriptomics data.

willtownes commented 3 years ago

OK, I have tested this. While it no longer hits the numerical divergence error early in training, it shows some weird behavior and never converges. Below is a plot of the ELBO, with the horizontal axis representing the number of epochs. I'm not sure why the ELBO periodically drops precipitously.

[image: ELBO vs. number of epochs]
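For anyone who wants to pin down where this happens rather than reading it off a plot, here is a minimal sketch that scans a list of per-iteration ELBO values for non-finite entries and for sudden drops. The function name `find_elbo_anomalies`, the 1% threshold and the example trace are all made up for illustration; how the ELBO list is pulled out of the trained model is left out here.

```python
import math


def find_elbo_anomalies(elbo, rel_tol=0.01):
    """Return (nan_iters, drop_iters) for a sequence of ELBO values.

    nan_iters:  indices where the ELBO is NaN or infinite.
    drop_iters: indices where the ELBO fell below the previous finite
                value by more than rel_tol * |previous value|.
    """
    nan_iters, drop_iters = [], []
    prev = None
    for i, value in enumerate(elbo):
        if not math.isfinite(value):
            nan_iters.append(i)
            continue  # compare the next finite value against the last finite one
        if prev is not None and prev - value > rel_tol * abs(prev):
            drop_iters.append(i)
        prev = value
    return nan_iters, drop_iters


# Made-up trace with one sharp drop and one NaN.
trace = [-1000.0, -900.0, -850.0, -1200.0, -840.0, float("nan"), -835.0]
nan_iters, drop_iters = find_elbo_anomalies(trace)
print("non-finite at:", nan_iters)
print("drops at:", drop_iters)
```

Comparing the flagged iterations with `start_opt` and `opt_freq` would show whether the drops line up with the hyperparameter optimisation steps.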

bv2 commented 3 years ago

Hi Will, this looks strange; we will have a look. It seems to be specific to the combination of sparse GPs with a Poisson likelihood. For now, we'd recommend using either a Gaussian likelihood, or a Poisson likelihood in conjunction with a full GP model (setting sparseGP=False). I will update here once we have fixed the problem above.
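To make the switch between the two settings concrete, here is a small sketch that builds the keyword arguments for `set_smooth_options`. It only uses the option names that already appear in the code at the top of this thread (`sparseGP`, `frac_inducing`, `start_opt`, `opt_freq`). The helper `smooth_options` itself and the example sizes are hypothetical.

```python
def smooth_options(n_obs, n_inducing=None, start_opt=10, opt_freq=10):
    """Build keyword arguments for ent.set_smooth_options.

    n_inducing=None selects a full GP (sparseGP=False). Otherwise a sparse
    GP is requested with frac_inducing = n_inducing / n_obs.
    """
    opts = {"start_opt": start_opt, "opt_freq": opt_freq}
    if n_inducing is None:
        opts["sparseGP"] = False
    else:
        if not 0 < n_inducing <= n_obs:
            raise ValueError("n_inducing must be between 1 and n_obs")
        opts["sparseGP"] = True
        opts["frac_inducing"] = n_inducing / n_obs
    return opts


# Made-up sizes: 2000 spots, optionally 500 inducing points.
full = smooth_options(n_obs=2000)
sparse = smooth_options(n_obs=2000, n_inducing=500)
print(full)
print(sparse)

# With `ent` set up as in the original post, the full GP variant would be:
# ent.set_smooth_options(**full)
```

Note that a full GP scales worse with the number of spots than the sparse approximation, so this workaround may be slow on larger slides.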