biocore / mmvec

Neural networks for microbe-metabolite interaction analysis
BSD 3-Clause "New" or "Revised" License

Ensuring q2-plug-in ran correctly #139

Closed BrandonDKayser closed 4 years ago

BrandonDKayser commented 4 years ago

I ran the q2 plugin of mmvec and it produced results in less than a minute. The data consist of ~300 ASVs from stool and ~5000 LCMS features measured in plasma, across 30 samples. This seems far too fast given the time estimates in the README. There are no warnings or errors, and mmvec successfully generates the table of conditional ranks. I do notice that the highest-ranked microbe-metabolite pairs are not very convincing when plotted on a scatter plot, despite log conditional probabilities on the order of 6-8.

Should the very short run-time be a red flag?

Here is my code:

    qiime mmvec paired-omics \
        --i-microbes stool_16s_fasted_short.qza \
        --i-metabolites plasma_bal_fasted_short.qza \
        --p-learning-rate 1e-5 \
        --p-input-prior 0.5 \
        --p-output-prior 0.5 \
        --p-num-testing-examples 6 \
        --p-latent-dim 2 \
        --o-conditionals plasma_bal_ranks.qza \
        --o-conditional-biplot plasma_bal_biplot.qza

Thank you!

mortonjt commented 4 years ago

Hi @BrandonDKayser, thank you for your interest. Sure, we can add in flags (we have an open issue here: https://github.com/biocore/mmvec/issues/129). The only catch is that every dataset is different, so it's hard to set a fixed threshold for deciding whether a run failed.

Regarding your dataset, you'll definitely want to bump up the number of epochs. You may also want to have 1-3 testing samples since you only have 30 samples. Feel free to post your tensorboard summaries if you need more feedback.
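For anyone following along, here is a minimal sketch of how a re-run along these lines could be assembled from Python. It only builds and prints the command; it does not execute it. The flag name `--p-epochs` is an assumption about the plugin's interface (check `qiime mmvec paired-omics --help` before relying on it), the helper `build_mmvec_command` is hypothetical, and the values 10000 and 1 are illustrative rather than recommended settings. All other options are copied from the command in the original post.

```python
import shlex


def build_mmvec_command(epochs, num_testing_examples):
    """Assemble (but do not run) a `qiime mmvec paired-omics` call.

    Hypothetical helper for illustration only.
    """
    return [
        "qiime", "mmvec", "paired-omics",
        "--i-microbes", "stool_16s_fasted_short.qza",
        "--i-metabolites", "plasma_bal_fasted_short.qza",
        "--p-learning-rate", "1e-5",
        "--p-input-prior", "0.5",
        "--p-output-prior", "0.5",
        # Assumed flag name for the number of training epochs; verify with --help.
        "--p-epochs", str(epochs),
        # Fewer held-out samples, since the dataset has only 30 samples.
        "--p-num-testing-examples", str(num_testing_examples),
        "--p-latent-dim", "2",
        "--o-conditionals", "plasma_bal_ranks.qza",
        "--o-conditional-biplot", "plasma_bal_biplot.qza",
    ]


# Illustrative values only; the right settings depend on the dataset.
cmd = build_mmvec_command(epochs=10000, num_testing_examples=1)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell once the flag names have been confirmed against the installed plugin version.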

BrandonDKayser commented 4 years ago

Thank you for the quick response. I've re-run it with 10000 epochs and 1 testing sample, and it took about 2 hours. The curves in TensorBoard look like they are converging (my code above actually didn't produce any curves in TensorBoard at all...). Should I be playing with the batch parameter? Perhaps lower the priors even more?

Despite the plateau, the CV RMSE still seems quite high. But since this is plasma metabolites, I am not sure it would make sense for the gut microbiota to perfectly predict the whole metabolome profile.

[Screenshot of TensorBoard summaries, 2020-07-08]

mortonjt commented 4 years ago

Those results look good - cv_rmse is expected to be high since the intensities are high. You are right that the microbes are not expected to perfectly predict the blood metabolite abundances.
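To illustrate why high intensities alone push an RMSE up, here is a small sketch using a generic root-mean-square error. This is not mmvec's internal cv_rmse computation, and the intensity values are made up. It only shows that the same 10% relative error yields an RMSE 1000 times larger when the intensities are 1000 times larger.

```python
import math


def rmse(observed, predicted):
    """Generic root-mean-square error (not mmvec's own implementation)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up intensities with a uniform 10% over-prediction.
observed = [100.0, 200.0, 400.0]
predicted = [o * 1.1 for o in observed]

# The same data rescaled by 1000, e.g. raw LCMS intensities on a larger scale.
scale = 1000.0
observed_big = [o * scale for o in observed]
predicted_big = [p * scale for p in predicted]

rmse_small = rmse(observed, predicted)
rmse_big = rmse(observed_big, predicted_big)

# The relative error is identical, yet the RMSE grows with the scale.
print(rmse_small, rmse_big, rmse_big / rmse_small)
```

So a large absolute cv_rmse on its own says little about fit quality. The shape of the curve (whether it plateaus) is the more informative signal.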

BrandonDKayser commented 4 years ago

Thanks Jamie! I'll close the issue. Looking forward to playing with the software on other data sets.