bnpy / bnpy

Bayesian nonparametric machine learning for Python

Missing MOVBBirthMergeAlg #14

Closed simozacca closed 5 years ago

simozacca commented 5 years ago

Dear author,

In our research group (Prof. Ben Raphael's group at Princeton University) we are developing methods that rely on BNPY (thank you, it is a great tool). Our programs (one is THetA, available at https://github.com/raphael-group/THetA) use the most recent version of BNPY available on Bitbucket at https://bitbucket.org/michaelchughes/bnpy-dev, and we call BNPY with a command similar to the following (I have been told that this command was designed in direct collaboration with you):

hmodel, Info = bnpy.Run.run(Data, 'DPMixtureModel', 'DiagGauss', 'moVB', nLap=100, nTask=1, K=K, moves='birth,merge', targetMaxSize=500, ECovMat='eye', mergeStartLap=10, sF=sf, doWriteStdOut=False)

This has always worked well with the version of BNPY we have been using. Unfortunately, it no longer works with the new version of BNPY in this repository, and fails with the following error:

`File "/nfs/sw/bnpy/bnpy-1.0/bnpy/Run.py", line 433, in createLearnAlg`
          `LearnAlgConstr = bnpy.learnalg.MOVBBirthMergeAlg`
`AttributeError: 'module' object has no attribute 'MOVBBirthMergeAlg'`

Should we keep our methods pinned to the previous version of BNPY, or is there a way to make this work with the new version as well?

Thank you so much, Best regards,

michaelchughes commented 5 years ago

Thanks for your interest! There should be a relatively easy fix. The file was renamed to "MemoVBMovesAlg": https://github.com/bnpy/bnpy/blob/master/bnpy/learnalg/MemoVBMovesAlg.py

Your equivalent command would be:

hmodel, Info = bnpy.Run.run(Data, 'DPMixtureModel', 'DiagGauss', 'memoVB', nLap=100, nTask=1, K=K, moves='birth,merge', ECovMat='eye', m_StartLap=10, sF=sf, doWriteStdOut=False)

Note: no more "--targetMaxSize"

Let me know if that works with the latest github.com/bnpy/bnpy code. If you have more trouble, I'm happy to help.

simozacca commented 5 years ago

Thank you! Since we would like to be compatible with both versions of BNPY, do you have a suggestion for how to support both, depending on which version is installed?

Is the best solution simply to use Python's hasattr to check whether MOVBBirthMergeAlg or MemoVBMovesAlg is present in the corresponding BNPY module?

Also, I have two questions about the command that we are using:

  1. There is a parameter nTask: is it supposed to limit the number of threads used? I always see many subprocesses being created even when I set this parameter to 1.

  2. Looking at the command, do you have any suggestions? There are some parameters that I do not understand, and I am not sure whether we should change them, e.g. StartLap, nLap, ECovMat.

Thank you, Best

michaelchughes commented 5 years ago

I think a quick hasattr test would be just fine to switch between old/new versions. Sorry for the headache (some reviewers did not like the proposed "MOVB" name, so we've gone with the more descriptive "Memoized VI" name)
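A minimal sketch of such a hasattr switch follows. The helper name `select_bnpy_call` is hypothetical. The sketch assumes that the new release exposes `MemoVBMovesAlg` as an attribute of `bnpy.learnalg` and the old one exposes `MOVBBirthMergeAlg`, as the traceback and the renamed file suggest. The keyword arguments are taken from the two commands quoted in this thread. The demo at the bottom uses stand-in objects so that it runs without bnpy installed.

```python
from types import SimpleNamespace


def select_bnpy_call(learnalg):
    """Return (alg_name, extra_kwargs) for whichever bnpy version owns `learnalg`.

    `learnalg` is expected to be the `bnpy.learnalg` module (or any object
    with the same attributes). Hypothetical helper, not part of bnpy.
    """
    if hasattr(learnalg, 'MemoVBMovesAlg'):
        # Newer GitHub version: algorithm renamed, targetMaxSize dropped.
        return 'memoVB', {'m_StartLap': 10}
    if hasattr(learnalg, 'MOVBBirthMergeAlg'):
        # Older bnpy-dev version from Bitbucket.
        return 'moVB', {'mergeStartLap': 10, 'targetMaxSize': 500}
    raise RuntimeError('Unrecognized bnpy version: no known moves algorithm found')


# Stand-ins for the two versions of bnpy.learnalg (assumption: for demo only).
old_learnalg = SimpleNamespace(MOVBBirthMergeAlg=object)
new_learnalg = SimpleNamespace(MemoVBMovesAlg=object)

old_choice = select_bnpy_call(old_learnalg)
new_choice = select_bnpy_call(new_learnalg)

# With a real installation the call would look roughly like this:
#   import bnpy
#   alg_name, extra = select_bnpy_call(bnpy.learnalg)
#   hmodel, Info = bnpy.Run.run(Data, 'DPMixtureModel', 'DiagGauss', alg_name,
#                               nLap=100, nTask=1, K=K, moves='birth,merge',
#                               ECovMat='eye', sF=sf, doWriteStdOut=False,
#                               **extra)
```

Checking for the new name first means that an installation exposing both names would use the current interface.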

  1. nTask is not related to threads. It is the number of separate initializations (each with a distinct random seed) that are executed serially. When you suspect many local optima, setting nTask=10 will cause bnpy to run 10 separate inits and choose the best one (in terms of the training objective).

  2. nLap sets how many full passes through the dataset the algo completes before termination ("laps", sometimes also called "epochs"). You can safely ignore startLap (it's useful for restarting runs, probably not your concern). ECovMat sets one of the key prior hyperparameters of the Gaussian likelihood. See discussion here: https://bnpy.readthedocs.io/en/latest/examples/01_asterisk_K8/plot-01-demo=init_methods-model=mix+gauss.html
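To illustrate the idea behind item 1, here is a toy sketch of serial restarts with distinct seeds, keeping the run with the best objective. It does not use bnpy and is not bnpy's algorithm. The objective, the hill-climbing search, and the names `run_one_task` and `run_tasks` are all assumptions made for illustration.

```python
import random


def objective(x):
    # Toy objective with two local maxima; the one near x = 1 is higher.
    return -(x ** 2 - 1) ** 2 + 0.3 * x


def run_one_task(seed, n_steps=200):
    """One initialization: random start, then simple hill climbing."""
    rng = random.Random(seed)          # distinct seed per task
    x = rng.uniform(-3.0, 3.0)         # random initialization
    for _ in range(n_steps):
        candidate = x + rng.uniform(-0.1, 0.1)
        if objective(candidate) > objective(x):
            x = candidate              # accept only improvements
    return objective(x), x


def run_tasks(n_task, n_steps=200):
    """Run n_task initializations one after another and keep the best."""
    results = [run_one_task(seed, n_steps) for seed in range(n_task)]
    return max(results)                # best (score, x) pair by score


best_single = run_tasks(1)
best_of_ten = run_tasks(10)
```

Because the ten-task run includes the seed used by the single-task run, its best score can never be lower. The tasks run one after another, so the number of tasks says nothing about threads or subprocesses.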

In general, all the important hyperparameters are covered in the various examples on https://bnpy.readthedocs.io

I'm sure we're missing some, so feel free to ask here if you can't find info in any of the examples.

simozacca commented 5 years ago

Everything is pretty clear! I will re-open the thread in case of future issues.

Thank you!