So far the code has been only minimally optimized. I was under the impression that most of the runtime is spent in theano functions, but theano's profiler currently attributes only about 1/100th of the actual run time to them. There is likely some very low-hanging fruit for improving speed, but I'll need to sort out the theano profiler first.
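Since theano's profiler only accounts for a small fraction of the wall time, one possible way to see where the rest goes is to wrap a few sample steps in Python's built-in cProfile. The sketch below is only illustrative: `sample_step`, `slow_python_overhead`, and `fast_compiled_call` are hypothetical stand-ins and not functions from this code; in practice `sample_step` would be replaced with whatever advances the sampler by one step.

```python
import cProfile
import io
import pstats
import time


def slow_python_overhead():
    # Hypothetical stand-in for Python-level work done between compiled calls.
    time.sleep(0.05)


def fast_compiled_call():
    # Hypothetical stand-in for a call into a compiled theano function.
    return sum(range(1000))


def sample_step():
    # Hypothetical stand-in for one sampler step.
    slow_python_overhead()
    return fast_compiled_call()


def profile_steps(step, n_steps=5, top=10):
    """Run `step` n_steps times under cProfile and return a text report
    of the `top` entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(n_steps):
        step()
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


report = profile_steps(sample_step)
print(report)
```

Sorting by cumulative time should show whether the wall time is dominated by the compiled calls themselves or by the Python code around them, which is the question the theano profiler numbers leave open.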
The bottleneck will ultimately be in computing logp of Z once D is sizable.
For N=100, nObs=1609, M=K=4, D=16, each sample step takes about 10 seconds using full NUTS sampling. These are just toy samples, though, since D is much too small at the moment (it should be ~700).