dfm / george

Fast and flexible Gaussian Process regression in Python
http://george.readthedocs.io
MIT License

Batching GPs #110

Closed: mileslucas closed this issue 5 years ago

mileslucas commented 5 years ago

I'm interested in using george to handle n independent Gaussian processes in an effective and clean way. For instance, if I batch 6 independent GPs, I would want the output to have shape (nbatch, nsamples) rather than (nsamples,).

I can sort of achieve batching using ModelSet; e.g.

import numpy as np
from george import kernels
from george.modeling import ModelSet

class BatchKernel(ModelSet):
    def get_value(self, params):
        # stack each kernel's output along a new leading batch axis
        return np.stack([mod.get_value(params) for mod in self.models.values()], 0)

>>> kern = BatchKernel([('batch{}'.format(i), 10 * kernels.ExpSquaredKernel([1e4, 1, 1], ndim=3)) for i in range(6)])
>>> kern.get_value(params).shape
(6, 264)

Unfortunately, this effectively removes all other functionality from the kernels (besides get_value).

Any ideas on how best to approach this scenario? I could sort of bootstrap the problem with for loops, but I find that clunky and cumbersome: every single place I use the GPs and kernels would need its own loop.
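
To be concrete, here is roughly the for-loop bookkeeping I mean (an untested sketch; the data shapes and nbatch = 6 are just placeholders):

import numpy as np
import george
from george import kernels

nbatch, nsamples, ndim = 6, 264, 3
x = np.random.randn(nsamples, ndim)      # shared inputs
y = np.random.randn(nbatch, nsamples)    # one target series per GP
yerr = 0.1 * np.ones(nsamples)

# one GP per batch element; every operation needs its own loop
gps = [george.GP(10 * kernels.ExpSquaredKernel([1e4, 1, 1], ndim=3))
       for _ in range(nbatch)]
for gp in gps:
    gp.compute(x, yerr)

# stack per-GP predictions back into the (nbatch, ...) shape I want
x_pred = np.random.randn(50, ndim)
mu = np.stack([gp.predict(y[i], x_pred, return_cov=False)
               for i, gp in enumerate(gps)], axis=0)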

dfm commented 5 years ago

Unfortunately, this isn't really something george supports, and it would probably take a big rewrite to solve in general. Your plan of making a batched kernel isn't really going to work either, because all of the GP math is special-cased for a scalar kernel. When I've had structure like this in the past, I've just manually instantiated multiple GPs, and the bookkeeping isn't generally too bad. (You could use a ModelSet to describe the set of GPs, for example.)
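
For illustration, an untested sketch of what I mean, where the shared inputs and the summed likelihood are just assumptions about your setup:

import numpy as np
import george
from george import kernels
from george.modeling import ModelSet

# one full GP per independent process, grouped only for parameter bookkeeping
gps = ModelSet([
    ('batch{}'.format(i),
     george.GP(10 * kernels.ExpSquaredKernel([1e4, 1, 1], ndim=3)))
    for i in range(6)
])

def total_log_likelihood(x, y, yerr):
    # y has shape (nbatch, nsamples); all GPs see the same inputs here
    ll = 0.0
    for i, gp in enumerate(gps.models.values()):
        gp.compute(x, yerr)
        ll += gp.log_likelihood(y[i])
    return ll

# the ModelSet still exposes a single flat parameter vector across all GPs
vector = gps.get_parameter_vector()

Each GP keeps its full functionality (compute, predict, sampling, and so on); the ModelSet just handles the naming and the parameter vectors.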

mileslucas commented 5 years ago

Dan,

Thanks for the response. It seems like I will end up writing out the GP math within the Starfish package itself. It's a shame to rewrite math that has been implemented plenty of times before, but this seems like the best choice for us right now.