Reify trace information

stuhlmueller commented 10 years ago

Great project!

It seems that you are currently storing some data about the current state (execution trace) in global variables samples, dets, getindexes, and condition. Have you considered using a more local trace object instead that groups this information and anything else that may be part of the state?

This would make the design more modular, and would help with use cases such as copying/serializing the state (e.g., to resume sampling later on), and running multiple different models concurrently within a single file.

LaurenceA commented 10 years ago

Thanks!

Interesting. Putting the global state in an object would be pretty straightforward. However, I'd worry that trying to assign different variables/functions to different State objects would:

1. Make a mess of the syntax.
2. Provide lots of opportunities for bugs (what if you try to connect two variables that are part of different models?).

Moreover, serialization is always going to be really hard. The problem is that you don't just want the object graph, you want the variable definitions too (e.g. a = normal()), otherwise you can't actually access any variables. Maybe the answer is an optional argument that gives objects a name (e.g. x = normal(; name=:x)), plus a deserialize macro that looks for named objects and creates variables with those names.
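A minimal sketch of the named-object idea, written in current Julia syntax. Everything here is hypothetical: `Sample`, the `NAMED` registry, this stand-in `normal`, and the `@bind_named` macro are illustrations, not Church.jl code. As a simplification, the macro takes the names to restore explicitly rather than discovering them.

```julia
using Serialization

# Hypothetical stand-in for a Church.jl random variable.
mutable struct Sample
    value::Float64
end

# Hypothetical registry mapping user-chosen names to objects.
const NAMED = Dict{Symbol,Any}()

# Stand-in distribution constructor with the optional `name` keyword.
function normal(mu=0.0, sigma=1.0; name::Union{Symbol,Nothing}=nothing)
    s = Sample(mu + sigma * randn())
    name === nothing || (NAMED[name] = s)
    return s
end

# Bind each listed name to the object stored under that name in `registry`.
macro bind_named(registry, names...)
    assignments = [:($(esc(n)) = $(esc(registry))[$(QuoteNode(n))]) for n in names]
    return Expr(:block, assignments...)
end

x = normal(; name=:x)
original_value = x.value

# Round-trip the registry through an in-memory buffer.
io = IOBuffer()
serialize(io, NAMED)
seekstart(io)
restored = deserialize(io)

# Recreate the variable `x` from the deserialized registry.
@bind_named restored x
```

After the last line, `x` refers to the deserialized copy rather than the original object, which is the "variable definitions" half of the problem that the object graph alone does not solve.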

However, it would be nice to get rid of samples, dets and getindexes. Their only purpose is to give a reference to every object for the Church.jl gc step. Is there some way to extract all defined objects from the Julia gc? samplers is a bit tougher to remove, though: you can't distribute samplers through the node graph easily, because there is no one-to-one mapping between a sample and a sampler. Moreover, you need fast access to every sampler, so an array of references seems like the obvious data structure.

So maybe samplers should be the global state. Could each probability distribution take an additional model argument that, if left blank, defaults to some global value? Then serialize on samplers would pull in the whole model and hence work as expected (though you would need the definitions). Furthermore, this would allow you to compare different samplers for the same model, without having to actually duplicate the variables.
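Roughly, the optional model argument could look like the sketch below. All names (`Model`, `DEFAULT_MODEL`, `Sample`, and this stand-in `normal`) are assumptions made for illustration, and the registered sampler is just a placeholder that redraws from the prior.

```julia
# Hypothetical container whose only state is the array of samplers.
mutable struct Model
    samplers::Vector{Function}
end
Model() = Model(Function[])

# Global default used when no model is passed explicitly.
const DEFAULT_MODEL = Model()

# Hypothetical stand-in for a random variable.
mutable struct Sample
    value::Float64
end

# Stand-in distribution constructor; the `model` keyword is the assumed addition.
function normal(mu=0.0, sigma=1.0; model::Model=DEFAULT_MODEL)
    s = Sample(mu + sigma * randn())
    # Placeholder sampler: redraw the value from the prior.
    push!(model.samplers, () -> (s.value = mu + sigma * randn()))
    return s
end

a = normal()                      # registered with DEFAULT_MODEL
other = Model()
b = normal(2.0; model=other)      # registered with a user-supplied model
```

Code that never mentions `model` keeps working against the default, while a second `Model` gets its own independent array of samplers.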

stuhlmueller commented 10 years ago

I'm not sure the syntax has to change for a simple version of the proposal. You could still have a global pointer to the current state (including samplers), except that this information is now grouped under a new Trace type. In contrast to the current setup, this pointer could be non-constant, so that it can be set to a user-provided trace container. This way, it is possible to continue using your library as it is now (with a single default state), but the user has more control over the state object and can copy it, switch it out, etc. if necessary (at some risk of introducing bugs, as you point out).
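To make this concrete, here is a minimal sketch in current Julia syntax. The names `Trace`, `CURRENT_TRACE`, `current_trace`, `with_trace`, and `record_sample!` are all hypothetical, and the field types are guesses, since the thread does not say what the existing globals hold.

```julia
# Hypothetical grouping of the existing globals into one object.
mutable struct Trace
    samples::Vector{Any}
    dets::Vector{Any}
    getindexes::Vector{Any}
    samplers::Vector{Any}
    condition::Any          # type unknown from the thread; left open
end
Trace() = Trace(Any[], Any[], Any[], Any[], nothing)

# The binding is constant, but the trace it points to can be swapped.
const CURRENT_TRACE = Ref(Trace())
current_trace() = CURRENT_TRACE[]

# Run `f` with `trace` installed, restoring the previous trace afterwards.
function with_trace(f, trace::Trace)
    previous = CURRENT_TRACE[]
    CURRENT_TRACE[] = trace
    try
        return f()
    finally
        CURRENT_TRACE[] = previous
    end
end

# Stand-in for library code that currently writes to the global `samples`.
record_sample!(value) = push!(current_trace().samples, value)

record_sample!(1.0)               # lands in the default trace
mine = Trace()
with_trace(mine) do
    record_sample!(2.0)           # lands in the user-provided trace
end
snapshot = deepcopy(mine)         # independent copy, e.g. to resume later
with_trace(() -> record_sample!(3.0), mine)
```

Existing user code would not need to change, because every library function goes through `current_trace()` and the default trace is installed at load time.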

> Then serialize on samplers would pull in the whole model and hence work as expected (though you would need the definitions). Furthermore, this would allow you to compare different samplers for the same model, without having to actually duplicate the variables.

I don't quite understand the idea behind this yet, but this sounds great!

(On a related note, it would be very useful if the code came with a reference that explains for each of the technical terms—sample, sampler, model, det, etc.—how it is used in the context of this project.)

LaurenceA commented 10 years ago

Where should the reference information go? Comments inline? In a REFERENCE file?

stuhlmueller commented 10 years ago

Any of these would be fine. I'd probably put it inline if it is just a sentence, and in doc/reference.md if it is more.

LaurenceA / Church.jl

Reify trace information #2