JuliaML / Reinforce.jl

Abstractions, algorithms, and utilities for reinforcement learning in Julia

Policy initialization #1

Open jhlq opened 8 years ago

jhlq commented 8 years ago

Request: a convenient way to manually provide initial knowledge to a policy.

For example, say we are tasked with choosing a sequence of cells on a hexagonal grid, where it is certainly never correct to make the first pick right at the grid edges. With a getter and a setter we could both view the current edge-cell probabilities and set them to zero.

Is such functionality in line with the intended direction?
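
In code, the idea might look something like this (a minimal sketch with hypothetical names, not existing Reinforce.jl API):

```julia
# Hypothetical sketch, not existing Reinforce.jl API: a tabular policy
# with one probability weight per action (e.g. per grid cell).
struct TabularPolicy
    probs::Dict{Int,Float64}   # action index => probability weight
end
TabularPolicy() = TabularPolicy(Dict{Int,Float64}())

# Getter: view the policy's current preference for action `a`.
getprob(policy::TabularPolicy, a::Int) = get(policy.probs, a, 1.0)

# Setter: inject prior knowledge before any learning happens.
setprob!(policy::TabularPolicy, a::Int, p::Real) = (policy.probs[a] = p)

policy = TabularPolicy()
edge_cells = [1, 2, 3, 10, 11, 12]        # placeholder indices for edge hexes
foreach(a -> setprob!(policy, a, 0.0), edge_cells)
getprob(policy, 1)                        # => 0.0
```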

tbreloff commented 8 years ago

I would say that I haven't settled on a policy API yet... I've been a little more focused on the environments. If you have time, could you write out a little example code of how you see initializing policies? Looking forward to what you come up with.

jhlq commented 8 years ago

The getters are straightforward: just query the policy as usual. The setters are a form of supervised learning, so it would make sense to save every set value as a training example. Then we can have a basic implementation, and if a user builds up a large library of samples they can easily plug their favorite supervised-learning library into the setter system.
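
As a rough sketch (again with hypothetical names, building on the TabularPolicy sketch above), the setter could double as a data collector:

```julia
# Hypothetical sketch: a setter that also logs every set value as a
# supervised training example.
struct SupervisedLog
    inputs::Vector{Any}        # (state, action) pairs seen by the setter
    targets::Vector{Float64}   # the probabilities that were set
end
SupervisedLog() = SupervisedLog(Any[], Float64[])

function setprob!(policy::TabularPolicy, log::SupervisedLog, s, a::Int, p::Real)
    policy.probs[a] = p          # update the policy in place
    push!(log.inputs, (s, a))    # record the example's input...
    push!(log.targets, p)        # ...and its supervised target
    return p
end

# A user who builds up a large log could later fit any supervised-learning
# library to (log.inputs, log.targets).
```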

tbreloff commented 8 years ago

I think that, without sample code, I'll have a hard time understanding what a "getter/setter" is. Do you mean a lookup table for states and actions? If so, my interest lies much more in RL through function approximation, so I don't have much need for table-lookup APIs (though they could certainly be supported if others want that).
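
To illustrate the distinction (illustrative snippets only, not Reinforce.jl API):

```julia
using LinearAlgebra: dot

# Table lookup: one explicit entry per (state, action) pair, so a
# getter/setter is trivial.
qtable = Dict{Tuple{Int,Int},Float64}()   # (state, action) => value
qtable[(3, 2)] = 0.0                      # direct "setter"

# Function approximation: a parameterized map from state features to a
# value; knowledge is injected indirectly, by training the weights.
w = randn(4)                              # one weight per state feature
qvalue(features, w) = dot(w, features)
qvalue([1.0, 0.0, 0.5, 0.2], w)           # "getter": no table entry needed
```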

jhlq commented 8 years ago

Let's say our child is practicing math and we have prepared a challenging problem. The getter would be asking what they think the answer is, and the setter would be telling them the answer.

tbreloff commented 8 years ago

So that's not really reinforcement learning. You should check out our effort in JuliaML if you're more interested in general machine learning. In RL there are no "answers", only rewards.

jhlq commented 8 years ago

Yes, as mentioned, this part is supervised, and people would be able to plug in their favorite ML library.

Connecting the two is the goal: schools don't let students work entirely on their own, but neither do teachers lead them through every single problem. A mix allows the AI to explore on its own, with intermittent interventions from more knowledgeable intelligences.

Reinforcement learning is key for robust AI, and just as mixing trace elements into a metal can create strong alloys, adding specks of supervision will significantly hasten progress.
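
A minimal sketch of what such a mixed loop might look like, with all names as illustrative placeholders rather than a settled API:

```julia
# Illustrative placeholders only, not a settled API: an episode loop
# where the agent mostly acts on its own, but a teacher occasionally
# overrides the action, and each override is logged for supervised use.
function run_episode!(choose, learn!, teach, nsteps; intervene_prob = 0.05)
    supervised = Tuple{Int,Int}[]            # logged (state, action) overrides
    s = 0                                    # toy integer state
    for _ in 1:nsteps
        a = choose(s)                        # agent's own pick
        if rand() < intervene_prob
            a = teach(s)                     # knowledgeable intervention
            push!(supervised, (s, a))        # keep for supervised fitting
        end
        s, r = s + a, -abs(s + a)            # toy transition and reward
        learn!(s, a, r)                      # ordinary RL update
    end
    return supervised
end

# e.g. a random agent with a teacher that nudges the state toward zero:
run_episode!(s -> rand((-1, 1)), (s, a, r) -> nothing, s -> s > 0 ? -1 : 1, 100)
```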
