facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License

OpenAI Gym possible integration #95

Closed aleSuglia closed 7 years ago

aleSuglia commented 7 years ago

Hi to all,

Have you ever thought about using OpenAI Gym as the core framework for developing ParlAI? Why did you choose to develop it completely from scratch? Would you be interested in a possible integration with this framework? I would be very interested in trying to develop this kind of feature, so maybe I can help with the development.

Thank you in advance for your answer.

Alessandro

jaseweston commented 7 years ago

We did think about it (and can certainly consider linking to it going forward).

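Some of our reasons to make it separate are:

  • We are not only focused on RL. Many datasets are supervised. And even then, reward is very rarely our main signal, I think.
  • We wanted to have a very clear consistent message passing object with fixed fields (the observation/action dict). I guess we could do that in Gym by dumping everything in observation, but it doesn't really have those fields?
  • Not sure about this one, but we definitely wanted to cover the case of training multiple agents, wasn't sure if Gym focused on that?
  • Also didn't see in Gym Hogwild and Batch support, but I guess can be added.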

Overall, Gym doesn't really try to address dialog much. So we weren't sure what gains were to be had there w.r.t. the stuff we have done.

Having said all that, having dialog in e.g. a 3D gaming world is obviously a cool thing, and maybe integration with Gym going forward would be good for things like that?

On Sun, May 21, 2017 at 10:51 AM, Alessandro Suglia <notifications@github.com> wrote:

> Hi to all,
>
> Have you ever thought about using OpenAI Gym as the core framework for developing ParlAI? Why did you choose to develop it completely from scratch? Would you be interested in a possible integration with this framework? I would be very interested in trying to develop this kind of feature, so maybe I can help with the development.
>
> Thank you in advance for your answer.
>
> Alessandro

aleSuglia commented 7 years ago

Hi @jaseweston,

First of all, thank you for your really inspiring and complete answer. Below I'll try to give my opinion on some of the points you listed.

> • We are not only focused on RL. Many datasets are supervised. And even then, reward is very rarely our main signal, I think.
> • We wanted to have a very clear consistent message passing object with fixed fields (the observation/action dict). I guess we could do that in Gym by dumping everything in observation, but it doesn't really have those fields?

This is definitely right. However, although a reward in OpenAI Gym is a float, you can put any data structure in the info dictionary (please see https://gym.openai.com/docs), so I think you can easily customize it for other uses. For instance, a dataset that emits supervised learning signals implicitly defines that every agent should expect a supervised learning signal rather than a reward, and vice versa. This is exactly what you've done in your framework, in which you have two different observation data structures for the two settings. I'm pretty confident that you can easily do the same in OpenAI Gym too.
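To make the point about the info dictionary concrete, here is a minimal sketch of an environment that follows the classic Gym convention of `step` returning `(observation, reward, done, info)` and places a supervised label in `info`. It deliberately does not import Gym; the class name `ToyDialogEnv`, the `"labels"` key, and the toy data are assumptions made up for illustration and are not part of either framework.

```python
class ToyDialogEnv:
    """Hypothetical dialogue environment mimicking Gym's reset/step interface."""

    def __init__(self, examples):
        # examples: list of (prompt, gold_reply) pairs
        self.examples = list(examples)
        self.index = 0

    def reset(self):
        # Start a new episode and return the first prompt as the observation.
        self.index = 0
        return self.examples[0][0]

    def step(self, action):
        prompt, gold = self.examples[self.index]
        # The reward stays a plain float, as in Gym.
        reward = 1.0 if action == gold else 0.0
        self.index += 1
        done = self.index >= len(self.examples)
        observation = "" if done else self.examples[self.index][0]
        # The supervised signal travels in the free-form info dict (assumed key).
        info = {"labels": [gold]}
        return observation, reward, done, info


env = ToyDialogEnv([("hello", "hi"), ("bye", "goodbye")])
first_obs = env.reset()
obs, reward, done, info = env.step("hi")
```

An agent trained with supervision would read `info["labels"]` and ignore `reward`, while an RL agent would do the opposite. Whether this counts as a clean design or a workaround is exactly what the thread goes on to debate.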

> • Not sure about this one, but we definitely wanted to cover the case of training multiple agents, wasn't sure if Gym focused on that?

I am really interested in this feature of your framework too. I am firmly convinced that the emergence of intelligence can be assessed by evaluating how artificial agents are able to interact and coexist with other agents in the same environment. Unfortunately, I don't have any experience with multi-agent training in OpenAI Gym yet, but I've recently seen that they've integrated Roboschool (https://blog.openai.com/roboschool/), in which you can do exactly that. It might be interesting to understand to what extent the implemented environments can be used to support yours.
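For readers unfamiliar with the multi-agent setting under discussion, the sketch below shows two agents exchanging message dicts in turn, loosely following the act/observe pattern that ParlAI uses. `EchoAgent` and the `parley` function are hypothetical stand-ins written for this example; they are not ParlAI or Gym code.

```python
class EchoAgent:
    """Hypothetical agent that reports the last message it observed."""

    def __init__(self, name):
        self.name = name
        self.last_seen = None

    def observe(self, message):
        # Store the other agent's message for the next call to act().
        self.last_seen = message
        return message

    def act(self):
        heard = self.last_seen["text"] if self.last_seen else "(nothing yet)"
        return {"id": self.name, "text": f"{self.name} heard: {heard}"}


def parley(agent_a, agent_b):
    """One round: A speaks, B observes and replies, A observes the reply."""
    act_a = agent_a.act()
    agent_b.observe(act_a)
    act_b = agent_b.act()
    agent_a.observe(act_b)
    return act_a, act_b


alice, bob = EchoAgent("alice"), EchoAgent("bob")
act_a, act_b = parley(alice, bob)
```

Both participants are ordinary agents here, so either (or both) could be a learner. In a single-agent environment loop, only one side acts and the other side is folded into the environment.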

> • Also didn't see in Gym Hogwild and Batch support, but I guess can be added.

I think that batch support is already covered by this issue: https://github.com/openai/gym/issues/185. For Hogwild, I can't express an opinion because I don't know the algorithm in detail.

> Having said all that, having dialog in e.g. a 3D gaming world is obviously a cool thing, and maybe integration with Gym going forward would be good for things like that?

This is exactly my long-term goal: to design an agent able to complete Universe environments (https://universe.openai.com/) in which it must communicate with other agents in order to achieve its objectives. Moreover, in a 3D world, an agent should learn how to interact with the environment and what the effects of its actions on it are. This kind of interaction between language and direct manipulation in a 3D world is something that really fascinates me.

Again, I want to thank you for your great answer, and I'm looking forward to seeing a prospective plan to link the two platforms. In my opinion there are a lot of interesting advantages in doing that. Maybe people from OpenAI would be interested in this project too. I remember that @hans is interested in a similar project, because we briefly discussed it on Twitter some time ago.

Best regards, Alessandro

jaseweston commented 7 years ago

On Wed, May 24, 2017 at 5:24 PM, Alessandro Suglia <notifications@github.com> wrote:

> Hi @jaseweston,
>
> First of all, thank you for your really inspiring and complete answer. Below I'll try to give my opinion on some of the points you listed.
>
> > • We are not only focused on RL. Many datasets are supervised. And even then, reward is very rarely our main signal, I think.
> > • We wanted to have a very clear consistent message passing object with fixed fields (the observation/action dict). I guess we could do that in Gym by dumping everything in observation, but it doesn't really have those fields?
>
> This is definitely right. However, although a reward in OpenAI Gym is a float, you can put any data structure in the info dictionary (please see https://gym.openai.com/docs), so I think you can easily customize it for other uses.

Yes, I saw the info field but I thought this would be more of a hack to fit what we really wanted into their framework rather than it being a core part of the framework. I think if we do add some kind of thing that wraps ParlAI in Gym though we can do things like that.

> For instance, a dataset that emits supervised learning signals implicitly defines that every agent should expect a supervised learning signal rather than a reward, and vice versa. This is exactly what you've done in your framework, in which you have two different observation data structures for the two settings.

Actually you can have both at once, it's not an either/or.
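As a small illustration of that point, a ParlAI-style message dict can carry a supervised target and a scalar reward side by side. The field names `text`, `labels`, `reward`, and `episode_done` are assumed here to match ParlAI's observation dict; the helper `available_signals` is hypothetical and exists only for this example.

```python
def available_signals(message):
    """Return which learning signals a message dict carries (hypothetical helper)."""
    signals = []
    if message.get("labels"):
        signals.append("supervised")
    if message.get("reward") is not None:
        signals.append("reward")
    return signals


# One message carrying both a gold label and a reward.
message = {
    "text": "What is 2 + 2?",
    "labels": ["4"],
    "reward": 1.0,
    "episode_done": True,
}
both = available_signals(message)

# A purely supervised message simply omits the reward field.
only_labels = available_signals({"text": "hi", "labels": ["hello"]})
```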

> I'm pretty confident that you can easily do the same in OpenAI Gym too.
>
> > • Not sure about this one, but we definitely wanted to cover the case of training multiple agents, wasn't sure if Gym focused on that?
>
> I am really interested in this feature of your framework too. I am firmly convinced that the emergence of intelligence can be assessed by evaluating how artificial agents are able to interact and coexist with other agents in the same environment. Unfortunately, I don't have any experience with multi-agent training in OpenAI Gym yet, but I've recently seen that they've integrated Roboschool (https://blog.openai.com/roboschool/), in which you can do exactly that. It might be interesting to understand to what extent the implemented environments can be used to support yours.

Yes, I didn't get this from the main tutorial https://gym.openai.com/docs (it looks like only one agent is acting), but I didn't look at Roboschool.

> > • Also didn't see in Gym Hogwild and Batch support, but I guess can be added.
>
> I think that batch support is already covered by this issue: https://github.com/openai/gym/issues/185. For Hogwild, I can't express an opinion because I don't know the algorithm in detail.

OK, I don't know Gym well. Is there a tutorial for that? The issue link didn't clear it up for me.


aleSuglia commented 7 years ago

> Yes, I saw the info field but I thought this would be more of a hack to fit what we really wanted into their framework rather than it being a core part of the framework. I think if we do add some kind of thing that wraps ParlAI in Gym though we can do things like that.

I agree with you; it is not the cleanest solution after all. Maybe there is a way to unify the two frameworks?

> Yes, I didn't get this from that main tutorial https://gym.openai.com/docs (looks like only one agent acting) but I didn't look at Roboschool.

As I said before, it's pretty new and I think it is worth having a look at. Maybe there is something interesting there that can help us understand how to bridge the gap between the two frameworks in this sense.

> Ok, don't know Gym well.. is there a tutorial for that? The issue link didn't clear it up for me.

Unfortunately, there isn't anything official. In the last comment of the issue you can see that an environment defines different modes. In the current release, this is the proposed solution:

> Resolution is: don't call render("human") in training. That's all you need.

So when you set the mode to "human", the environment should return an observation that a human can easily interpret. By default, a batch of observations can be returned. Obviously, the programmer should configure the environment to support those two modes. For instance, this is what I've seen them do in the MuJoCo environment.

ethancaballero commented 7 years ago

> Also didn't see in Gym Hogwild and Batch support, but I guess can be added.

Here's an example of Hogwild with OpenAI Gym: https://github.com/ikostrikov/pytorch-a3c
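For context on what Hogwild means, the sketch below shows the core idea: several workers run SGD on the same shared parameters without any locking. It is not taken from the linked repository. It uses plain Python threads and a toy linear-regression problem, so it only illustrates the lock-free access pattern (Python's GIL rules out a real speedup); the function name `hogwild_fit` and all hyperparameters are assumptions chosen for this example.

```python
import random
import threading


def hogwild_fit(data, n_workers=4, steps_per_worker=2000, lr=0.05, seed=0):
    """Fit y = w * x + b with lock-free, Hogwild-style SGD (illustrative sketch)."""
    params = [0.0, 0.0]  # shared [w, b], updated by all workers without a lock

    def worker(worker_id):
        rng = random.Random(seed + worker_id)
        for _ in range(steps_per_worker):
            x, y = rng.choice(data)
            # Reads may see values another worker is in the middle of updating.
            err = params[0] * x + params[1] - y
            # Unsynchronised writes: updates can occasionally overwrite each other.
            params[0] -= lr * err * x
            params[1] -= lr * err

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return params


# Noise-free samples of y = 2x + 1 on the interval [-1, 1].
data = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = hogwild_fit(data)
```

Implementations built on PyTorch usually apply the same pattern across processes with model parameters placed in shared memory, rather than across threads as in this sketch.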

> We wanted to have a very clear consistent message passing object with fixed fields (the observation/action dict). I guess we could do that in Gym by dumping everything in observation, but it doesn't really have those fields?

OpenAI Universe has protocols for syncing text fields with environment frames/data: https://github.com/openai/universe/blob/02cbf092c1f0ba84547f93a7b4eb57e3a183c868/doc/protocols.rst#envtext

ethancaballero commented 7 years ago

With regards to OpenAI Gym/Universe, I think ParlAI would first be most useful as a means for gathering/organizing/serving (via ParlAI’s tools such as MTurk) transcripts of only humans communicating to achieve goals in a gym environment (with some non-linguistic components) whose frames/states are recorded in sync with the humans’ communications & actions. The synced human transcripts & gym recordings would then be used to train the Neuralese<->English semantic belief translators (English interoperabilities) of ML agents that invent their own language while communicating with only themselves to achieve goals (in a different instance of the same gym environment that the humans were recorded in), as outlined in @jacobandreas's “Translating Neuralese”: https://arxiv.org/abs/1704.06960
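One way to picture the "recorded in sync" part of this proposal is a log in which every entry ties an utterance and an action to the environment step at which they occurred. The sketch below is only a guess at such a structure; `SyncedRecord`, its fields, and `transcript_for` are hypothetical names and do not come from ParlAI, Gym, or the cited paper.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class SyncedRecord:
    """Hypothetical log entry aligning dialogue with an environment step."""
    step: int                 # index of the environment frame/state
    state: Any                # recorded frame or state at this step
    speaker: str              # who spoke or acted
    utterance: Optional[str]  # None if nothing was said at this step
    action: Optional[Any]     # None if no environment action was taken


def transcript_for(records: List[SyncedRecord], speaker: str) -> List[str]:
    """Return one speaker's utterances, each tagged with its environment step."""
    return [
        f"[{r.step}] {r.utterance}"
        for r in records
        if r.speaker == speaker and r.utterance is not None
    ]


log = [
    SyncedRecord(0, {"door": "closed"}, "human_a", "Can you open the door?", None),
    SyncedRecord(1, {"door": "closed"}, "human_b", "Sure.", "open_door"),
    SyncedRecord(2, {"door": "open"}, "human_a", None, "walk_through"),
]
lines_a = transcript_for(log, "human_a")
```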

jaseweston commented 7 years ago

Ah, ok, thanks much for the links! Yes, I definitely see in the future that somehow dialog has to be grounded more ;) ParlAI definitely does not solve ALL problems (yet!), but it does try to gather dialog resources in one place, and put the emphasis on dialog first. However, integration with other things at some stage looks very interesting indeed.

On Thu, May 25, 2017 at 5:00 AM, Ethan Caballero <notifications@github.com> wrote:

> With regards to OpenAI Gym/Universe, I think ParlAI could be most useful as a means for gathering/organizing/serving (via ParlAI’s tools such as MTurk) transcripts of only humans communicating to achieve goals in a gym environment (with some non-linguistic components) whose frames/states are recorded in sync with the humans’ communications & actions. The human transcripts and gym recordings would then be used to train the Neuralese<->English semantic belief translators of ML agents that invent their own language while communicating with only themselves to achieve goals (in a different instance of the same gym environment that the humans were recorded in), as outlined in @jacobandreas's “Translating Neuralese”: https://arxiv.org/abs/1704.06960

jaseweston commented 7 years ago

closing for now, but we can definitely come back to this.