Discussion of Reinforcement Learning (RL) environment using Python-Coq API

brando90 commented 5 years ago

thread for ideas and discussions for RL env. The standard/common way in ML is to use OpenAIs Gym (https://gym.openai.com/). Currently, I am unsure if that is what we want for our environment (e.g. cuz we might want to execute the checking/verification of current code for the learner and the learner could receive the negative/positive reward at any point). I am also unsure how the coq-lsp (or "reactive proof checking) will affect the RL environment.

I outlined preliminary ideas here: https://github.com/brando90/coq-serapi-python/issues/2#issuecomment-469786616

Currently, for the first prototype I will have the agent behave like a video game only receiving at the end once the game has ended. e.g. have the RL agent write tactics etc and when it adds Qed. the environment is done and the checking/verification is done to give the agent the reward.

ejgallego commented 5 years ago

Very cool, thanks Brando, it seems like a good plan. My point of view wrt coq-lsp is that here the "master" is the RL code, thus we will make sure Coqs code is providing the right API for it.

We have a bit of turmoil upstream due to some changes to accommodate the new document manager , but IMO that should converge soon and be pretty stable afterwards.

brando90 commented 5 years ago

Hey Emilio!

I want to have the states in the current RL game to be State=(context,goals) or some variant. For that I’d like to get all the building components or atoms/vocabulary of any Coq term. Is there an easy way to get that information from SerAPI (all the atoms for building Coq Terms)?

For context, in natural language processing (NLP) each atom is just a dictionary word like cat and we have a fixed vocabulary at the beginning.

ejgallego commented 5 years ago

Hi Brando!

You can indeed get some information from SerAPI [thu not all, but the one missing is usually hardcoded and doesn't ever change]

For goals, what you have is the "kernel representation", that be queried with (Query () Goals).

Note that Coq has two levels of terms, "kernel terms", which are used to typecheck, and the AST. For example EGoals will try to infer an AST, which is usually more friendly for humans.

Then, apart from terms, you have "tactics", which can be queried with the tactic query as they are dynamic. Then the vocabulary stems from the grammar rules for each tactic, documented in the manual and also available from SerAPI.

brando90 commented 5 years ago

I wished it would have occurred to me to ask you this Thursday, have patience with me while I try to articulate my question in text :)

What I am really trying to implement is the following lines from the Gamepad paper on section 5 (https://arxiv.org/abs/1806.00608):

The leaves of the AST consist of constants, inductive types, constructors, existentials, and variables. Each constant, inductive type, constructor, and existential is given an entry in a learnable embedding table, which encodes the global state.

essentially what I'd like to be able to do is traverse the AST and each time I hit a meaningful node be able to know what symbol it is by looking up in my big atoms/vocabulary which symbol it is. So if the first node is cat then n = table['cat'] tells me which position cat is in the vocabulary sorted by alphabetical order.

If that isn't clear perhaps I can try again. Imagine I have n+n = n as the goal. At least on the LHS of the inequality I'd imagine the AST is a binary tree with + as the root and n's as the nodes. Assuming I can traverse that expression I'd be able to know the expression means by its structure in the Tree and the values of the nodes. What I am really trying to figure out is what each node means with respect to the building blocks of Coq and the way I process them is indicated by the AST.

Does this make any sense? I feel maybe the disconect for me is that I am thinking about how sentences are represented in natural language processing (NLP) and trying to map that to how Coq represents terms so I can processes them the same way.

PS: please feel free to clarify, I'm doing my best here :)

ejgallego commented 5 years ago

Sure, it does make sense. The AST for this case is (in kernel notation):

(App
 (Ind
  (((Mutind (MPfile (DirPath ((Id Logic) (Id Init) (Id Coq))))
            (DirPath ()) (Id eq))
    0)
   (Instance ())))
 ((Ind
   (((Mutind (MPfile (DirPath ((Id Datatypes) (Id Init) (Id Coq))))
             (DirPath ()) (Id nat))
     0)
    (Instance ())))
  (App
   (Const
    ((Constant (MPfile (DirPath ((Id Nat) (Id Init) (Id Coq))))
               (DirPath ()) (Id add))
     (Instance ())))
   ((Var (Id n)) (Var (Id n))))
  (Var (Id n)))))

seems pretty straightforward, doesn't it?

ejgallego commented 5 years ago

Beautifying a bit:

 (Ind eq)
 ((Ind nat)
  (App
   (Const add)
   ((Var (Id n)) (Var (Id n))))
  (Var (Id n))))

which turns out to be eq nat (+ n n) n

brando90 commented 5 years ago

fabulous! Thanks soo much, I think I can have the agent parse that.

Now the missing thing. What is the SerAPI command to get the list of atoms/vocabulary, something like:

( Query (Give Brando All the Atoms in a List) )

and then I get:

[Nat, Successor, Add, Multiply, Equality, ...]

With that, I think I am ready to do this! :)

ejgallego commented 5 years ago

Umm, what do you mean by atom?

ejgallego commented 5 years ago

so what would the output be for example for the above?

brando90 commented 5 years ago

In NLP they would be the either characters like a,b,c,d,... or words 'cat, dog, ...'.

In Coq, I'd assume they are just the building blocks for building Coq terms, my guess is they are constants, inductive types, constructors, existential, variables...something like that.

brando90 commented 5 years ago

so what would the output be for example for the above?

So for eq nat (+ n n) n the equality =, the add +, the type nat, the id of the variable (i.e. the character n). Does that make sense?

ejgallego commented 5 years ago

Sure, so that's your feature extraction, right?

We can add a generic extraction mechanism, or it can also be done very easily in python, I guess it boils down to our preference for ocaml vs py

brando90 commented 5 years ago

Ok so serapi doesnt give me all the atoms already. Thats ok, Im fine with whatever! I can try building it :)

Sent from my iPhone

On Mar 16, 2019, at 4:03 PM, Emilio Jesús Gallego Arias notifications@github.com wrote:

Sure, so that's your feature extraction, right?

We can add a generic extraction mechanism, or it can also be done very easily in python, I guess it boils down to our preference for ocaml vs py

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

ejgallego commented 5 years ago

You mean the list of all atoms in the goal or the list of all available atoms in scope?

brando90 commented 5 years ago

In scope

Sent from my iPhone

On Mar 16, 2019, at 4:16 PM, Emilio Jesús Gallego Arias notifications@github.com wrote:

You mean the list of all atoms in the goal or the list of all available atoms in scope?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

ejgallego commented 5 years ago

In scope

then filtering the Ind, Var, and Const constructors should mostly do the work.

brando90 commented 5 years ago

Sorry if this is a really dumb question, but is that a serapi query command?

Sent from my iPhone

On Mar 16, 2019, at 8:49 PM, Emilio Jesús Gallego Arias notifications@github.com wrote:

In scope

then filtering the Ind, Var, and Const constructors should mostly do the work.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

ejgallego commented 5 years ago

Any command wrapped in (Query () cmd)

brando90 commented 5 years ago

sorry for being dense (Im starting to today to learn how to read MLI files so I can understand the protocol more indepedently). I tried the following but didn't quite work:

( Query ( (preds (Prefix “Const”) ) (limit 10) Env )
( Query ( (preds (Prefix “Ind”) ) (limit 10) Env )
( Query ( (preds (Prefix “Const”) ) (limit 10) Env )

suggestions?

ejgallego commented 5 years ago

I tried the following but didn't quite work:

Yup, that won't work as it is not implemented; but it is a good point; we should make it work. Note however that in this case Const and Ind are supposed to be filtered out by the client from the kernel-level structure.

Bear with me (and SerAPI) as the Query API has been little used so far, so you will find a lot of trouble. For example if you would like to filter consts and inds from a kernel term we could indeed implement that, however that's fairly easy to do in Python too.

brando90 commented 5 years ago

@ejgallego I am happy to wait! I am also happy to implement it myself in python if I can :), Im just unsure what I need to do to get the whole vocabulary/atoms from Coq. Is there a way for SerAPI to tell me everything and then I loop through it myself in python according to some criterion and filter it myself?

Thanks for your patience with my questions :)

brando90 commented 5 years ago

notes for myself:

How does one Query for the context:

for the goal local context you query using goals; it is on the hyps field.

For the global context:

For the global one you can use the Env query but it is usually way large, Emilio is working on a more principled search that should appear later :) .

ejgallego commented 5 years ago

@brando90 it would greatly help if you could provide more concrete examples on what you need.

For example, "context" is too generic; it'd be better to say: I have this lemma, the agent is here, and this is the information I need.

brando90 commented 5 years ago

@ejgallego my apologies, I didn't know I was being vague. I meant the context as in:

1 subgoal (ID 10)

  n : nat
  ============================
  0 + n = n

as in the output that would appear in prove general as Coq Tactics are executed.

The global context (which I am assuming is the thing I need to get all the current atoms) would be nice to get eventually but I think its not urgent right now.

ejgallego commented 5 years ago

hey @brando90 no need to apologize at all please :) But indeed context means many things in Coq [you can look at the misc get_context_foo functions :)]

So the context belonging to a particular goal, it is usually dubbed "goal context", and indeed, it is the hyps part in the CoqGoal object.

Note that hyps also include section variables.

I was confused because indeed, imagine this file:

Lemma foo1 n : P n.
Proof. ... Qed.

Lemma foo2 m : P m.
Proof. ... Qed.

Indeed, when in foo2, the goal context will only contain m, but however foo1 may be used to prove it too.

More generally, I think that for a file such as:

Require Import mod1 ... modn.

Lemma lem1 ...
...
Lemma lemn ...

Goal forall local1 ... localn , P ...

we have indeed 3 kind of contexts: the goal-local one, the file (or module) local one, and the really global one in mod1 ... modn.

So indeed the space is large, and it is hard to predict what kind of rewards we will get; just doing this experiment would be IMHO a nice paper already.

I'd say that the agent should filter somehow the context, likely the local one is always worth trying, but both for the file local and global one, we should provide some kind of fitlering w.r.t. the goal.

Humans do that all the time, if am trying to prove something about sum we do Search _ (\sum _ <- _ _ + _) [note the holes]

brando90 commented 5 years ago

@ejgallego what project did you have in mind for this:

So indeed the space is large, and it is hard to predict what kind of rewards we will get; just doing this experiment would be IMHO a nice paper already.

did you mean experimenting with the agent filtering things from the different contexts and see how the amount of things it proves change?

Also let me give you more context and the reason I am trying to extract the proof contexts. This is because I want to do something similar to what they did in Gamepad (check figure 1 and especially section 5). So for us, at least for now, the agent receives a Proof State and is suppose to choose a tactic based on the Proof State. The proof state Gamepad implies is:

ProofState = (ProofContext, Goals)

I thought that made sense. The selection of actions based on local context or global context feels more a problem of action selection rather than how to represent the state. I am not quite there yet where I am thinking how to filter the action space (or synthesize terms), but of course I want to get there ASAP and its important. I feel first I need to represent the state the agent is at and assume a fixed set of tactics work for simple proofs. So first step:

action = Agent(ProofState,FixedTactics)

ejgallego commented 5 years ago

did you mean experimenting with the agent filtering things from the different contexts and see how the amount of things it proves change?

I was curious how the rewards would look like wrt locality / neareness.

to do something similar to what they did in Gamepad

Indeed I need to catch up with gamepad, your goal of "reimplementing" most of it seems like very sensible to me. I hope to catch up soon, I am a bit busier than expected with the job season sorry :/

Maybe tomorrow we could over the gamepad paper a bit if there is time?

So first step

Indeed sounds good!

brando90 commented 5 years ago

I summarized the gamepad contributions here:

https://github.com/brando90/coq-serapi-python/issues/1

brando90 commented 5 years ago

Might be nice to leave this here since its highly related:

https://github.com/princeton-vl/CoqGym

Perhaps I can soon check it out and outline a small summary here for us to decide how it compares to our design etc.

https://arxiv.org/abs/1905.09381

ejgallego commented 5 years ago

Great thanks!

brando90 / coq-serapi-python

Discussion of Reinforcement Learning (RL) environment using Python-Coq API #3