brando90 / isabelle-gym

MIT License

Design first Isabelle pseudo API #2

Open brando90 opened 4 years ago

brando90 commented 4 years ago

Message from Jason on Zulip about designing APIs (for Lean):

As for the Lean API, here is a good article on how to design APIs. (https://blog.keras.io/author/francois-chollet.html) I think one of the main points in that article is the importance of designing end-to-end workflows. It doesn't matter so much how Lean is built underneath, but rather how we want the user to interact with the API. To get this conversation started, consider these three workflows:

1. In typical gym environments (like the Atari gym), the workflow is: (a) start the environment, (b) get an observation, (c) use that observation to choose an action (from a small list of actions), (d) perform that action, (e) repeat until a timeout or the end state is reached. (This is a linear process with no backtracking.)

2. In two-player game environments (like go, chess, poker) or in combinatorial puzzle environments (Rubik's cube, Sokoban), it is necessary to support tree (or graph) search, so the environment also has to include a way to backtrack or to jump to a designated state. To support graph search algorithms smoothly, it is probably better to let the user specify both the board state and the action, getting back a new board state. So the workflow would be: (a) start a new game, (b) observe the starting state, (c) use the observations so far to choose which state to explore next and an action for that state (from a small list of actions), (d) perform that state-action pair, (e) observe the result, (f) repeat until a timeout or the game ends.

3. Theorem proving adds another component: premise selection. HOList's API requires that the theorems be explored in a linear order, but one can specify the current state (a goal) and the action (a tactic). So the workflow is: (a) start up the global environment, (b) grab a theorem to try to prove, (c) add any needed dependencies to the system, (d) explore that theorem just like a combinatorial puzzle (see above), (e) add that theorem to the list of dependencies in the HOList system, (f) repeat with the next theorem. I wonder, however, if there is not a better way. Maybe something like this: (a) start up the global environment, (b) grab a "problem" to prove (this "problem" contains both all the dependencies one could use---what Jesse calls declarations---and the goal to prove), (c) start up a mini-environment for that "problem", (d) explore that problem by making repeated calls with goals and tactic actions, (e) repeat with a new problem. One issue I see with this approach is that the "problem" may be very big. Should one really store or send a giant graph or list of declarations? This could be a bottleneck. However, one way around this would be to register declarations with the system ahead of time and then only send the ids of the declarations when setting up the "problem". (As I think about this, it is actually getting fairly close to the HOList system already, since one could just register all the theorems ahead of time and the agent can keep track of which declarations are ok to use.)
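
A minimal Python sketch of the last workflow above (problem mini-environments, with declarations registered once and referred to by id afterwards) might look like the following. Every class and method name here is a hypothetical placeholder for discussion, not an existing API:

```python
# Hypothetical sketch of the "problem mini-environment" workflow described above.
# Register declarations once, then explore each problem with explicit
# state + action calls so tree/graph search and backtracking come for free.

from dataclasses import dataclass
from typing import List


@dataclass
class ProofState:
    """An explicit, resendable proof state (goals kept as plain text for now)."""
    goals: List[str]
    is_finished: bool = False


class GlobalEnv:
    """Global environment: holds registered declarations and hands out problems."""

    def register_declarations(self, decls: List[str]) -> List[int]:
        """Register declarations once; later calls refer to them only by id."""
        raise NotImplementedError

    def next_problem(self) -> "ProblemEnv":
        """Grab the next theorem to prove, wrapped in its own mini-environment."""
        raise NotImplementedError


class ProblemEnv:
    """Mini-environment for a single theorem ('problem')."""

    def initial_state(self) -> ProofState:
        raise NotImplementedError

    def step(self, state: ProofState, tactic: str) -> ProofState:
        """Apply a tactic to an explicitly given state; the caller decides which
        state to expand next, so graph search needs no special backtracking API."""
        raise NotImplementedError
```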

brando90 commented 4 years ago

@madun1999 let's start with this, Eric: why don't you outline the current API for CoqGym here? What do you think? (Or outline an API that we want for IsabelleGym here based on Jason's suggestion above.)

brando90 commented 4 years ago

Environments for ML & TP that are good references for us to build our Python Isabelle API on:

CoqGym:

HOList:

madun1999 commented 4 years ago

The CoqGym API is listed in CoqGymAPI.md in the literature_review branch. A draft of the ITP Gym API can also be found there.

madun1999 commented 4 years ago

It looks like Isabelle itself ships a server written in Scala. We will need to find a way to communicate with that Scala Isabelle server from Python code. CoqGym used coq-SerAPI. We might need something like SerAPI, or a simplified version that meets our needs.
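
To make that concrete, here is a rough sketch of what a thin Python-side bridge to such a Scala server could look like. The only assumption is a newline-delimited protocol over a local TCP socket guarded by a password (roughly the shape of Isabelle's `isabelle server` tool); the actual command and reply formats would have to follow the Isabelle system manual, so every name below is illustrative only:

```python
# Rough, illustrative sketch of a Python bridge to a Scala/Isabelle server.
# Assumes: newline-delimited messages over a local TCP socket, with a
# password sent first. Exact message formats are NOT modeled here.

import socket


class IsabelleBridge:
    def __init__(self, host: str, port: int, password: str):
        self.sock = socket.create_connection((host, port))
        self.reader = self.sock.makefile("r", encoding="utf-8")
        self.writer = self.sock.makefile("w", encoding="utf-8")
        self._send(password)          # authenticate before issuing commands

    def _send(self, line: str) -> None:
        self.writer.write(line + "\n")
        self.writer.flush()

    def _recv(self) -> str:
        return self.reader.readline().rstrip("\n")

    def command(self, name: str, payload: str = "") -> str:
        """Send one command line and return the first reply line."""
        self._send(f"{name} {payload}".strip())
        return self._recv()

    def close(self) -> None:
        self.sock.close()
```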

brando90 commented 4 years ago

1) What would a typical RL Gym environment look like for IsabelleGym? What are the advantages and disadvantages?
2) Model how an RL agent works with Isabelle. What are the actions? What are the states? The reward? Where does the proof tree come into the picture? Where does backtracking come into the picture? (See the sketch after this list.)
3) How does backtracking affect the modeling of an agent (especially an RL agent)? Think about advantages & disadvantages.
4) Write out the IsabelleAPI using the CoqGym format. Analyze advantages and disadvantages.
5) Write out the IsabelleAPI using the HOList format. Analyze advantages and disadvantages.
6) Be familiar with 3 agents and choose which format of environment fits best for each: a) RL (e.g. PolicyGradient), b) supervised learning (like ASTactic or GamePad), c) differentiable agent (I will write a document summary or read Heling's thesis/slides).
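
As a starting point for questions 1–3, here is one possible (purely hypothetical) way to model the loop in Python: states are proof states, actions are tactic strings, the reward is 1 only when the proof closes, and backtracking amounts to keeping an explicit frontier of previously seen states. Nothing here is a committed design:

```python
# Illustrative modeling sketch for questions 1-3, not an existing API.
# States = proof states, actions = tactics, reward = 1 only at QED,
# backtracking = picking any previously seen state from the frontier.

import random
from typing import Callable, List, Tuple


def prove(initial_state,                    # e.g. ProblemEnv.initial_state()
          apply_tactic: Callable,           # (state, tactic) -> new state or None
          candidate_tactics: Callable,      # state -> list of tactic strings
          max_steps: int = 100) -> Tuple[bool, List[str]]:
    """Best-effort proof search with an explicit frontier, so backtracking is free."""
    frontier = [(initial_state, [])]        # (state, tactic history)
    for _ in range(max_steps):
        if not frontier:
            break
        state, history = frontier.pop(random.randrange(len(frontier)))
        for tactic in candidate_tactics(state):
            new_state = apply_tactic(state, tactic)
            if new_state is None:           # tactic failed: no transition
                continue
            if new_state.is_finished:       # QED: reward = 1
                return True, history + [tactic]
            frontier.append((new_state, history + [tactic]))
    return False, []                        # timeout: reward = 0
```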

From today's discussion: let's have everyone answer the above questions, meet amongst yourselves first on Monday (or sooner) to compile one answer, and we'll discuss it at next Tuesday's meeting (from X to Y by W).