AlphaZeroIncubator / AlphaZero

Our implementation of AlphaZero for simple games such as Tic-Tac-Toe and Connect4.

Create utils.py for custom loss function, train and testing functions #13

Closed abhon closed 4 years ago

abhon commented 4 years ago

Create a temporary skeleton for the loss, train, and test functions (will keep working on this tomorrow). Maybe create the neural network architecture here?
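For reference, the loss in the AlphaZero paper combines a value MSE and a policy cross-entropy, l = (z − v)² − πᵀ log p, with the L2 term usually folded into the optimizer's weight decay. A minimal framework-agnostic sketch (function and argument names are ours, not from this repo):

```python
import math

def alphazero_loss(z, v, target_policy, predicted_policy):
    """Per-sample AlphaZero loss: (z - v)^2 - pi . log(p).

    z: game outcome in {-1, 0, 1}; v: predicted value;
    target_policy / predicted_policy: probability vectors over moves.
    (The paper's L2 regularization term is handled via optimizer
    weight decay, so it does not appear here.)
    """
    value_loss = (z - v) ** 2
    policy_loss = -sum(pi * math.log(p + 1e-8)
                       for pi, p in zip(target_policy, predicted_policy))
    return value_loss + policy_loss
```

A PyTorch version would be the same expression over batched tensors.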

PhilipEkfeldt commented 4 years ago

I think the model architecture should be in a separate file. I think there are two ways to structure it:

1

  • Game 1 file
    • Game 1 (rules) class, extends generic Game class.
    • Game 1 NN class (extends nn.Module)
  • Game 2 file
    • Game 2 (rules) class, extends generic Game class.
    • Game 2 NN class (extends nn.Module)

2

  • Game file
    • Game 1 class
    • Game 2 class
  • models file
    • Game 1 NN class
    • Game 2 NN class

Of these, I think 1 is the best. The reason we need different networks for different games is the shape of the input board and the output policy. We could have a generic network class that has the "middle" part and then just add a start/end to get the right shapes, but I don't think that will be useful, as a good architecture depends on the game. I realized this became a pretty long comment, apologies :P

guidopetri commented 4 years ago

I will reply to the rest later, but I want to say: I think we can get away with a single model class that has a general size input and output based on the Game class itself (it should ask the Game class for its input shape, and then the output will be a function of that). I think that's also what @homerours intended.
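A sketch of that idea (all names here are hypothetical — the actual Game API hadn't been settled at this point): the model only needs to ask the game for its board shape and action count, and derive its input/output layer sizes from those:

```python
class Game:
    """Minimal interface a game would expose to the model (hypothetical)."""
    board_shape = None   # e.g. (rows, cols)
    num_actions = None   # size of the policy head output

class TicTacToe(Game):
    board_shape = (3, 3)
    num_actions = 9      # one move per square

class Connect4(Game):
    board_shape = (6, 7)
    num_actions = 7      # one move per column

def layer_sizes(game, hidden=128):
    """Derive (input, hidden, policy, value) layer sizes for a generic
    flat network from the game's declared shapes."""
    n_in = 1
    for d in game.board_shape:
        n_in *= d
    # policy head sized by the game, plus a single value output
    return n_in, hidden, game.num_actions, 1
```

A single nn.Module subclass could build its first and last layers from these numbers, with the "middle" shared across games.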


PhilipEkfeldt commented 4 years ago

I will reply to the rest later, but I want to say: I think we can get away with a single model class that has a general size input and output based on the Game class itself (it should ask the Game class for its input shape, and then the output will be a function of that). I think that's also what @homerours intended.

That's true, I'm just thinking it might remove flexibility in how the shape-change layers are made. No matter what, though, this is something we can always change later. For experiments, all we need to know for now is that the network takes in a tensor shaped like the board and returns a tensor shaped like the action space. What's inside and what class it is (aside from being a subclass of nn.Module) doesn't really matter. What I mentioned in the previous comment isn't actually important for now.

abhon commented 4 years ago

Added the PyTorch Lightning framework to utils.py, so this will most likely just become the model class in the end. As Philip mentioned, the framework takes in the training, validation, and testing steps along with the loss function/optimizer and data loading. I only finished the optimizer function, as that was pretty self-explanatory; I used values from the paper. I might start working on the NN architecture if this looks about right.
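For the record, the "values from the paper" (AlphaGo Zero) are SGD with momentum 0.9, an L2 coefficient of 1e-4, and a learning rate annealed in steps; a config sketch with those numbers (worth double-checking against the paper before relying on them):

```python
# Optimizer settings as reported in the AlphaGo Zero paper (values
# should be double-checked against the paper before use).
OPTIMIZER_CONFIG = {
    "algorithm": "SGD",
    "momentum": 0.9,
    "weight_decay": 1e-4,  # the paper's L2 coefficient c
    # learning rate stepped down at these training-step boundaries
    "lr_schedule": {0: 1e-2, 400_000: 1e-3, 600_000: 1e-4},
}

def learning_rate(step):
    """Look up the learning rate in effect at a given training step."""
    lr = None
    for boundary in sorted(OPTIMIZER_CONFIG["lr_schedule"]):
        if step >= boundary:
            lr = OPTIMIZER_CONFIG["lr_schedule"][boundary]
    return lr
```

In Lightning, these would go in `configure_optimizers`; in plain PyTorch, straight into the `torch.optim.SGD` constructor.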

guidopetri commented 4 years ago

Not gonna lie, I... don't really get the point of pytorch lightning here. If we're gonna write the entire training/testing framework inside the class anyway, why are we doing it with pytorch lightning and not just in train/test functions?

Apart from that, I'm not really ready to merge this into master until this is a little bit more fleshed out. I think you can just assume that we'll have a nn.Module subclass here and use whatever functions/methods are on there to do the training, validation, testing, etc.

The data loading itself is probably the only part that is kind of tricky, but I'm imagining that - since we're doing these steps asynchronously - we'll just have a folder of "epoch 01" filled with several files using our data storage format (described in #10 - basically a JSON file so far) that each contain a game. You can then load those in as you go using a torch.utils.data.DataLoader of some sort.

guidopetri commented 4 years ago

(PS @abhon : before you do a PR could you run the files through black or flake8? I know we're still deciding on which one to use but having it run through at least one of them would be great)

abhon commented 4 years ago

I'm running black for now; oddly, I couldn't get flake8 to work with Sublime. I'll look into it if we choose to use flake8.

PhilipEkfeldt commented 4 years ago

Not gonna lie, I... don't really get the point of pytorch lightning here. If we're gonna write the entire training/testing framework inside the class anyway, why are we doing it with pytorch lightning and not just in train/test functions?

Apart from that, I'm not really ready to merge this into master until this is a little bit more fleshed out. I think you can just assume that we'll have a nn.Module subclass here and use whatever functions/methods are on there to do the training, validation, testing, etc.

The data loading itself is probably the only part that is kind of tricky, but I'm imagining that - since we're doing these steps asynchronously - we'll just have a folder of "epoch 01" filled with several files using our data storage format (described in #10 - basically a JSON file so far) that each contain a game. You can then load those in as you go using a torch.utils.data.DataLoader of some sort.

What class do you mean in this case? I'm fine with not using it, I just have very good experiences with it as it makes running multiple experiments and having a consistent setup much easier. It basically abstracts away a lot of the boilerplate for setting up training, optimizers, evaluation, logging, and other things when training models.

As for data loading, the data will be in the format (state, tar_policy, tar_v) for each data point, so we would have to expand the current data format in that case. JSON is probably fine, although I'm not sure it is the most efficient way for reading/writing tensors. It might be more efficient to use torch.save (i.e. pickle) or some other binary storage method, and also storing them in batches.
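The JSON-vs-binary tradeoff is easy to measure directly. A quick sketch comparing both encodings of a single (state, tar_policy, tar_v) sample (field names hypothetical):

```python
import json
import pickle

# A single (state, tar_policy, tar_v) sample for a 3x3 board
# (field names are placeholders for the eventual #10 format).
sample = {
    "state": [[0, 1, -1], [0, 0, 0], [1, -1, 0]],
    "tar_policy": [0.0, 0.1, 0.0, 0.3, 0.2, 0.0, 0.1, 0.3, 0.0],
    "tar_v": 1.0,
}

json_bytes = json.dumps(sample).encode("utf-8")
pickle_bytes = pickle.dumps(sample)  # what torch.save uses underneath

# For tensors this small, either encoding is on the order of a hundred
# bytes, so readability vs. write speed may matter more than size.
print(len(json_bytes), len(pickle_bytes))
```

Both round-trip the sample exactly; the difference would only start to matter at Go/chess-sized policies or batched storage.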

guidopetri commented 4 years ago

I mean the LitModel class that is subclassing something from pytorch lightning.

It doesn't seem to me like it's advantageous at all, frankly. It adds a new dependency to the project and still requires us to write just as much boilerplate code, only in a different format. I'm not certain this helps at all. This being said, I haven't used it in any of my projects before, so if you want to change my mind, feel free to - if it really does improve our experience that much I'll gladly go for it :)

I don't think binary storage vs JSON storage is much different. We're storing pretty small tensors - for most of these games the board size is relatively small at < 100 items, and the policy we're using is also pretty small. Maybe if we went into something like Go or chess we'd have to re-evaluate the policy storing, since the possible move space there is far greater than in connect4/tic-tac-toe. Plus, JSON lets us still read the data as a human, if we want to make sure the board state is getting saved properly.

Storing them in batches might work, but it would be a lot more involved to find specific instances of the data, and you'd probably have to load the files several times over and over again to get your randomized data - negating any speed gains you'd have from storing them in batches. I'm mostly basing this off of the experience we had with the data organization from our 1008 project - iirc it was essentially just one sample per file, organized into folders, as well.

PhilipEkfeldt commented 4 years ago

I mean the LitModel class that is subclassing something from pytorch lightning.

It doesn't seem to me like it's advantageous at all, frankly. It adds a new dependency to the project and still requires us to write just as much boilerplate code, only in a different format. I'm not certain this helps at all. This being said, I haven't used it in any of my projects before, so if you want to change my mind, feel free to - if it really does improve our experience that much I'll gladly go for it :)

I don't think binary storage vs JSON storage is much different. We're storing pretty small tensors - for most of these games the board size is relatively small at < 100 items, and the policy we're using is also pretty small. Maybe if we went into something like Go or chess we'd have to re-evaluate the policy storing, since the possible move space there is far greater than in connect4/tic-tac-toe. Plus, JSON lets us still read the data as a human, if we want to make sure the board state is getting saved properly.

Storing them in batches might work, but it would be a lot more involved to find specific instances of the data, and you'd probably have to load the files several times over and over again to get your randomized data - negating any speed gains you'd have from storing them in batches. I'm mostly basing this off of the experience we had with the data organization from our 1008 project - iirc it was essentially just one sample per file, organized into folders, as well.

Yeah, I was thinking about the random sampling; I'm not sure how that would work in that case. My two concerns with how we store data are space and read/write performance. Additionally, I've been thinking about how to parallelize things, and I think that will be very tricky. I've been researching how to run MCTS in parallel and how to batch inference requests for the MCTS, and I think that will take a while to get working, if ever. Running game generation/training/evaluation in parallel will be even trickier I think :D

guidopetri commented 4 years ago

I don't think we ever meant for the generation and evaluation to be in parallel, so... at least we don't have to worry about that haha.

The MCTS should probably be in parallel though. Is this something we'd need multiprocessing for?
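One simple variant is "root parallelization": run independent search batches from the root and merge visit counts. A toy sketch of that fan-out/merge structure (the simulation body is a stand-in, not real MCTS; threads are used for brevity — CPU-bound search would want multiprocessing, as discussed):

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_simulations(root_state, n, seed):
    """Stand-in for one worker's batch of MCTS simulations: returns visit
    counts per action. A real implementation would walk the search tree;
    this only demonstrates the fan-out/merge shape."""
    rng = random.Random(seed)
    counts = [0] * 9  # hypothetical 3x3 action space
    for _ in range(n):
        counts[rng.randrange(9)] += 1
    return counts

def parallel_mcts(root_state, workers=4, sims_per_worker=100):
    """Run independent simulation batches in parallel and merge their
    visit counts (root parallelization)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_simulations, root_state, sims_per_worker, seed)
                   for seed in range(workers)]
        merged = [0] * 9
        for fut in futures:
            for action, count in enumerate(fut.result()):
                merged[action] += count
    return merged
```

Tree parallelization (workers sharing one tree, with virtual loss) and batched NN inference are the harder parts alluded to above; this sketch sidesteps both.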


PhilipEkfeldt commented 4 years ago

Yeah, something like that. I've been reading through the "lessons learned" blog posts in the GDrive to understand better how they did it.

guidopetri commented 4 years ago

I'm just gonna close this PR and start a new one.