Zac1164 opened this issue 10 years ago
OK, I'm struggling with the whole white vs. black thing. How are we going to use the data for training? I was thinking an easy way to homogenize the data is to flip the colors when extracting features if it's white's turn. We still have to figure out whether a move is "good" or "bad", however.
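Roughly what I have in mind for the flip, as a sketch (the list-of-lists board and the 'b'/'w'/None encoding are assumptions, not necessarily what our extractor uses):

```python
def normalize_colors(board, to_move):
    """Return a copy of the board with colors flipped so that the player to
    move is always treated as black. Board encoding ('b'/'w'/None) is assumed."""
    if to_move == 'b':
        return [row[:] for row in board]          # already canonical; just copy
    flip = {'b': 'w', 'w': 'b', None: None}
    return [[flip[stone] for stone in row] for row in board]
```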
If we have a white-to-kill solution, what can we assume about the moves for black? Can we assume they are always correct? And are the moves for white only right if they appear in one of the solutions?
Similarly, if it's black-to-live, would white moves be correct, and black moves depend on the solutions?
I don't see any harm in flipping the colors when we switch players. It makes sense to me.
My thinking was the same as yours: when it's black-to-live/white-to-kill and we care about accomplishing that task, only the moves from the solution paths go in the correct class, and the moves belonging to the other paths become the incorrect class because they lead to failure states. It gets trickier for the other player, though, because in theory he/she should always be playing optimally against the "protagonist". I say we try labeling opponent moves that result in the protagonist failing as the correct class, and the other opponent moves along the solution paths as the incorrect class. I do worry about class imbalance and watering down the genuinely correct moves, though. If it doesn't work, we'll have to modify our thinking; it shouldn't be too difficult to make the code configurable for these purposes (namely, a set of flags for which training method to run).
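Concretely, the labeling rule I'm picturing looks something like this (a sketch only; the function and argument names are placeholders for whatever the extraction code ends up tracking):

```python
def label_move(is_protagonist, on_solution_path, causes_protagonist_failure):
    """Return 1 for the correct class, 0 for the incorrect class.

    Protagonist (black-to-live / white-to-kill after color flipping):
        correct   -> move lies on a solution path
        incorrect -> move from any other explored path (leads to a failure state)
    Opponent:
        correct   -> move that makes the protagonist fail
        incorrect -> other opponent moves along the solution paths
    """
    if is_protagonist:
        return 1 if on_solution_path else 0
    return 1 if causes_protagonist_failure else 0
```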
Alternatively, we can see how many instances of black-to-live vs white-to-kill we have in the data set, and only train on correct paths for the corresponding colors, but I'm getting really worried about not having enough useful data as is.
I like your last idea as a starting point. When you add all the moves from the whole tree, there end up being a lot of instances. Using just the moves for the color in question, including incorrect paths as "bad" moves, adds up to 31k instances in the training set.
I noticed that the unconditional life test produces different results on a cropped board than on the uncropped one. I suspect we're eliminating too much of the board. For example, if stones are not on edges in the original problem, they probably shouldn't end up on edges of the reduced board, right? That changes the liberties, the unconditional life test, and maybe other features, but there are cases where this does happen.
That is true. The cropped board should have a border of at least one line of empty spaces on each non-edge side. If moves end up placed there, it will affect things... we could probably fix that by increasing the border to 2? Or maybe by making a cropped_board class?
This may be an easy fix. Currently, there's no attempt to preserve space around the edges; the board is just cropped to the stones:

```python
self.boardDimensions['xMax'] = max(xList)
self.boardDimensions['xMin'] = min(xList)
self.boardDimensions['yMax'] = max(yList)
self.boardDimensions['yMin'] = min(yList)
```
I'll add/subtract one from each, unless it's already at the edge. Should do the trick.
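Something like this, for reference (a sketch; `BOARD_SIZE` and `PAD` are assumed constants, and the dict keys are just copied from the snippet above):

```python
PAD = 1          # bump to 2 if one line of padding still distorts the features
BOARD_SIZE = 19  # assumed full-board dimension

def crop_with_padding(xList, yList):
    """Crop to the stones plus PAD empty lines on each side,
    clamped so we never pad past the real board edge."""
    return {
        'xMin': max(min(xList) - PAD, 0),
        'xMax': min(max(xList) + PAD, BOARD_SIZE - 1),
        'yMin': max(min(yList) - PAD, 0),
        'yMax': min(max(yList) + PAD, BOARD_SIZE - 1),
    }
```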
Maybe pad by 2, just to be safe.
Just pushed. Regenerating CSV files since features will change. Need to rebuild models.
Move types are normalized to black-to-live or white-to-kill (colors are flipped if the problem is stated the other way). Use formatMove() to get the correct color/location of a stone in the ground-truth data; the raw data remains unchanged! If no problem type is given (or recognized), an error is thrown.
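For anyone reading along, the intended behavior is roughly this (a sketch, not the actual implementation; the (color, (x, y)) move encoding and the problem-type strings are assumptions):

```python
def formatMove(move, problem_type):
    """Return (color, location) with colors flipped so every problem reads as
    black-to-live / white-to-kill. Raises if the problem type is missing/unknown."""
    color, location = move
    if problem_type in ('black-to-live', 'white-to-kill'):
        return color, location                            # already canonical
    if problem_type in ('white-to-live', 'black-to-kill'):
        return ('w' if color == 'b' else 'b'), location   # flip to the convention
    raise ValueError('Unrecognized problem type: %r' % (problem_type,))
```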
The board is standardized so that:

- min{startingX, solutionSpaceX} = 0
- min{startingY, solutionSpaceY} = 0
- maxX = max{startingX, solutionSpaceX} - minX
- maxY = max{startingY, solutionSpaceY} - minY

Note: this assumes we know the solution dimensions (i.e., we are given a puzzle where the solutions must fit in the space given). If we don't want to assume that, we can change it, but we might have to work with a board that can grow in size from one state to the next, or always scale the board by an additional n spaces.
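In code terms, this is roughly what I mean (a sketch only; `starting_points` and `solution_points` are placeholder names for the actual coordinate lists):

```python
def standardize_bounds(starting_points, solution_points):
    """Shift coordinates so the combined bounding box of the starting position
    and the solution space begins at (0, 0); return the new maxima and a shifter."""
    points = list(starting_points) + list(solution_points)
    xMin = min(x for x, _ in points)
    yMin = min(y for _, y in points)
    xMax = max(x for x, _ in points) - xMin
    yMax = max(y for _, y in points) - yMin

    def shift(point):
        x, y = point
        return (x - xMin, y - yMin)

    return xMax, yMax, shift
```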