In extract_templates.py, at line 280, if we add a check for duplicates, we find that there are only 1,254,409 unique transformations out of the 3,405,187 transformations extracted, which means roughly two thirds of the transformation rules are duplicates.
This poses a problem later when training the RolloutPolicyNet and ExpansionPolicyNet: in theory we should have at least 15 samples per rule for the RolloutPolicyNet, but because most of those samples are duplicates we end up with only 1 or 2 samples per rule in practice, which is not enough for training.
Here is a histogram of the number of samples per rule; for most rules we have only one sample:
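For reference, both the unique/total count and the distribution underlying the histogram can be computed with a `collections.Counter`. This is only a sketch with made-up rule strings; `templates` is a hypothetical stand-in, not the actual variable in extract_templates.py:

```python
from collections import Counter

# Hypothetical stand-in for the list of transformation rules
# extracted by extract_templates.py.
templates = ["A->B", "A->B", "B->C", "A->B", "C->D"]

counts = Counter(templates)  # rule -> number of samples for that rule
print(f"{len(counts)} unique out of {len(templates)} extracted")

# Distribution shown in the histogram: how many rules have k samples.
samples_per_rule = Counter(counts.values())
print(samples_per_rule)  # e.g. most rules appear with a count of 1
```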
Any thoughts on this issue?