@DannyWeitekamp and I ran into an issue with done button presses in CTAT, which canonically send -1 as the input to their actions. In agents with some kind of math knowledge, this triggers a how search to explain why the input is -1, which can take a long time and ends up learning something arbitrary and incorrect. Further, when it comes time to request a done button action, the apprentice might emit an arbitrary input value based on whatever it learned in training, and CTAT won't accept that value by default. Currently we handle this by altering the actions that come out of the apprentice before we pass them on to CTAT, but it would be great if we didn't have to.
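For reference, the workaround amounts to something like the sketch below. This is only an illustration, not the code we actually run: the action layout (a dict with `selection`, `action`, and `inputs` keys), the `"done"` selection name, and the function name `normalize_done_action` are all assumptions made for the example.

```python
DONE_INPUT = -1  # the value CTAT canonically sends for done button presses


def normalize_done_action(sai, done_selection="done"):
    """Return the action with its input forced to -1 if it targets the done button.

    `sai` is assumed to be a dict with 'selection', 'action' and 'inputs' keys;
    that layout is hypothetical and only serves this example.
    """
    if sai.get("selection") != done_selection:
        # Not a done button action: pass it through untouched.
        return sai
    fixed = dict(sai)  # shallow copy so the apprentice's output is not mutated
    fixed["inputs"] = {"value": DONE_INPUT}
    return fixed
```

The point is that whatever input the apprentice learned for the done button gets thrown away and replaced, which is exactly the step it would be nice to drop.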
Here are some ideas for fixing this that I've thought of:
Develop some kind of notation for a field that is intentionally constant and thus doesn't need to be explained. If, for example, CTAT chose to encode button press inputs with some arbitrary string, the value would fall out of how search and be learned as a constant. If we know it is going to fall out of search anyway, it might be nice to have a flag that skips the search and saves on processing. One possible counterargument is that this might be cheating, in that the way a state is encoded has strong implications for processing.
Simply pass no inputs when the input field is unnecessary. In principle this is currently possible; at least it's not something we check for when verifying a training instance. That said, I have no idea what the result would be. Maybe we don't need to change anything in the apprentice and just need to fix an arbitrary notation problem on the interface side when we work with CTAT.
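To make the two ideas a bit more concrete, here is a minimal sketch of how a constant-field flag could work. Everything in it is hypothetical: `split_inputs` and `constant_fields` are names invented for this example, and nothing like them exists in the apprentice today. Inputs flagged as constant would be stored as-is, and only the remaining inputs would be handed to how search.

```python
def split_inputs(inputs, constant_fields=frozenset()):
    """Separate inputs that need explaining from those flagged as constant.

    Returns (to_explain, constants). Only `to_explain` would go to how search;
    `constants` would be memorized verbatim. Both names are hypothetical.
    """
    to_explain = {k: v for k, v in inputs.items() if k not in constant_fields}
    constants = {k: v for k, v in inputs.items() if k in constant_fields}
    return to_explain, constants


# Idea 1: the done button's input is flagged as constant, so nothing is searched.
to_explain, constants = split_inputs({"value": -1}, constant_fields={"value"})

# Idea 2: no inputs are passed at all, so there is also nothing to search.
nothing_to_explain, no_constants = split_inputs({})
```

In both cases the set of inputs to explain comes out empty, so the open question is mostly whether the rest of the training path tolerates that.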