After a synthetic proof is generated, how is it used as training data? For example, if a synthetic proof has N auxiliary constructions between the premise statement s_1 and the conclusion statement s_{N+1}, would you make multiple training data entries from it by taking the intermediate statement reached after each single auxiliary point construction, i.e. generate N data entries from this single N-construction proof?
Suppose each data entry follows this format:
<s_1> <s_{N+1}> <auxiliary construction 1>
<s_2> <s_{N+1}> <auxiliary construction 2>
<s_3> <s_{N+1}> <auxiliary construction 3>
...
<s_N> <s_{N+1}> <auxiliary construction N>
where s_{k+1} is the statement obtained after applying auxiliary construction k to s_k.
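To make my assumption concrete, here is a rough sketch of the splitting I have in mind. It is purely illustrative and not based on the paper or the released code; the names `Proof` and `make_training_entries` are hypothetical.

```python
# Illustrative sketch only -- not from the AlphaGeometry codebase.
# Proof and make_training_entries are hypothetical names.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Proof:
    statements: List[str]      # s_1 (premise), s_2, ..., s_{N+1} (conclusion)
    constructions: List[str]   # auxiliary construction k maps s_k to s_{k+1}


def make_training_entries(proof: Proof) -> List[Tuple[str, str, str]]:
    """Split an N-construction proof into N (premise, conclusion, construction) entries."""
    conclusion = proof.statements[-1]              # s_{N+1}
    entries = []
    for k, construction in enumerate(proof.constructions, start=1):
        premise = proof.statements[k - 1]          # s_k: the state before construction k
        entries.append((premise, conclusion, construction))
    return entries


# Example: a proof with N = 2 auxiliary constructions yields 2 training entries.
proof = Proof(
    statements=["s_1", "s_2", "s_3"],
    constructions=["auxiliary construction 1", "auxiliary construction 2"],
)
print(make_training_entries(proof))
# [('s_1', 's_3', 'auxiliary construction 1'), ('s_2', 's_3', 'auxiliary construction 2')]
```

Is this roughly how each multi-construction proof is turned into training data, with exactly one auxiliary construction per entry?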
I believe the paper says it generated 100M proofs, of which 9M have at least one auxiliary construction. It later also says the fine-tuning used 9M examples. How are the proofs with multiple constructions handled? I assume the transformer model only predicts one auxiliary construction at a time. I might have misunderstood this part. Please let me know if I can clarify my questions. Thanks so much for your help.