Open Frankstein73 opened 2 months ago
Hello, thanks for the awesome work you did! Could you please clarify whether ICAE uses the entire Pile dataset during its pretraining phase, or if it only utilizes a subset of it?
We used the entire dataset. However, we only trained ICAE on tens of billions of tokens.