THUDM / SwissArmyTransformer

SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.
https://THUDM.github.io/SwissArmyTransformer
Apache License 2.0
978 stars, 92 forks

Streaming datasets are not supported #138

Closed (af-74413592 closed this issue 9 months ago)

af-74413592 commented 1 year ago

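visualglm only provides FewshotData, and loading the data straight into memory runs out of memory. I changed the loading code to the following form:

    large_dataset_streamed = load_dataset("json", data_files=path, split="train", streaming=True)
    dataset = large_dataset_streamed.map(datapreprocess)

After that change, I found that streaming datasets are not supported either.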

1049451037 commented 1 year ago

Streaming is supported. You only need to pass the --iterable-dataset argument to the training script.
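
For readers unfamiliar with the term, here is a minimal sketch of what an iterable (streamed) dataset does in general: it yields one preprocessed record at a time instead of loading the whole file into memory. This is not SwissArmyTransformer's implementation. The class name `JsonLinesStream`, the JSON Lines input format, the `prompt` field, and the body of `datapreprocess` are all assumptions made for illustration.

```python
import json
import os
import tempfile
from typing import Callable, Iterator


class JsonLinesStream:
    """Hypothetical helper: lazily yields preprocessed records from a JSON Lines file."""

    def __init__(self, path: str, preprocess: Callable[[dict], dict]):
        self.path = path
        self.preprocess = preprocess

    def __iter__(self) -> Iterator[dict]:
        # Re-open the file on every pass so the stream can be consumed once per epoch.
        with open(self.path, "r", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines
                    # Only one record is held in memory at a time.
                    yield self.preprocess(json.loads(line))


def datapreprocess(example: dict) -> dict:
    # Placeholder preprocessing (assumption); the real function is not shown in the thread.
    return {**example, "prompt": example["prompt"].strip()}


if __name__ == "__main__":
    # Write a tiny demo file, then stream it record by record.
    fd, demo_path = tempfile.mkstemp(suffix=".jsonl")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write('{"prompt": " describe the image "}\n')
        f.write('{"prompt": "what is shown here?"}\n')
    for record in JsonLinesStream(demo_path, datapreprocess):
        print(record["prompt"])
    os.remove(demo_path)
```

Because records are produced on demand, such a dataset has no known length and cannot be indexed at random, which is why a training script usually has to be told explicitly that the dataset is iterable.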

af-74413592 commented 1 year ago

Thanks.