0xsimulacra closed this 3 years ago
Hi, can you provide more debugging information for the error, including the stack trace and the instance type?
Same problem, but I finally got 20377568 examples...
Hi SchenbergZY, which instance did you use? I can try to test it. Thanks.
Hi, sorry for responding late. I can't provide the error I got right now, but the problem comes from the Hugging Face library, which needs to process all the data and load it into RAM (as arrays for PyTorch or a TensorDataset for TensorFlow) before training starts, which isn't efficient at all. The only way I managed to counter this was to change the part of the code responsible for this step so that it just processes the data and loads it onto the GPU only when needed, using `tf.data` and reading from the CSV file on the go.
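A minimal sketch of the "read on the go" idea, using only the Python standard library so it runs without TensorFlow installed. It assumes (this is not confirmed in the thread) that each line of the training file is `question<TAB>candidate sentence<TAB>label` with no header row. `featurize` is a hypothetical callback that stands in for the tokenization step; only one batch of features is held in memory at a time, instead of features for the whole dataset.

```python
import csv
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

# Assumption: each line is "question<TAB>candidate sentence<TAB>label", no header.
Row = Tuple[str, str, int]
F = TypeVar("F")


def parse_rows(lines: Iterable[str]) -> Iterator[Row]:
    """Parse tab-separated lines lazily, one row at a time."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for question, sentence, label in reader:
        yield question, sentence, int(label)


def stream_rows(path: str) -> Iterator[Row]:
    """Read the training file on the go instead of loading it all into RAM."""
    with open(path, newline="", encoding="utf-8") as f:
        yield from parse_rows(f)


def batch_features(
    rows: Iterable[Row], featurize: Callable[[Row], F], batch_size: int = 32
) -> Iterator[List[F]]:
    """Featurize rows lazily and yield fixed-size batches.

    Only the current batch of features is kept in memory.
    """
    batch: List[F] = []
    for row in rows:
        batch.append(featurize(row))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch
```

In a TensorFlow setup, a generator like `stream_rows` could be wrapped with `tf.data.Dataset.from_generator` so that examples are tokenized and batched as training consumes them; the exact output signature depends on the tokenizer you use and is left out here.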
[Closing issue due to inactivity, feel free to reopen if unresolved]
A quick fix might be to split the ASNQ training file into multiple smaller files and read them in sequentially to create the training features.
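A minimal sketch of the splitting step, assuming a line-oriented training file (one example per line). `chunked`, `split_file`, and the `part_0000.tsv` naming scheme are hypothetical helpers for illustration, not part of the project or of `transformers`.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` lines."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def split_file(path: str, lines_per_chunk: int, out_prefix: str) -> List[str]:
    """Split `path` into numbered smaller files and return their paths.

    The source file is streamed, so only one chunk is in memory at a time.
    """
    out_paths: List[str] = []
    with open(path, encoding="utf-8") as src:
        for i, chunk in enumerate(chunked(src, lines_per_chunk)):
            out_path = f"{out_prefix}{i:04d}.tsv"
            with open(out_path, "w", encoding="utf-8") as dst:
                dst.writelines(chunk)
            out_paths.append(out_path)
    return out_paths
```

Each part could then be passed in turn through the feature-creation step (for example the code in `transformers.data.processors.glue`), with the resulting features saved to disk and released before the next part is loaded, so peak memory is bounded by the chunk size rather than by the full dataset.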
Hello,
I tried to redo the steps you described on the project page to create the weights for the transfer phase, i.e. training on the ASNQ dataset, but it uses too much RAM. I have 60 GB of RAM and it isn't enough to get the model to run. It gets stuck in transformers.data.processors.glue while creating features for the training dataset, consuming more and more memory. Is there any workaround for this problem?
Thank you.