How much RAM is needed for training on the ASNQ dataset?

amazon-science / wqa_tanda

This repo provides code and data used in our TANDA paper.

How much RAM is needed for training on the ASNQ dataset? #2

Closed 0xsimulacra closed 3 years ago

0xsimulacra commented 4 years ago

Hello,

I tried to repeat the step described on the project page to create the weights for the transfer phase, that is, training on the ASNQ dataset, but it uses too much RAM. I have 60 GB of RAM, and that isn't enough to get the model to run. It gets stuck in transformers.data.processors.glue while creating features for the training set, consuming more and more memory. Is there any workaround for this problem?

Thank you.

vuthuyfo commented 4 years ago

Hi, can you provide more debugging data for the error, incl. the stack trace and instance type?

SchenbergZY commented 4 years ago

Same problem, but I finally got 20377568 examples...

vuthuyfo commented 4 years ago

Hi SchenbergZY, which instance did you use? I can try to test it. Thanks.

0xsimulacra commented 4 years ago

Hi, sorry for the late response. I can't provide the error right now, but the problem comes from the Hugging Face library, which needs to process and load all the data into RAM (as a TensorDataset for PyTorch, or as arrays for TensorFlow) before training starts, which isn't efficient at all. I've only managed to counter this by changing the part of the code responsible for this so that it just processes the data and loads it to the GPU only when needed, using tf.data.Dataset and reading from the CSV file on the fly.
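For anyone who wants to try the same idea, here is a minimal, framework-agnostic sketch of lazy loading. It is not the code from the comment above. It assumes a tab-separated file with question, sentence, and label in the first three columns, and the names `stream_examples`, `stream_batches`, and `featurize` are hypothetical. Only one batch of features exists in memory at a time.

```python
import csv
from itertools import islice
from typing import Any, Callable, Dict, Iterator, List


def stream_examples(path: str, delimiter: str = "\t") -> Iterator[Dict[str, str]]:
    """Yield one example at a time instead of reading the whole file into RAM.

    Assumed column layout (not confirmed by the repo): question, sentence, label.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 3:
                continue  # skip malformed or empty lines
            yield {"question": row[0], "sentence": row[1], "label": row[2]}


def stream_batches(
    path: str,
    featurize: Callable[[Dict[str, str]], Any],
    batch_size: int = 32,
) -> Iterator[List[Any]]:
    """Featurize lazily and yield fixed-size batches (the last one may be smaller)."""
    examples = stream_examples(path)
    while True:
        batch = [featurize(ex) for ex in islice(examples, batch_size)]
        if not batch:
            return
        yield batch
```

In practice `featurize` would wrap the tokenizer call, and a generator like `stream_examples` could be handed to `tf.data.Dataset.from_generator` or wrapped in a PyTorch `IterableDataset`, so the training loop pulls data on demand.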

sid7954 commented 3 years ago

[Closing issue due to inactivity, feel free to reopen if unresolved]

A quick fix might be to split the training file of ASNQ into multiple smaller files and sequentially read them in to create the training features.
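A rough sketch of that quick fix, using only the standard library. The function names, the shard naming scheme, and the default shard size are illustrative assumptions, and the sketch assumes one example per line with no header row.

```python
import os
from typing import Callable, List


def split_file(src: str, out_dir: str, lines_per_shard: int = 1_000_000) -> List[str]:
    """Split a large line-oriented file into smaller shard files."""
    os.makedirs(out_dir, exist_ok=True)
    shard_paths: List[str] = []
    out = None
    try:
        with open(src, encoding="utf-8") as f:
            for i, line in enumerate(f):
                if i % lines_per_shard == 0:
                    # Start a new shard every `lines_per_shard` lines.
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, f"shard_{len(shard_paths):05d}.tsv")
                    shard_paths.append(path)
                    out = open(path, "w", encoding="utf-8")
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return shard_paths


def process_shards(shard_paths: List[str], handle_shard: Callable[[List[str]], None]) -> None:
    """Read the shards one after another; only one shard is in memory at a time."""
    for path in shard_paths:
        with open(path, encoding="utf-8") as f:
            lines = f.read().splitlines()
        handle_shard(lines)  # e.g. create features and train on this shard
```

With this layout, `handle_shard` would build features for one shard and train on it before the next shard is read, so peak memory depends on the shard size rather than on the full training file.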