askforalfred / alfred

ALFRED - A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
MIT License

Normal training time? #2

Closed shuyanzhou closed 4 years ago

shuyanzhou commented 4 years ago

Hi,

I am trying to walk through the project and retrain the model myself.

My machine has a GeForce RTX 2080 Ti with 11GB of memory. The batch size is set to 4 (I tried 8, but it ran into OOM after half an epoch; I guess there are some very long data sequences).

It takes more than 3 hours to finish an epoch. Is this normal?

MohitShridhar commented 4 years ago

@shuyanzhou yep, it does take a while. We trained on P100s (16GB memory) with a batch size of 8. Each epoch took ~1.5 hrs, and the full training took ~1.5 days. And yes, this is mostly because the sequences are very long: avg. 50 steps x ~21k training trajectories.
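
As a rough back-of-the-envelope check on those figures, the sketch below works out how many steps and iterations one epoch covers. It assumes each batch holds 8 whole trajectories; the constant and function names are illustrative and not taken from the ALFRED code.

```python
import math

# Approximate figures quoted in the reply above.
NUM_TRAJECTORIES = 21_000  # ~21k training trajectories
AVG_STEPS = 50             # average steps per trajectory
BATCH_SIZE = 8             # batch size used on the P100s
EPOCH_HOURS = 1.5          # reported wall-clock time per epoch


def epoch_stats(num_traj, avg_steps, batch_size, epoch_hours):
    """Return (total steps, iterations, seconds per iteration) for one epoch.

    Assumes each batch holds `batch_size` whole trajectories (hypothetical).
    """
    total_steps = num_traj * avg_steps
    iterations = math.ceil(num_traj / batch_size)
    secs_per_iter = epoch_hours * 3600 / iterations
    return total_steps, iterations, secs_per_iter


total_steps, iterations, secs_per_iter = epoch_stats(
    NUM_TRAJECTORIES, AVG_STEPS, BATCH_SIZE, EPOCH_HOURS
)
print(f"{total_steps:,} steps, {iterations:,} iterations, {secs_per_iter:.2f} s/iteration")
```

Under these assumptions, one epoch covers roughly a million steps in about 2,600 iterations, or about two seconds per batch.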

shuyanzhou commented 4 years ago

@MohitShridhar Thanks for your reply, Mohit. Maybe I missed it, but what is the stopping condition for training? The default number of epochs is 200, which would take a very long time to complete.

MohitShridhar commented 4 years ago

Well, typically 10-15 epochs is sufficient. I changed the default to 20. Thanks for the pointer :)
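
To put the epoch counts in this thread in perspective, here is a small sketch that converts them into total wall-clock time. The per-epoch times are the ones reported above; the 2080 Ti figure was given as "more than 3 hours", so its totals are lower bounds. The names are illustrative only.

```python
# Per-epoch times reported in this thread, in hours.
HOURS_PER_EPOCH = {
    "P100, batch size 8": 1.5,
    "RTX 2080 Ti, batch size 4": 3.0,  # reported as "more than 3 hours": a lower bound
}

# 10-15 epochs (typically sufficient), 20 (new default), 200 (old default).
EPOCH_BUDGETS = [10, 15, 20, 200]


def total_training_hours(epochs, hours_per_epoch):
    """Total wall-clock hours, assuming every epoch takes the same time."""
    return epochs * hours_per_epoch


for setup, hours_per_epoch in HOURS_PER_EPOCH.items():
    for epochs in EPOCH_BUDGETS:
        hours = total_training_hours(epochs, hours_per_epoch)
        print(f"{setup}: {epochs} epochs ~ {hours:.0f} h ({hours / 24:.1f} days)")
```

On the 2080 Ti setup, 10-15 epochs come to at least 30-45 hours, while the old default of 200 epochs would have needed at least 600 hours (25 days).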

shuyanzhou commented 4 years ago

Awesome. Thanks for your prompt reply!