Closed by drscotthawley 3 years ago.
If I remove the "_file" suffix from the key names where self.examples.append() is called, I no longer get the error, and the training run instead proceeds to:
Validation sanity check: 0% 0/2 [00:00<?, ?it/s]
...and then it runs out of CUDA memory. I'm only running on an RTX 3080 with 10 GB of VRAM, so maybe I can decrease the batch size or switch to a different GPU.
Anyway, I suspect that changing the keys from "input_file" to "input" and "target_file" to "target" was the right move.
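The failure is a plain key mismatch: the dict stored by self.examples.append() and the code that later reads it must agree on key names. A minimal sketch, assuming a hypothetical PairedExamples class and placeholder file names (this is not the notebook's actual code), using the renamed keys "input" and "target" and reproducing the original KeyError:

```python
class PairedExamples:
    """Hypothetical stand-in for the notebook's dataset: stores input/target file paths."""

    def __init__(self, pairs):
        self.examples = []
        for input_path, target_path in pairs:
            # Store under the same key names that __getitem__ reads below.
            self.examples.append({"input": input_path, "target": target_path})

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        example = self.examples[idx]
        return example["input"], example["target"]


ds = PairedExamples([("in_0.wav", "target_0.wav")])
print(ds[0])  # ('in_0.wav', 'target_0.wav')

# The original mismatch: entries stored as "input_file" but read back as "input".
old_style = {"input_file": "in_0.wav", "target_file": "target_0.wav"}
try:
    old_style["input"]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'input'
```

Renaming on either side works, as long as the writer and the reader use the same names.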
Hey Scott, thanks again for checking this out!
Unfortunately, this notebook is quite out of date relative to the main codebase, which is likely why you are running into these errors. Sorry to send you on a wild goose chase. I would recommend running the examples/compressor/train.sh script, which has all the hyperparameters we used in our experiments. We used a batch size of 128 there, which requires ~14 GB of VRAM, so you will likely need to reduce it in the script.
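The figures above (batch size 128 needing ~14 GB) allow a back-of-the-envelope estimate for a 10 GB card. This sketch assumes memory grows linearly with batch size and ignores fixed costs such as model weights, optimizer state, and the CUDA context; estimate_batch_size is a hypothetical helper, not part of the repository.

```python
def estimate_batch_size(ref_batch: int, ref_mem_gb: float, avail_mem_gb: float) -> int:
    """Largest power-of-two batch size that should fit, assuming memory scales
    linearly with batch size and ignoring fixed overhead. Because fixed overhead
    is ignored, the result is an upper bound, not a guarantee."""
    if avail_mem_gb >= ref_mem_gb:
        return ref_batch
    per_example_gb = ref_mem_gb / ref_batch
    max_batch = int(avail_mem_gb / per_example_gb)
    if max_batch < 1:
        return 0
    # Round down to the nearest power of two using exact integer arithmetic.
    return 1 << (max_batch.bit_length() - 1)


print(estimate_batch_size(128, 14.0, 10.0))  # → 64
```

Since the fixed overhead is ignored, treat 64 as a starting point to confirm empirically; if it still runs out of memory, halve it again.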
If you run into any more issues, please let me know. Also, we are about to release another repo focused just on modeling the LA-2A, with updated models and training code, so keep an eye out for that in the next few days.
Christian, this is all super cool. I look forward to being able to run this on my new GPU.
When I run the notebook, it dies at the line, saying the key "input" is not found. Looking further up in the code where self.examples.append() is called, it looks like you use the key "input_file" instead of "input". Could this be the source of the problem? Full log follows: