abdulhaim / LMRL-Gym


ILQL issue #13

Closed. williamd4112 closed this issue 4 months ago.

williamd4112 commented 4 months ago

I can train and evaluate BC, but when I run the following command to fine-tune ILQL, I get what looks like a memory error.

python llm_rl_scripts/maze/ilql/train_ilql.py PARAMS bc_checkpoint_path PATH_TO_YOUR_DATA --outputs-path ilql_checkpoint_path

Error:

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[the tokenizers warning above repeats several more times]
wandb: WARNING `resume` will be ignored since W&B syncing is set to `offline`. Starting a new run with run id hohcz2g2.
wandb: Tracking run with wandb version 0.12.18
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.

Killed

I suspect this is because I'm running out of RAM. Is there a hyperparameter setting that would reduce the RAM requirement?

icwhite commented 4 months ago

Hi @williamd4112, you can set "export TOKENIZERS_PARALLELISM=false" before running your script to silence the tokenizers warning. To reduce RAM usage, you can decrease the batch size and increase grad_accum_steps accordingly, for example:

python llm_rl_scripts/maze/ilql/train_ilql.py PARAMS bc_checkpoint_path PATH_TO_YOUR_DATA --outputs-path ilql_checkpoint_path --train-bsize 8 --grad-accum-steps 16

Also, if you have multiple GPUs or TPUs, you could use data parallelism by setting --data-mesh-shape {num_devices}; a combined example is sketched below.
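For instance, on a machine with 4 devices, putting the env var, the smaller batch size, and data parallelism together could look something like this (PARAMS and the checkpoint/data paths are placeholders as above, and 4 is just an illustrative device count for --data-mesh-shape):

export TOKENIZERS_PARALLELISM=false
python llm_rl_scripts/maze/ilql/train_ilql.py PARAMS bc_checkpoint_path PATH_TO_YOUR_DATA \
    --outputs-path ilql_checkpoint_path \
    --train-bsize 8 \
    --grad-accum-steps 16 \
    --data-mesh-shape 4

Keeping train-bsize times grad-accum-steps equal to the original effective batch size should leave the optimization behavior roughly unchanged while lowering peak memory.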