EleutherAI / gpt-neox

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
https://www.eleuther.ai/
Apache License 2.0

Can't use train.py when training models #714

Closed Codingrocks3 closed 1 year ago

Codingrocks3 commented 1 year ago

I am trying to train the model, and when I run:

    python ./deepy.py train.py ./configs/small.yml ./configs/local_setup.yml

I get the error:

    NeoXArgs.from_ymls() ['./configs/small.yml', './configs/local_setup.yml']
    Traceback (most recent call last):
      File "./deepy.py", line 28, in <module>
        neox_args = NeoXArgs.consume_deepy_args()
      File "/home/masonholly/gpt-neox/megatron/neox_arguments/arguments.py", line 332, in consume_deepy_args
        neox_args = cls.from_ymls(
      File "/home/masonholly/gpt-neox/megatron/neox_arguments/arguments.py", line 204, in from_ymls
        return cls(**config)
      File "<string>", line 190, in __init__
      File "/home/masonholly/gpt-neox/megatron/neox_arguments/arguments.py", line 107, in __post_init__
        self.enable_logging()
      File "/home/masonholly/gpt-neox/megatron/neox_arguments/arguments.py", line 501, in enable_logging
        os.makedirs(self.log_dir, exist_ok=True)
      File "/home/masonholly/anaconda3/envs/GPTNeox/lib/python3.8/os.py", line 223, in makedirs
        mkdir(name, mode)
    PermissionError: [Errno 13] Permission denied: 'logs'

I don't know how to fix this. I am just learning how to use GPT-NeoX and thought it would be fun to learn how to train it.

All help is appreciated.

StellaAthena commented 1 year ago

This is a permissions issue with how your computing cluster is set up, and not ultimately about GPT-NeoX.
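
To confirm that diagnosis, it may help to check whether the current user can actually create entries in the directory the command is run from, since the traceback shows os.makedirs failing on the relative path 'logs'. The following is a minimal sketch, not part of GPT-NeoX: the helper name diagnose_dir is hypothetical, and it assumes a Unix-like system (os.getuid is not available on Windows).

```python
import os


def diagnose_dir(path="."):
    """Report whether the current user can create entries inside `path`.

    Hypothetical helper for illustration; not part of GPT-NeoX.
    """
    path = os.path.abspath(path)
    info = os.stat(path)
    return {
        "path": path,
        "owner_uid": info.st_uid,            # user ID that owns the directory
        "current_uid": os.getuid(),          # user ID running this script
        "mode": oct(info.st_mode & 0o777),   # permission bits, e.g. 0o755
        # Creating a subdirectory needs write and search (execute) permission.
        "writable": os.access(path, os.W_OK | os.X_OK),
    }


if __name__ == "__main__":
    # Run this from the directory where deepy.py is launched.
    for key, value in diagnose_dir(".").items():
        print(f"{key}: {value}")
```

If writable comes back False and owner_uid differs from current_uid, the directory most likely belongs to another user (for example, because it was cloned or created with sudo).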

Codingrocks3 commented 1 year ago

@StellaAthena How would I fix that?

BjornTheProgrammer commented 1 year ago

You could just run python with sudo, or use sudo chmod -R 777 <project directory> (assuming the logs directory is inside that directory). Note that chmod 777 is unsafe, as it gives every user the ability to read, write, and execute everything in the directory, so revert the permissions afterwards.
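
A less permissive alternative to chmod 777 might be to write logs somewhere the current user already has access to. The sketch below is only an illustration under stated assumptions: ensure_log_dir and the fallback location ~/gpt-neox-logs are hypothetical names, and how the chosen path is passed to GPT-NeoX is not shown here (the traceback only tells us that the failing value is self.log_dir, set to 'logs').

```python
import os


def ensure_log_dir(preferred="logs", fallback=None):
    """Return the absolute path of a log directory the current user can write to.

    Hypothetical helper for illustration; not part of GPT-NeoX.
    Tries `preferred` first, then `fallback`.
    """
    if fallback is None:
        # Assumed fallback location inside the user's home directory.
        fallback = os.path.join(os.path.expanduser("~"), "gpt-neox-logs")

    for candidate in (preferred, fallback):
        try:
            # Same call that fails in the traceback above.
            os.makedirs(candidate, exist_ok=True)
        except OSError:
            # PermissionError is a subclass of OSError; try the next candidate.
            continue
        # The directory may already exist but still be read-only for us.
        if os.access(candidate, os.W_OK | os.X_OK):
            return os.path.abspath(candidate)

    raise PermissionError(
        f"No writable log directory among: {preferred!r}, {fallback!r}"
    )
```

Calling ensure_log_dir("logs") from the project directory returns the absolute path of logs if it can be created there, and otherwise the fallback path, without changing permissions on anything that already exists.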