CHTC / templates-GPUs

Template job submissions using GPUs in CHTC
MIT License

LLM template PermissionError #30

Closed lbertge closed 9 months ago

lbertge commented 12 months ago

Hello,

I'm trying to follow the LLM fine-tuning example listed here https://github.com/CHTC/templates-GPUs/tree/master/llm on CHTC. While I wait for my request for a /staging directory to be approved, I've temporarily set STAGING_DIRECTORY=/home/<user-id>/... When I run condor_submit run.sub, I get a permissions error in the training script:

Traceback (most recent call last):
  File "train.py", line 88, in <module>
    main()
  File "train.py", line 84, in main
    train(args.run_name, args.use_wandb)
  File "train.py", line 56, in train
    trainer = Trainer(
  File "/transformers/src/transformers/trainer.py", line 558, in __init__
    os.makedirs(self.args.output_dir, exist_ok=True)
  File "/usr/lib/python3.8/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.8/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.8/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  [Previous line repeated 1 more time]
  File "/usr/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/home/<user-id>/'

I can modify my user directory's permissions to allow others to write (chmod -R o+w ~), but I'm wondering if there's a more secure way to go about this. Would you happen to know what user is running the job script? Or would it just be easier to wait until my /staging request is approved?

Thank you! Albert
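
One way to answer the "what user is running the job script?" question empirically is to log the process identity and probe the output path before the Trainer is constructed. The following is a minimal sketch, not part of the template: the function name describe_write_access is hypothetical, and it assumes a Linux execute node. It walks up to the nearest existing ancestor of the target path, much as os.makedirs recurses through missing parents in the traceback above, and reports whether the current process may create entries there.

```python
import os


def describe_write_access(path):
    """Report the process identity and whether it could create `path`.

    Hypothetical helper for debugging; not part of the CHTC template.
    """
    # Walk up to the nearest ancestor that already exists. os.makedirs
    # fails at this level if the process cannot write to it.
    probe = os.path.abspath(path)
    while not os.path.exists(probe):
        parent = os.path.dirname(probe)
        if parent == probe:  # reached the filesystem root
            break
        probe = parent

    return {
        "uid": os.getuid(),    # real user ID of the job process
        "euid": os.geteuid(),  # effective user ID used for permission checks
        "gid": os.getgid(),
        "nearest_existing": probe,
        # Creating a subdirectory needs write and search (execute) permission.
        "writable": os.access(probe, os.W_OK | os.X_OK),
    }


if __name__ == "__main__":
    # Assumption: STAGING_DIRECTORY is exported to the job environment;
    # fall back to the current directory if it is not set.
    target = os.environ.get("STAGING_DIRECTORY", os.getcwd())
    print(describe_write_access(target))
```

Comparing the printed uid with the output of id -u on the submit node would show whether the job runs as the submitting user, which in turn indicates whether the home directory's permissions are the real obstacle or whether the path is simply not available on the execute node.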

agitter commented 11 months ago

@lbertge if you are still having problems with this, you may get better feedback by emailing chtc@cs.wisc.edu for advice. The CHTC facilitators would be able to tell you more about specific job permissions.