FlagOpen / FlagScale

FlagScale is a large model toolkit based on open-sourced projects.

[BUG or ENHANCEMENT] about SSH runner #193

Open shh2000 opened 1 month ago

shh2000 commented 1 month ago

In https://github.com/FlagOpen/FlagScale/blob/main/flagscale/launcher/runner.py, there is the following code:

with open(host_run_script_file, "w") as f:
    f.write("#!/bin/bash\n\n")
    f.write(f"{before_start}\n")
    f.write(f"mkdir -p {logging_config.log_dir}\n")
    f.write(f"mkdir -p {logging_config.pids_dir}\n")
    f.write(f"\n")
    f.write(f"cd {root_dir}\n")
    f.write(f"\n")
    f.write(f"export PYTHONPATH={vllm_dir or megatron_dir}:{root_dir}\n")
    f.write(f"\n")
    f.write(f'cmd="{cmd}"\n')
    f.write(f"\n")

Here vllm_dir or megatron_dir is derived indirectly from os.path.abspath. A colon (:) is a legal character in Linux file names, so if any directory in that path contains one, the export line above splits PYTHONPATH at the wrong place and the environment variable is set incorrectly.
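To illustrate the failure, here is a minimal sketch of how a colon-separated value is split into entries. The two directory names are hypothetical, and the separator is hard-coded to ":" on the assumption of a POSIX system (it is what os.pathsep returns there).

```python
# Hypothetical paths, used only for illustration; neither comes from FlagScale.
root_dir = "/home/user/FlagScale"
megatron_dir = "/data/exp:v2/megatron"  # a directory name containing a colon

# Same shape as the value written by the export line in the generated script.
pythonpath = f"{megatron_dir}:{root_dir}"

# On POSIX systems the entries of PYTHONPATH are separated by ":".
entries = pythonpath.split(":")

print(entries)
# ['/data/exp', 'v2/megatron', '/home/user/FlagScale']
```

Two directories were intended, but three entries result, and neither of the first two points at the real megatron directory.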

I suggest two options:

  1. Warn users in the docs or in run.py that if the path of run.py contains a colon or another problematic character, they will get incorrect results.
  2. Separate the environment variables from the rest of the script and set them from Python with subprocess.run(env=xxx), which avoids such problems.
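The two options could be combined roughly as in the sketch below. It is not the FlagScale implementation: build_env and run_with_env are hypothetical helper names, and the directories in the usage part are placeholders. One caveat worth stating: passing the environment through subprocess.run(env=...) removes the shell quoting step, but the Python interpreter still splits PYTHONPATH on os.pathsep, so a directory containing that separator has to be rejected (option 1) rather than passed through.

```python
import os
import subprocess
import sys


def build_env(*dirs):
    """Return a copy of the current environment with PYTHONPATH set to dirs.

    Hypothetical helper. Raises ValueError if a directory contains
    os.pathsep, because such a path cannot be represented in PYTHONPATH.
    """
    bad = [d for d in dirs if os.pathsep in d]
    if bad:
        raise ValueError(
            f"paths containing {os.pathsep!r} cannot be used in PYTHONPATH: {bad}"
        )
    env = dict(os.environ)  # copy, so the parent environment is not modified
    env["PYTHONPATH"] = os.pathsep.join(dirs)
    return env


def run_with_env(cmd, dirs):
    """Run cmd (a list of arguments) with PYTHONPATH built from dirs."""
    return subprocess.run(
        cmd,
        env=build_env(*dirs),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Placeholder directories; they do not need to exist for this check.
    result = run_with_env(
        [sys.executable, "-c", "import os; print(os.environ['PYTHONPATH'])"],
        ["/opt/megatron", "/opt/flagscale"],
    )
    print(result.stdout.strip())
```

Because the command is passed as a list and the environment as a dict, no value is interpolated into a shell script, so spaces or quotes in a path no longer need escaping; only the path separator itself remains a hard limit.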