NERSC / podman-hpc

Bug in handling container arguments #112

Open shubhe25p opened 2 months ago

shubhe25p commented 2 months ago

Hi everyone, I found a bug while testing N10 LAMMPS in podman-hpc.

Image: localhost/n10-lammps:1.0

Run script:

podman-hpc run --gpu --mpi localhost/n10-lammps:1.0 /opt/lammps/install/bin/lmp  -k on g 4 -sf kk -pk kokkos newton on neigh half -in /opt/exaalt/benchmarks/common/in.snap.test -var snapdir  /opt/exaalt/benchmarks/common/2J8_W.SNAP -var nx 256 -var ny 256 -var nz 256 -var nsteps 1

Error:

Traceback (most recent call last):
  File "./podman-hpc", line 11, in <module>
    load_entry_point('podman-hpc==1.1.0', 'console_scripts', 'podman-hpc')()
  File "/global/homes/s/shubhp/work/exaalt/container/podman-build/usr/lib/python3.6/site-packages/podman_hpc/podman_hpc.py", line 445, in main
    podhpc(prog_name="podman-hpc")
  File "/usr/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/click/decorators.py", line 64, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/global/homes/s/shubhp/work/exaalt/container/podman-build/usr/lib/python3.6/site-packages/podman_hpc/podman_hpc.py", line 398, in call_podman
    _shared_run(siteconf, podman_args, **site_opts)
  File "/global/homes/s/shubhp/work/exaalt/container/podman-build/usr/lib/python3.6/site-packages/podman_hpc/podman_hpc.py", line 263, in _shared_run
    valid_params = cpt.filterValidOptions(list(run_args), cmd)
  File "/global/homes/s/shubhp/work/exaalt/container/podman-build/usr/lib/python3.6/site-packages/podman_hpc/click_passthrough.py", line 215, in filterValidOptions
    unknowns = p.parse_known_args(valid_options)[1]
TypeError: 'NoneType' object is not subscriptable

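Reason for this error: When the container arguments are parsed, they are filtered against podman's options, but overlapping option names are never taken into account. For example, -i is a boolean flag in podman, whereas LAMMPS reads -i (written -in in the command above) as its input-file option, which expects a value. The parser fails on that mismatch, hence the error.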

Solution: Do not filter the container's arguments against podman options.
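
To illustrate the overlap and the proposed fix, here is a minimal sketch using plain argparse. It is not podman-hpc's actual code: `build_parser` and `split_at_image` are hypothetical names, the parser defines a single `-i` option with `action="store"` (as in the test program in this issue), and the image name is assumed to be known in advance so the argument list can be split there.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for one wrapper option; NOT podman-hpc's real parser.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-i", "--interactive", action="store")
    return parser


def split_at_image(args, image):
    # Hypothetical helper: assumes the caller already knows the image name.
    idx = args.index(image)
    return args[:idx], args[idx:]


IMAGE = "localhost/n10-lammps:1.0"
args = ["--rm", IMAGE, "/opt/lammps/install/bin/lmp", "-in", "in.snap.test"]

# Parsing the whole line: argparse reads "-in" as "-i" with the attached value "n".
ns_all, rest_all = build_parser().parse_known_args(args)
print(ns_all.interactive, rest_all)

# Parsing only what precedes the image leaves the container's "-in" untouched.
wrapper_args, container_args = split_at_image(args, IMAGE)
ns, rest = build_parser().parse_known_args(wrapper_args)
print(ns.interactive, rest, container_args)
```

In the first call the container's `-in` is consumed by the wrapper's parser and never reaches LAMMPS. In the second call only `--rm` is examined, and everything from the image name onward is passed through unchanged.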

Test program to replicate this error:

#!/usr/bin/python3
import argparse

# Create an ArgumentParser object
p = argparse.ArgumentParser()

flags = ["-i", "--interactive"]
valid_options=['localhost/n10-lammps:1.0', '/opt/lammps/install/bin/lmp', '-k', 'on', 'g', '4', '-sf', 'kk', '-pk', 'kokkos', 'newton', 'on', 'neigh', 'half', '-in', '/opt/exaalt/benchmarks/common/in.snap.test', '-var', 'snapdir', '/opt/exaalt/benchmarks/common/2J8_W.SNAP', '-var', 'nx', '256', '-var', 'ny', '256', '-var', 'nz', '256', '-var', 'nsteps', '1']
p.add_argument(*flags, action="store")
# Parse known arguments
try:
    out = p.parse_known_args(valid_options)[1]
    print(out)
except SystemExit as e:
    print(f"Error: {e}")
    print("Invalid arguments provided. Please check the usage.")

Regards, Shubh PEM intern, NERSC

dingp commented 2 months ago

I cannot replicate the issue on my end.

Here is what I did:

  1. Create a simple script named test_input_arg_value.sh;
  2. Run podman-hpc run --rm -v $PWD:/test ubuntu:latest /test/test_input_arg_value.sh -i "this is my input_file"
> cat test_input_arg_value.sh
#!/bin/bash

input_arg_value=$2
echo $input_arg_value

> podman-hpc run --rm -v $PWD:/test ubuntu:latest /test/test_input_arg_value.sh -i "this is my input_file"
this is my input_file