brinckmann / montepython_public

Public repository for the Monte Python Code
MIT License

HTCondor error #282

Closed Amin-83 closed 2 years ago

Amin-83 commented 2 years ago

Hi Thejs,

I have an issue when running MontePython on an HTCondor cluster. Because Condor copies files to a temporary directory, one has to specify which files need to be copied. The submit script looks like this:

universe = vanilla
executable = montepython/MontePython.py
arguments = run -p input/base2018_scf.param -o chains/base2018_scf -N 10000 --chain_number $(Process)
Log = foo.log
error = foo.err
output = foo.out
notification = Never
getenv = true
rank = kflops
should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
transfer_input_files = VERSION, montepython/
queue

But it seems that Condor gets hung up on a line in parser_mp.py which reads the file VERSION and outputs the version number on screen. Here's the error:

Traceback (most recent call last):
  File "/local0/condor/lib/condor/execute/dir_710193/condor_exec.exe", line 40, in <module>
    sys.exit(run())
  File "/local0/condor/lib/condor/execute/dir_710193/run.py", line 32, in run
    custom_command)
  File "/local0/condor/lib/condor/execute/dir_710193/run.py", line 191, in safe_initialisation
    cosmo, data, command_line, success = initialise(custom_command)
  File "/local0/condor/lib/condor/execute/dir_710193/initialise.py", line 31, in initialise
    command_line = parser_mp.parse(custom_command)
  File "/local0/condor/lib/condor/execute/dir_710193/parser_mp.py", line 1046, in parse
    parser = create_parser()
  File "/local0/condor/lib/condor/execute/dir_710193/parser_mp.py", line 722, in create_parser
    description='Monte Python, a Monte Carlo code in Python', usage=usage)
  File "/local0/condor/lib/condor/execute/dir_710193/parser_mp.py", line 314, in initialise_parser
    with open(os.path.join(path_file, 'VERSION'), 'r') as version_file:
IOError: [Errno 2] No such file or directory: '/local0/condor/lib/condor/execute/VERSION'

Am I missing something in the submit script or is it a possible MontePython problem?

Thank you, Amin

brinckmann commented 2 years ago

Hi Amin,

I'm not sure, but it looks like the file isn't being copied to the right place, so the path is wrong. In the error message, all the other lines refer to the directory /local0/condor/lib/condor/execute/dir_710193/, while the VERSION line refers to /local0/condor/lib/condor/execute/. Maybe that's normal, but it's all I've got. If you can't work out the directory/file issue, as a last-ditch attempt you could just comment out the lines that load and print the version number, as it's hardly an essential feature.
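To make the directory mismatch concrete, here is a minimal sketch. It assumes that path_file is computed as the parent of the directory containing parser_mp.py; the traceback is consistent with that, but I have not checked the source, so treat it as a guess. The helper name version_path_for and the "nested" layout are hypothetical. As far as I know, a trailing slash in transfer_input_files (as in montepython/) makes HTCondor copy only the directory's contents, which would explain why the modules end up flat in dir_710193/.

```python
import posixpath


def version_path_for(module_file):
    """Return where VERSION would be looked up for a given module path.

    Assumption: VERSION is expected one level above the directory
    that holds the module (the repository root in a normal checkout).
    """
    module_dir = posixpath.dirname(module_file)
    root_dir = posixpath.dirname(module_dir)
    return posixpath.join(root_dir, "VERSION")


# Layout seen in the traceback: modules copied flat into the scratch directory.
flat = "/local0/condor/lib/condor/execute/dir_710193/parser_mp.py"
# Hypothetical layout if the montepython directory itself were preserved.
nested = "/local0/condor/lib/condor/execute/dir_710193/montepython/parser_mp.py"

print(version_path_for(flat))    # /local0/condor/lib/condor/execute/VERSION
print(version_path_for(nested))  # /local0/condor/lib/condor/execute/dir_710193/VERSION
```

With the flat layout the lookup lands outside the job's scratch directory, which matches the IOError. If the montepython directory is kept as a subdirectory, the same lookup would point at the VERSION file transferred next to it.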

Best, Thejs

Amin-83 commented 2 years ago

Hi Thejs,

Thanks for the quick reply. I did notice this difference. I will try a couple of things and if none work then I will comment out those lines as you say.
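Instead of commenting the lines out entirely, a softer option might be to let the version lookup fall back to a placeholder when the file is missing. This is only a sketch: read_version and its default argument are hypothetical names, not part of MontePython, and the "0.0.0" string is a dummy value for the demonstration.

```python
import os
import tempfile


def read_version(root_dir, default="unknown"):
    """Read the first line of root_dir/VERSION, or return default if unreadable."""
    path = os.path.join(root_dir, "VERSION")
    try:
        with open(path, "r") as version_file:
            return version_file.readline().strip()
    except (IOError, OSError):
        # File missing or unreadable: the version is cosmetic, so carry on.
        return default


# Demonstration in a throwaway directory.
scratch = tempfile.mkdtemp()
print(read_version(scratch))  # no VERSION file yet, so the default is returned

with open(os.path.join(scratch, "VERSION"), "w") as handle:
    handle.write("0.0.0\n")  # dummy version string
print(read_version(scratch))
```

That way the run would not abort when the file is absent, and the version would still be printed whenever the file is found.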

Thanks again.

Best, Amin