easybuilders / easybuild-easyblocks

Collection of easyblocks that implement support for building and installing software with EasyBuild.
https://easybuild.io
GNU General Public License v2.0

Tensorflow Fails to Build when LC_ALL=POSIX #2396

Open dithwick opened 3 years ago

dithwick commented 3 years ago

This is on CentOS 8.3 with LC_ALL=POSIX. When running

$ eb TensorFlow-2.4.1-fosscuda-2020b.eb -r

I get:

== Temporary log file in case of crash /tmp/eb-p5de5y_8/easybuild-ltpl_rm6.log
== found valid index for /scrtp/avon/eb/software/EasyBuild/4.3.4/easybuild/easyconfigs, so using it...
== resolving dependencies ...
== processing EasyBuild easyconfig /home/dugan/gitrepos/easybuild-repos/easybuild-easyconfigs/easybuild/easyconfigs/t/TensorFlow/TensorFlow-2.4.1-fosscuda-2020b.eb
== building and installing MPI/GCC-CUDA/10.2.0-11.1.1/OpenMPI/4.0.5/TensorFlow/2.4.1...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== testing...
== installing...
== taking care of extensions...
ERROR: Traceback (most recent call last):
  File "/scrtp/avon/eb/software/EasyBuild/4.3.4/lib/python3.6/site-packages/easybuild/main.py", line 117, in build_and_install_software
    (ec_res['success'], app_log, err) = build_and_install_one(ec, init_env)
  File "/scrtp/avon/eb/software/EasyBuild/4.3.4/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 3404, in build_and_install_one
    result = app.run_all_steps(run_test_cases=run_test_cases)
  File "/scrtp/avon/eb/software/EasyBuild/4.3.4/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 3303, in run_all_steps
    self.run_step(step_name, step_methods)
  File "/scrtp/avon/eb/software/EasyBuild/4.3.4/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 3158, in run_step
    step_method(self)()
  File "/scrtp/avon/eb/software/EasyBuild/4.3.4/lib/python3.6/site-packages/easybuild/easyblocks/generic/pythonbundle.py", line 128, in extensions_step
    super(PythonBundle, self).extensions_step(*args, **kwargs)
  File "/scrtp/avon/eb/software/EasyBuild/4.3.4/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 2242, in extensions_step
    fake_mod_data = self.load_fake_module(purge=True, extra_modules=build_dep_mods)
  File "/scrtp/avon/eb/software/EasyBuild/4.3.4/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 1555, in load_fake_module
    fake_mod_path = self.make_module_step(fake=True)
  File "/scrtp/avon/eb/software/EasyBuild/4.3.4/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 2958, in make_module_step
    txt += self.make_module_dep()
  File "/scrtp/avon/eb/software/EasyBuild/4.3.4/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 1179, in make_module_dep
    full_mod_subdir, all_deps)
  File "/scrtp/avon/eb/software/EasyBuild/4.3.4/lib/python3.6/site-packages/easybuild/tools/modules.py", line 1108, in path_to_top_of_module_tree
    modpath_exts = dict([(k, v) for k, v in self.modpath_extensions_for(deps).items() if v])
  File "/scrtp/avon/eb/software/EasyBuild/4.3.4/lib/python3.6/site-packages/easybuild/tools/modules.py", line 1040, in modpath_extensions_for
    modtxt = self.read_module_file(mod_name)
  File "/scrtp/avon/eb/software/EasyBuild/4.3.4/lib/python3.6/site-packages/easybuild/tools/modules.py", line 970, in read_module_file
    modfilepath = self.modulefile_path(mod_name)
  File "/scrtp/avon/eb/software/EasyBuild/4.3.4/lib/python3.6/site-packages/easybuild/tools/modules.py", line 735, in modulefile_path
    modpath = self.get_value_from_modulefile(mod_name, modpath_re)
  File "/scrtp/avon/eb/software/EasyBuild/4.3.4/lib/python3.6/site-packages/easybuild/tools/modules.py", line 713, in get_value_from_modulefile
    if self.exist([mod_name], skip_avail=True)[0]:
  File "/scrtp/avon/eb/software/EasyBuild/4.3.4/lib/python3.6/site-packages/easybuild/tools/modules.py", line 615, in exist
    mod_exists = mod_exists_via_show(mod_name)
  File "/scrtp/avon/eb/software/EasyBuild/4.3.4/lib/python3.6/site-packages/easybuild/tools/modules.py", line 551, in mod_exists_via_show
    stderr = self.show(mod_name)
  File "/scrtp/avon/eb/software/EasyBuild/4.3.4/lib/python3.6/site-packages/easybuild/tools/modules.py", line 698, in show
    ans = self.run_module('show', mod_name, check_output=False, return_stderr=True)
  File "/scrtp/avon/eb/software/EasyBuild/4.3.4/lib/python3.6/site-packages/easybuild/tools/modules.py", line 811, in run_module
    (stdout, stderr) = proc.communicate()
  File "/usr/lib64/python3.6/subprocess.py", line 863, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib64/python3.6/subprocess.py", line 1578, in _communicate
    self.stderr.errors)
  File "/usr/lib64/python3.6/subprocess.py", line 760, in _translate_newlines
    data = data.decode(encoding, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 565: ordinal not in range(128)

The error goes away if I run with an empty LC_ALL:

$ LC_ALL= eb TensorFlow-2.4.1-fosscuda-2020b.eb -r

I've had this problem before (see issue #2393), but in that case I updated the numpy easyblock following advice from @boegel. For this particular bug I'm not sure where the issue is occurring.
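For context, the last frame of the traceback is Python decoding the stderr of the module command with the 'ascii' codec named in the error message. Byte 0xe2 is the lead byte of the three-byte UTF-8 sequences for U+2000 to U+2FFF, a range that includes typographic quotes and dashes. Below is a minimal sketch of that failure, assuming the output contains a UTF-8 right single quote; the actual character is not visible in the log, and `sample` and `try_decode` are hypothetical names used only for illustration.

```python
# Hypothetical stand-in for the stderr of the module command;
# the real bytes are not shown in the log above.
sample = b"help text with a curly quote: \xe2\x80\x99"


def try_decode(data, encoding):
    """Return (True, text) if data decodes with encoding, else (False, error message)."""
    try:
        return True, data.decode(encoding)
    except UnicodeDecodeError as err:
        return False, str(err)


# The 'ascii' codec rejects any byte >= 0x80, so this fails on 0xe2
ascii_ok, ascii_result = try_decode(sample, "ascii")
# The same bytes are valid UTF-8 and decode cleanly
utf8_ok, utf8_result = try_decode(sample, "utf-8")

print(ascii_ok, ascii_result)
print(utf8_ok, repr(utf8_result))
```

This matches the observation that the build succeeds once LC_ALL no longer forces the POSIX locale: the bytes themselves are fine, only the codec chosen from the locale cannot represent them.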

boegel commented 3 years ago

@dithwick Can you run this with eb --trace, so we see for which extension the problem is popping up?

BTW, this should be fixed in the framework: it doesn't seem to be specific to TensorFlow at all, since the TensorFlow easyblock does not appear in the traceback...
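One way a framework-level fix could look is to capture the command output as raw bytes and decode it explicitly, instead of letting `subprocess` pick the codec from the locale. The sketch below is only an illustration of that idea under stated assumptions: `run_and_decode` is a hypothetical helper, not EasyBuild's `run_module`, and the child process is a stand-in for the module command.

```python
import subprocess
import sys


def run_and_decode(cmd):
    """Run cmd, capture raw bytes, and decode them without relying on the locale."""
    # No universal_newlines/text mode here, so communicate() returns bytes
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = proc.communicate()
    # errors="replace" maps undecodable bytes to U+FFFD instead of raising
    return (stdout.decode("utf-8", errors="replace"),
            stderr.decode("utf-8", errors="replace"))


# Stand-in child process (assumption): writes a non-ASCII character
# to stderr as raw UTF-8 bytes, regardless of its own locale.
child = [sys.executable, "-c",
         "import sys; sys.stderr.buffer.write(b'caf\\xc3\\xa9\\n')"]

out, err = run_and_decode(child)
print(repr(err))
```

Decoding with an explicit codec and `errors="replace"` trades strictness for robustness: a stray non-ASCII character in a module file degrades to a replacement character rather than aborting the whole installation.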

dithwick commented 3 years ago

@boegel Yeah sure. Does that work with the upload test report option or should I just do it manually?

dithwick commented 3 years ago

Not sure if this is quite what you were after but https://gist.github.com/dithwick/9e0a69e6271ee4474cce0ee9f1588b31

dithwick commented 3 years ago

Hi,

I've just had this same problem again but with PyTorch-1.7.1-fosscuda-2020b.eb and at the sanity check stage this time:

== building and installing MPI/GCC-CUDA/10.2.0-11.1.1/OpenMPI/4.0.5/PyTorch/1.7.1...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== configuring...
== building...
== testing...
== installing...
== taking care of extensions...
== restore after iterating...
== postprocessing...
== sanity checking...
ERROR: Traceback (most recent call last):
  File "/scrtp/avon/eb/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/main.py", line 117, in build_and_install_software
    (ec_res['success'], app_log, err) = build_and_install_one(ec, init_env)
  File "/scrtp/avon/eb/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 3633, in build_and_install_one
    result = app.run_all_steps(run_test_cases=run_test_cases)
  File "/scrtp/avon/eb/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 3531, in run_all_steps
    self.run_step(step_name, step_methods)
  File "/scrtp/avon/eb/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 3386, in run_step
    step_method(self)()
  File "/scrtp/avon/eb/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/easyblocks/p/pytorch.py", line 263, in sanity_check_step
    super(EB_PyTorch, self).sanity_check_step(*args, **kwargs)
  File "/scrtp/avon/eb/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/easyblocks/generic/pythonpackage.py", line 836, in sanity_check_step
    fake_mod_data = self.load_fake_module(purge=True)
  File "/scrtp/avon/eb/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 1561, in load_fake_module
    fake_mod_path = self.make_module_step(fake=True)
  File "/scrtp/avon/eb/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 3162, in make_module_step
    txt += self.make_module_dep()
  File "/scrtp/avon/eb/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 1181, in make_module_dep
    full_mod_subdir, all_deps)
  File "/scrtp/avon/eb/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/tools/modules.py", line 1121, in path_to_top_of_module_tree
    modpath_exts = dict([(k, v) for k, v in self.modpath_extensions_for(deps).items() if v])
  File "/scrtp/avon/eb/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/tools/modules.py", line 1053, in modpath_extensions_for
    modtxt = self.read_module_file(mod_name)
  File "/scrtp/avon/eb/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/tools/modules.py", line 983, in read_module_file
    modfilepath = self.modulefile_path(mod_name)
  File "/scrtp/avon/eb/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/tools/modules.py", line 748, in modulefile_path
    modpath = self.get_value_from_modulefile(mod_name, modpath_re)
  File "/scrtp/avon/eb/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/tools/modules.py", line 726, in get_value_from_modulefile
    if self.exist([mod_name], skip_avail=True)[0]:
  File "/scrtp/avon/eb/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/tools/modules.py", line 627, in exist
    mod_exists = mod_exists_via_show(mod_name)
  File "/scrtp/avon/eb/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/tools/modules.py", line 563, in mod_exists_via_show
    stderr = self.show(mod_name)
  File "/scrtp/avon/eb/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/tools/modules.py", line 711, in show
    ans = self.run_module('show', mod_name, check_output=False, return_stderr=True)
  File "/scrtp/avon/eb/software/EasyBuild/4.4.0/lib/python3.6/site-packages/easybuild/tools/modules.py", line 824, in run_module
    (stdout, stderr) = proc.communicate()
  File "/usr/lib64/python3.6/subprocess.py", line 863, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib64/python3.6/subprocess.py", line 1578, in _communicate
    self.stderr.errors)
  File "/usr/lib64/python3.6/subprocess.py", line 760, in _translate_newlines
    data = data.decode(encoding, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 549: ordinal not in range(128)

This looks to me like it is failing at the same place. I assume it will build fine with an empty LC_ALL; I need the module now anyway, so I'll test and report back.

dithwick commented 2 years ago

TensorFlow again (this time 2.5.0). I'll post the error output here in case it helps identify where the problem is cropping up:

== processing EasyBuild easyconfig /sulis/easybuild/software/EasyBuild/4.4.1/easybuild/easyconfigs/t/TensorFlow/TensorFlow-2.5.0-foss-2020b.eb
== building and installing MPI/GCC/10.2.0/OpenMPI/4.0.5/TensorFlow/2.5.0...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== patching...
== preparing...
== ... (took 6 secs)
== configuring...
== building...
== testing...
== installing...
== taking care of extensions...
== ... (took 5 secs)
ERROR: Traceback (most recent call last):
  File "/sulis/easybuild/software/EasyBuild/4.4.1/lib/python3.6/site-packages/easybuild/main.py", line 118, in build_and_install_software
    (ec_res['success'], app_log, err) = build_and_install_one(ec, init_env)
  File "/sulis/easybuild/software/EasyBuild/4.4.1/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 3691, in build_and_install_one
    result = app.run_all_steps(run_test_cases=run_test_cases)
  File "/sulis/easybuild/software/EasyBuild/4.4.1/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 3582, in run_all_steps
    self.run_step(step_name, step_methods)
  File "/sulis/easybuild/software/EasyBuild/4.4.1/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 3435, in run_step
    step_method(self)()
  File "/sulis/easybuild/software/EasyBuild/4.4.1/lib/python3.6/site-packages/easybuild/easyblocks/generic/pythonbundle.py", line 128, in extensions_step
    super(PythonBundle, self).extensions_step(*args, **kwargs)
  File "/sulis/easybuild/software/EasyBuild/4.4.1/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 2379, in extensions_step
    fake_mod_data = self.load_fake_module(purge=True, extra_modules=build_dep_mods)
  File "/sulis/easybuild/software/EasyBuild/4.4.1/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 1567, in load_fake_module
    fake_mod_path = self.make_module_step(fake=True)
  File "/sulis/easybuild/software/EasyBuild/4.4.1/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 3205, in make_module_step
    txt += self.make_module_dep()
  File "/sulis/easybuild/software/EasyBuild/4.4.1/lib/python3.6/site-packages/easybuild/framework/easyblock.py", line 1187, in make_module_dep
    full_mod_subdir, all_deps)
  File "/sulis/easybuild/software/EasyBuild/4.4.1/lib/python3.6/site-packages/easybuild/tools/modules.py", line 1121, in path_to_top_of_module_tree
    modpath_exts = dict([(k, v) for k, v in self.modpath_extensions_for(deps).items() if v])
  File "/sulis/easybuild/software/EasyBuild/4.4.1/lib/python3.6/site-packages/easybuild/tools/modules.py", line 1053, in modpath_extensions_for
    modtxt = self.read_module_file(mod_name)
  File "/sulis/easybuild/software/EasyBuild/4.4.1/lib/python3.6/site-packages/easybuild/tools/modules.py", line 983, in read_module_file
    modfilepath = self.modulefile_path(mod_name)
  File "/sulis/easybuild/software/EasyBuild/4.4.1/lib/python3.6/site-packages/easybuild/tools/modules.py", line 748, in modulefile_path
    modpath = self.get_value_from_modulefile(mod_name, modpath_re)
  File "/sulis/easybuild/software/EasyBuild/4.4.1/lib/python3.6/site-packages/easybuild/tools/modules.py", line 726, in get_value_from_modulefile
    if self.exist([mod_name], skip_avail=True)[0]:
  File "/sulis/easybuild/software/EasyBuild/4.4.1/lib/python3.6/site-packages/easybuild/tools/modules.py", line 627, in exist
    mod_exists = mod_exists_via_show(mod_name)
  File "/sulis/easybuild/software/EasyBuild/4.4.1/lib/python3.6/site-packages/easybuild/tools/modules.py", line 563, in mod_exists_via_show
    stderr = self.show(mod_name)
  File "/sulis/easybuild/software/EasyBuild/4.4.1/lib/python3.6/site-packages/easybuild/tools/modules.py", line 711, in show
    ans = self.run_module('show', mod_name, check_output=False, return_stderr=True)
  File "/sulis/easybuild/software/EasyBuild/4.4.1/lib/python3.6/site-packages/easybuild/tools/modules.py", line 824, in run_module
    (stdout, stderr) = proc.communicate()
  File "/usr/lib64/python3.6/subprocess.py", line 863, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib64/python3.6/subprocess.py", line 1578, in _communicate
    self.stderr.errors)
  File "/usr/lib64/python3.6/subprocess.py", line 760, in _translate_newlines
    data = data.decode(encoding, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 593: ordinal not in range(128)
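All three tracebacks report byte 0xe2 at a specific position (565, 549 and 593) in the output being decoded, so one way to narrow down which module file is responsible would be to capture that output as bytes and print what surrounds each non-ASCII byte. A rough sketch, assuming the raw stderr has already been captured somehow; `find_non_ascii` is a hypothetical helper and `captured` is made-up sample data, not output from this system.

```python
def find_non_ascii(data, context=20):
    """Yield (offset, byte value, surrounding bytes) for every non-ASCII byte in data."""
    for offset, value in enumerate(data):
        if value >= 0x80:
            start = max(0, offset - context)
            yield offset, value, data[start:offset + context]


# Hypothetical captured output; in practice this would be the raw
# stderr bytes of the module command for one dependency.
captured = b"whatis(\"Description: Alice\xe2\x80\x99s library\")"

hits = list(find_non_ascii(captured))
for offset, value, snippet in hits:
    # The snippet shows enough neighbouring text to recognise the module
    print(offset, hex(value), snippet)
```

Running this over the output for each dependency in turn would show which one carries the offending character, and the surrounding text should make it easy to find in the corresponding module file.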