google-deepmind / lab2d

A customisable 2D platform for agent-based AI research
Apache License 2.0
421 stars 53 forks source link

Add support for building M1 wheel files in Github Actions #25

Closed ashok-arora closed 1 year ago

ashok-arora commented 1 year ago

Hey, I noticed in workflows file that it builds the wheels for various x86_64 platforms.

GitHub has recently announced support for M1 macs in actions as well, so would it be possible to build M1 wheel files as well?

This will help in avoiding the hassle of building it from source.

tkoeppe commented 1 year ago

Yes, we could give that a try. Would you like to make a pull request?

ashok-arora commented 1 year ago

Sure! I'll start working on it.

tkoeppe commented 1 year ago

Could you start by making the precommit tests work on M1?

ashok-arora commented 1 year ago

@tkoeppe So for precommit tests to work, the following commands should run on M1 right?

sudo -H pip3 install numpy packaging
bazel --bazelrc=.bazelrc build --config=lua5_1 --config=libc++ //...
bazel --bazelrc=.bazelrc test  --config=lua5_1 --config=libc++ --test_output=errors //...
bazel --bazelrc=.bazelrc build --config=lua5_2 --config=libc++ //...
bazel --bazelrc=.bazelrc test  --config=lua5_2 --config=libc++ --test_output=errors //...
tkoeppe commented 1 year ago

Yes, that's right, the same as in the existing precommit script, basically.

ashok-arora commented 1 year ago

I don't have experience with clang and linkers, but I tried to fix a few issues, the first I received was:

clang: error: invalid linker name in argument '-fuse-ld=lld'

which I resolved through

brew install llvm
fish_add_path /opt/homebrew/opt/llvm/bin

but now I am getting this error and not sure what this one means:

ld64.lld: error: library not found for -lenv_c_api_example
clang: error: linker command failed with exit code 1 (use -v to see invocation)
tkoeppe commented 1 year ago

Hi! You don't need to use LLD if you don't have it, you can just drop the -fuse-ld=lld flag. Any linker should do the job I think.

tkoeppe commented 1 year ago

That said, what appears to be happening here is that the tests are somehow built in dynamic mode, which somehow goes wrong. I don't have much experience with Bazel's --dynamic_mode=fully on MacOS, but just for a quick check, could you run all bazel commands with --dynamic_mode=off? (You can stick that in the .bazelrc file if you like, or just modify the individual commands.)

ashok-arora commented 1 year ago

@tkoeppe So I dropped out the linker flag and added the --dynamic_mode=off option to .bazelrc but it still gave the same error.

↪ bazel --bazelrc=.bazelrc build --config=lua5_1 --config=libc++ //... --sandbox_debug                                                                               (base) 
INFO: Analyzed 176 targets (1 packages loaded, 364 targets configured).
INFO: Found 176 targets...
ERROR: /Users/ashok/Desktop/lab2d/third_party/rl_api/BUILD:52:8: Linking third_party/rl_api/env_c_api_example_test failed: (Exit 1): sandbox-exec failed: error executing command 
  (cd /private/var/tmp/_bazel_ashok/93028e41a3bbf195060aa75d4a59b30f/sandbox/darwin-sandbox/1408/execroot/org_deepmind_lab2d && \
  exec env - \
    PATH='/Users/ashok/miniconda/bin:/Users/ashok/miniconda/condabin:/opt/homebrew/opt/llvm/bin:/opt/homebrew/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin:/Applications/VMware Fusion Tech Preview.app/Contents/Public:/Library/Frameworks/Mono.framework/Versions/Current/Commands' \
    PWD=/proc/self/cwd \
    TMPDIR=/var/folders/jz/02f7nm712956gy2krdl0wlzm0000gn/T/ \
  /usr/bin/sandbox-exec -f /private/var/tmp/_bazel_ashok/93028e41a3bbf195060aa75d4a59b30f/sandbox/darwin-sandbox/1408/sandbox.sb /var/tmp/_bazel_ashok/install/492267a38d620c777a8b281ba7e366a6/process-wrapper '--timeout=0' '--kill_delay=15' '--stats=/private/var/tmp/_bazel_ashok/93028e41a3bbf195060aa75d4a59b30f/sandbox/darwin-sandbox/1408/stats.out' external/local_config_cc/cc_wrapper.sh @bazel-out/darwin_arm64-fastbuild/bin/third_party/rl_api/env_c_api_example_test-2.params)
ld64.lld: error: library not found for -lenv_c_api_example
clang: error: linker command failed with exit code 1 (use -v to see invocation)
INFO: Elapsed time: 44.707s, Critical Path: 13.12s
INFO: 355 processes: 60 internal, 295 darwin-sandbox.
FAILED: Build did NOT complete successfully
tkoeppe commented 1 year ago

Oh, wait, I see: this is a special test that explicitly as a precompiled DSO in its srcs.

Does everything else work? This one test isn't so important.

I think what might be happening here is that somehow bazel turns the dependency on libenv_c_api_example.so into a flag -lenv_c_api_example (instead of something like -l:libenv_c_api_example.so, which anyway may be a GNU-specific syntax), but on MacOS, -lfoo means "search for libfoo.dylib". Two very quick things you could try are to rename the target libenv_c_api_example.so into something else, e.g. a) env_c_api_example.so, or b) libenv_c_api_example.dylib (and then update the srcs in https://github.com/deepmind/lab2d/blob/94e37d189aee8d309bc1dfae7227676c77636e88/third_party/rl_api/BUILD#L58 accordingly).

ashok-arora commented 1 year ago

Hi @tkoeppe, thank you very much for the help, renaming it to libenv_c_api_example.dylib led to a successful built! 🎉 Although, two of the tests from lua5_1 failed due to a timeout error. Here are the logs:

lua5_1 build logs ```bash ↪ bazel --bazelrc=.bazelrc build --config=lua5_1 --config=libc++ //... --sandbox_debug (base) Starting local Bazel server and connecting to it... INFO: Analyzed 176 targets (114 packages loaded, 2438 targets configured). INFO: Found 176 targets... INFO: Deleting stale sandbox base /private/var/tmp/_bazel_ashok/93028e41a3bbf195060aa75d4a59b30f/sandbox INFO: Elapsed time: 16.765s, Critical Path: 1.66s INFO: 13 processes: 8 internal, 5 darwin-sandbox. INFO: Build completed successfully, 13 total actions ```
lua5_2 build logs ```bash ↪ bazel --bazelrc=.bazelrc build --config=lua5_2 --config=libc++ //... --sandbox_debug (base) INFO: Build option --//dmlab2d/lib/lua:lua_version has changed, discarding analysis cache. INFO: Analyzed 176 targets (1 packages loaded, 2442 targets configured). INFO: Found 176 targets... INFO: From Compiling loslib.c: external/lua5_2_archive/loslib.c:108:3: warning: 'tmpnam' is deprecated: This function is provided for compatibility reasons only. Due to security concerns inherent in the design of tmpnam(3), it is highly recommended that you use mkstemp(3) instead. [-Wdeprecated-declarations] lua_tmpnam(buff, err); ^ external/lua5_2_archive/loslib.c:57:33: note: expanded from macro 'lua_tmpnam' #define lua_tmpnam(b,e) { e = (tmpnam(b) == NULL); } ^ /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/stdio.h:197:1: note: 'tmpnam' has been explicitly marked deprecated here __deprecated_msg("This function is provided for compatibility reasons only. Due to security concerns inherent in the design of tmpnam(3), it is highly recommended that you use mkstemp(3) instead.") ^ /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/cdefs.h:215:48: note: expanded from macro '__deprecated_msg' #define __deprecated_msg(_msg) __attribute__((__deprecated__(_msg))) ^ 1 warning generated. INFO: Elapsed time: 35.242s, Critical Path: 12.92s INFO: 373 processes: 183 internal, 190 darwin-sandbox. INFO: Build completed successfully, 373 total actions ```
lua5_1 test logs ```bash ↪ bazel --bazelrc=.bazelrc test --config=lua5_1 --config=libc++ --test_output=errors //... --sandbox_debug (base) INFO: Analyzed 176 targets (0 packages loaded, 0 targets configured). INFO: Found 100 targets and 76 test targets... TIMEOUT: //dmlab2d/lib/system/generators/pushbox/lua:pushbox_test (Summary) /private/var/tmp/_bazel_ashok/93028e41a3bbf195060aa75d4a59b30f/execroot/org_deepmind_lab2d/bazel-out/darwin_arm64-fastbuild/testlogs/dmlab2d/lib/system/generators/pushbox/lua/pushbox_test/test.log INFO: From Testing //dmlab2d/lib/system/generators/pushbox/lua:pushbox_test: ==================== Test output for //dmlab2d/lib/system/generators/pushbox/lua:pushbox_test: -- Test timed out at 2023-05-02 04:11:10 UTC -- ================================================================================ TIMEOUT: //dmlab2d/lib/system/image:image_test (Summary) /private/var/tmp/_bazel_ashok/93028e41a3bbf195060aa75d4a59b30f/execroot/org_deepmind_lab2d/bazel-out/darwin_arm64-fastbuild/testlogs/dmlab2d/lib/system/image/image_test/test.log INFO: From Testing //dmlab2d/lib/system/image:image_test: ==================== Test output for //dmlab2d/lib/system/image:image_test: -- Test timed out at 2023-05-02 04:11:18 UTC -- ================================================================================ INFO: Elapsed time: 102.949s, Critical Path: 62.51s INFO: 77 processes: 79 darwin-sandbox. INFO: Build completed, 2 tests FAILED, 77 total actions //dmlab2d:dmlab2d_test PASSED in 7.5s //dmlab2d:settings_helper_test PASSED in 2.4s //dmlab2d/lib:dmlab2d_test PASSED in 3.3s //dmlab2d/lib:game_scripts/common/args_test PASSED in 0.5s //dmlab2d/lib:game_scripts/common/class_test PASSED in 3.1s //dmlab2d/lib:game_scripts/common/image_helpers_test PASSED in 7.7s //dmlab2d/lib:game_scripts/common/log_test PASSED in 7.8s //dmlab2d/lib:game_scripts/common/paths_test PASSED in 0.4s //dmlab2d/lib:game_scripts/common/properties_test PASSED in 0.5s //dmlab2d/lib:game_scripts/common/read_settings_test PASSED in 4.1s //dmlab2d/lib:game_scripts/common/set_test PASSED in 5.6s //dmlab2d/lib:game_scripts/common/strings_test PASSED in 0.4s //dmlab2d/lib:game_scripts/common/tables_test PASSED in 5.6s //dmlab2d/lib:game_scripts/levels/examples/level_api PASSED in 0.5s //dmlab2d/lib:game_scripts/levels/examples/level_api_discrete_only PASSED in 0.4s //dmlab2d/lib:game_scripts/testing/asserts_test PASSED in 3.4s //dmlab2d/lib:game_scripts/testing/mocking_test PASSED in 1.8s //dmlab2d/lib/env_lua_api:actions_test PASSED in 3.3s //dmlab2d/lib/env_lua_api:env_lua_api_test PASSED in 8.8s //dmlab2d/lib/env_lua_api:episode_test PASSED in 1.0s //dmlab2d/lib/env_lua_api:events_test PASSED in 6.7s //dmlab2d/lib/env_lua_api:observations_test PASSED in 0.3s //dmlab2d/lib/env_lua_api:properties_test PASSED in 3.2s //dmlab2d/lib/game_scripts/levels/chase_eat:chase_eat_test PASSED in 5.6s //dmlab2d/lib/game_scripts/levels/clean_up:clean_up_test PASSED in 3.2s //dmlab2d/lib/game_scripts/levels/commons_harvest:commons_harvest_test PASSED in 9.6s //dmlab2d/lib/game_scripts/levels/pushbox:pushbox_test PASSED in 3.7s //dmlab2d/lib/game_scripts/levels/running_with_scissors:running_with_scissors_test PASSED in 0.5s //dmlab2d/lib/lua:bind_test PASSED in 59.1s //dmlab2d/lib/lua:call_test PASSED in 59.1s //dmlab2d/lib/lua:class_test PASSED in 1.4s //dmlab2d/lib/lua:push_script_test PASSED in 59.1s //dmlab2d/lib/lua:push_test PASSED in 0.3s //dmlab2d/lib/lua:read_test PASSED in 2.7s //dmlab2d/lib/lua:ref_test PASSED in 3.3s //dmlab2d/lib/lua:stack_resetter_test PASSED in 59.1s //dmlab2d/lib/lua:table_ref_test PASSED in 59.1s //dmlab2d/lib/lua:vm_test PASSED in 4.1s //dmlab2d/lib/system/file_system/lua:file_system_test PASSED in 3.7s //dmlab2d/lib/system/file_system/lua:lua_file_system_test PASSED in 8.9s //dmlab2d/lib/system/generators/pushbox/python:pushbox_test PASSED in 12.2s //dmlab2d/lib/system/grid_world:grid_shape_test PASSED in 2.9s //dmlab2d/lib/system/grid_world:grid_test PASSED in 3.3s //dmlab2d/lib/system/grid_world:grid_view_test PASSED in 3.2s //dmlab2d/lib/system/grid_world:text_tools_test PASSED in 6.8s //dmlab2d/lib/system/grid_world:world_test PASSED in 1.9s //dmlab2d/lib/system/grid_world/collections:fixed_handle_map_test PASSED in 1.1s //dmlab2d/lib/system/grid_world/collections:handle_names_test PASSED in 0.2s //dmlab2d/lib/system/grid_world/collections:handle_test PASSED in 0.2s //dmlab2d/lib/system/grid_world/collections:object_pool_test PASSED in 3.4s //dmlab2d/lib/system/grid_world/collections:shuffled_membership_test PASSED in 0.2s //dmlab2d/lib/system/grid_world/collections:shuffled_set_test PASSED in 2.9s //dmlab2d/lib/system/grid_world/lua:grid_lua_test PASSED in 3.2s //dmlab2d/lib/system/grid_world/lua:lua_grid_view_test PASSED in 3.4s //dmlab2d/lib/system/grid_world/lua:lua_world_test PASSED in 5.3s //dmlab2d/lib/system/grid_world/lua:world_lua_test PASSED in 3.2s //dmlab2d/lib/system/image:lua_image_test PASSED in 0.5s //dmlab2d/lib/system/math:math2d_algorithms_test PASSED in 3.8s //dmlab2d/lib/system/math:math2d_test PASSED in 3.1s //dmlab2d/lib/system/math/lua:math2d_test PASSED in 0.3s //dmlab2d/lib/system/random/lua:random_test PASSED in 2.7s //dmlab2d/lib/system/tensor:tensor_view_test PASSED in 3.0s //dmlab2d/lib/system/tensor/lua:tensor_benchmark PASSED in 15.1s //dmlab2d/lib/system/tensor/lua:tensor_test PASSED in 0.4s //dmlab2d/lib/system/tile:pixel_test PASSED in 1.1s //dmlab2d/lib/system/tile:tile_renderer_test PASSED in 9.3s //dmlab2d/lib/system/tile:tile_set_test PASSED in 9.4s //dmlab2d/lib/system/tile/lua:tile_scene_test PASSED in 3.5s //dmlab2d/lib/system/tile/lua:tile_set_test PASSED in 3.3s //dmlab2d/lib/system/tile/lua:tile_test PASSED in 4.8s //dmlab2d/lib/util:file_reader_test PASSED in 51.5s //dmlab2d/lib/util:files_test PASSED in 0.3s //dmlab2d/lib/util:visit_set_difference_and_intersection_test PASSED in 2.7s //third_party/rl_api:env_c_api_example_test PASSED in 7.8s //dmlab2d/lib/system/generators/pushbox/lua:pushbox_test TIMEOUT in 60.0s /private/var/tmp/_bazel_ashok/93028e41a3bbf195060aa75d4a59b30f/execroot/org_deepmind_lab2d/bazel-out/darwin_arm64-fastbuild/testlogs/dmlab2d/lib/system/generators/pushbox/lua/pushbox_test/test.log //dmlab2d/lib/system/image:image_test TIMEOUT in 60.0s /private/var/tmp/_bazel_ashok/93028e41a3bbf195060aa75d4a59b30f/execroot/org_deepmind_lab2d/bazel-out/darwin_arm64-fastbuild/testlogs/dmlab2d/lib/system/image/image_test/test.log Executed 76 out of 76 tests: 74 tests pass and 2 fail locally. There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are. ```
lua5_2 test logs ```bash ↪ bazel --bazelrc=.bazelrc test --config=lua5_2 --config=libc++ --test_output=errors //... --sandbox_debug (base) INFO: Analyzed 176 targets (0 packages loaded, 0 targets configured). INFO: Found 100 targets and 76 test targets... INFO: Elapsed time: 42.606s, Critical Path: 18.70s INFO: 54 processes: 54 darwin-sandbox. INFO: Build completed successfully, 54 total actions //dmlab2d:settings_helper_test (cached) PASSED in 2.4s //dmlab2d/lib/system/generators/pushbox/python:pushbox_test (cached) PASSED in 12.2s //dmlab2d/lib/system/grid_world:grid_shape_test (cached) PASSED in 2.9s //dmlab2d/lib/system/grid_world:grid_test (cached) PASSED in 3.3s //dmlab2d/lib/system/grid_world:grid_view_test (cached) PASSED in 3.2s //dmlab2d/lib/system/grid_world:text_tools_test (cached) PASSED in 6.8s //dmlab2d/lib/system/grid_world:world_test (cached) PASSED in 1.9s //dmlab2d/lib/system/grid_world/collections:fixed_handle_map_test (cached) PASSED in 1.1s //dmlab2d/lib/system/grid_world/collections:handle_names_test (cached) PASSED in 0.2s //dmlab2d/lib/system/grid_world/collections:handle_test (cached) PASSED in 0.2s //dmlab2d/lib/system/grid_world/collections:object_pool_test (cached) PASSED in 3.4s //dmlab2d/lib/system/grid_world/collections:shuffled_membership_test (cached) PASSED in 0.2s //dmlab2d/lib/system/grid_world/collections:shuffled_set_test (cached) PASSED in 2.9s //dmlab2d/lib/system/math:math2d_algorithms_test (cached) PASSED in 3.8s //dmlab2d/lib/system/math:math2d_test (cached) PASSED in 3.1s //dmlab2d/lib/system/tensor:tensor_view_test (cached) PASSED in 3.0s //dmlab2d/lib/system/tile:pixel_test (cached) PASSED in 1.1s //dmlab2d/lib/system/tile:tile_renderer_test (cached) PASSED in 9.3s //dmlab2d/lib/system/tile:tile_set_test (cached) PASSED in 9.4s //dmlab2d/lib/util:file_reader_test (cached) PASSED in 51.5s //dmlab2d/lib/util:files_test (cached) PASSED in 0.3s //dmlab2d/lib/util:visit_set_difference_and_intersection_test (cached) PASSED in 2.7s //third_party/rl_api:env_c_api_example_test (cached) PASSED in 7.8s //dmlab2d:dmlab2d_test PASSED in 2.2s //dmlab2d/lib:dmlab2d_test PASSED in 3.3s //dmlab2d/lib:game_scripts/common/args_test PASSED in 1.9s //dmlab2d/lib:game_scripts/common/class_test PASSED in 3.0s //dmlab2d/lib:game_scripts/common/image_helpers_test PASSED in 1.1s //dmlab2d/lib:game_scripts/common/log_test PASSED in 3.4s //dmlab2d/lib:game_scripts/common/paths_test PASSED in 1.1s //dmlab2d/lib:game_scripts/common/properties_test PASSED in 3.2s //dmlab2d/lib:game_scripts/common/read_settings_test PASSED in 3.4s //dmlab2d/lib:game_scripts/common/set_test PASSED in 3.2s //dmlab2d/lib:game_scripts/common/strings_test PASSED in 6.1s //dmlab2d/lib:game_scripts/common/tables_test PASSED in 3.1s //dmlab2d/lib:game_scripts/levels/examples/level_api PASSED in 7.4s //dmlab2d/lib:game_scripts/levels/examples/level_api_discrete_only PASSED in 2.6s //dmlab2d/lib:game_scripts/testing/asserts_test PASSED in 2.6s //dmlab2d/lib:game_scripts/testing/mocking_test PASSED in 3.3s //dmlab2d/lib/env_lua_api:actions_test PASSED in 1.7s //dmlab2d/lib/env_lua_api:env_lua_api_test PASSED in 1.3s //dmlab2d/lib/env_lua_api:episode_test PASSED in 2.5s //dmlab2d/lib/env_lua_api:events_test PASSED in 11.1s //dmlab2d/lib/env_lua_api:observations_test PASSED in 11.1s //dmlab2d/lib/env_lua_api:properties_test PASSED in 9.4s //dmlab2d/lib/game_scripts/levels/chase_eat:chase_eat_test PASSED in 18.5s //dmlab2d/lib/game_scripts/levels/clean_up:clean_up_test PASSED in 18.6s //dmlab2d/lib/game_scripts/levels/commons_harvest:commons_harvest_test PASSED in 3.2s //dmlab2d/lib/game_scripts/levels/pushbox:pushbox_test PASSED in 7.0s //dmlab2d/lib/game_scripts/levels/running_with_scissors:running_with_scissors_test PASSED in 3.2s //dmlab2d/lib/lua:bind_test PASSED in 2.1s //dmlab2d/lib/lua:call_test PASSED in 1.9s //dmlab2d/lib/lua:class_test PASSED in 2.4s //dmlab2d/lib/lua:push_script_test PASSED in 3.1s //dmlab2d/lib/lua:push_test PASSED in 3.0s //dmlab2d/lib/lua:read_test PASSED in 2.4s //dmlab2d/lib/lua:ref_test PASSED in 3.2s //dmlab2d/lib/lua:stack_resetter_test PASSED in 2.4s //dmlab2d/lib/lua:table_ref_test PASSED in 3.2s //dmlab2d/lib/lua:vm_test PASSED in 2.8s //dmlab2d/lib/system/file_system/lua:file_system_test PASSED in 1.7s //dmlab2d/lib/system/file_system/lua:lua_file_system_test PASSED in 3.2s //dmlab2d/lib/system/generators/pushbox/lua:pushbox_test PASSED in 18.7s //dmlab2d/lib/system/grid_world/lua:grid_lua_test PASSED in 1.5s //dmlab2d/lib/system/grid_world/lua:lua_grid_view_test PASSED in 0.7s //dmlab2d/lib/system/grid_world/lua:lua_world_test PASSED in 2.2s //dmlab2d/lib/system/grid_world/lua:world_lua_test PASSED in 5.7s //dmlab2d/lib/system/image:image_test PASSED in 3.0s //dmlab2d/lib/system/image:lua_image_test PASSED in 11.2s //dmlab2d/lib/system/math/lua:math2d_test PASSED in 9.4s //dmlab2d/lib/system/random/lua:random_test PASSED in 3.3s //dmlab2d/lib/system/tensor/lua:tensor_benchmark PASSED in 17.9s //dmlab2d/lib/system/tensor/lua:tensor_test PASSED in 3.1s //dmlab2d/lib/system/tile/lua:tile_scene_test PASSED in 3.2s //dmlab2d/lib/system/tile/lua:tile_set_test PASSED in 3.0s //dmlab2d/lib/system/tile/lua:tile_test PASSED in 2.9s Executed 53 out of 76 tests: 76 tests pass. There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are. ```
tkoeppe commented 1 year ago

Aha, great! Can you also try renaming it to something that doesn't start with "lib"? I wonder if we can find some solution that works on both platforms. I also wonder how the current x86-MacOS tests are passing, which don't seem to run into this problem.

For the tests, can you please run them with --logtostderr, e.g. add blaze test [...] --test_arg=--logtostderr? And increase the timeout, --test_timeout=3000? Those tests shouldn't really be taking long; I suspect that something is actually broken inside, but without more output it's impossible to tell.

ashok-arora commented 1 year ago

Hi @tkoeppe, I tried to name it without the preceding lib but it keeps failing. I ran into a different error this time with running the tests with the suggested args, here's the output:

lua5_1 test logs ```bash ↪ bazel --bazelrc=.bazelrc test --config=lua5_1 --config=libc++ --test_output=errors --test_arg=--logtostderr --test_timeout=3000 //... --sandbox_debug (base) INFO: Analyzed 176 targets (0 packages loaded, 34 targets configured). INFO: Found 100 targets and 76 test targets... FAIL: //dmlab2d/lib/system/tensor/lua:tensor_benchmark (see /private/var/tmp/_bazel_ashok/93028e41a3bbf195060aa75d4a59b30f/execroot/org_deepmind_lab2d/bazel-out/darwin_arm64-fastbuild/testlogs/dmlab2d/lib/system/tensor/lua/tensor_benchmark/test.log) INFO: From Testing //dmlab2d/lib/system/tensor/lua:tensor_benchmark: ==================== Test output for //dmlab2d/lib/system/tensor/lua:tensor_benchmark: /private/var/tmp/_bazel_ashok/93028e41a3bbf195060aa75d4a59b30f/sandbox/darwin-sandbox/252/execroot/org_deepmind_lab2d/bazel-out/darwin_arm64-fastbuild/bin/dmlab2d/lib/system/tensor/lua/tensor_benchmark.runfiles/org_deepmind_lab2d/dmlab2d/lib/system/tensor/lua/tensor_benchmark: error: unrecognized command-line flag: --logtostderr ================================================================================ INFO: Elapsed time: 24.533s, Critical Path: 13.28s INFO: 77 processes: 77 darwin-sandbox. INFO: Build completed, 1 test FAILED, 77 total actions //dmlab2d:dmlab2d_test PASSED in 2.4s //dmlab2d:settings_helper_test PASSED in 0.2s //dmlab2d/lib:dmlab2d_test PASSED in 2.5s //dmlab2d/lib:game_scripts/common/args_test PASSED in 2.3s //dmlab2d/lib:game_scripts/common/class_test PASSED in 2.3s //dmlab2d/lib:game_scripts/common/image_helpers_test PASSED in 2.5s //dmlab2d/lib:game_scripts/common/log_test PASSED in 2.2s //dmlab2d/lib:game_scripts/common/paths_test PASSED in 2.5s //dmlab2d/lib:game_scripts/common/properties_test PASSED in 2.3s //dmlab2d/lib:game_scripts/common/read_settings_test PASSED in 2.3s //dmlab2d/lib:game_scripts/common/set_test PASSED in 13.2s //dmlab2d/lib:game_scripts/common/strings_test PASSED in 13.2s //dmlab2d/lib:game_scripts/common/tables_test PASSED in 13.2s //dmlab2d/lib:game_scripts/levels/examples/level_api PASSED in 3.6s //dmlab2d/lib:game_scripts/levels/examples/level_api_discrete_only PASSED in 1.7s //dmlab2d/lib:game_scripts/testing/asserts_test PASSED in 1.8s //dmlab2d/lib:game_scripts/testing/mocking_test PASSED in 2.6s //dmlab2d/lib/env_lua_api:actions_test PASSED in 8.8s //dmlab2d/lib/env_lua_api:env_lua_api_test PASSED in 0.8s //dmlab2d/lib/env_lua_api:episode_test PASSED in 8.8s //dmlab2d/lib/env_lua_api:events_test PASSED in 8.8s //dmlab2d/lib/env_lua_api:observations_test PASSED in 2.2s //dmlab2d/lib/env_lua_api:properties_test PASSED in 8.8s //dmlab2d/lib/game_scripts/levels/chase_eat:chase_eat_test PASSED in 2.3s //dmlab2d/lib/game_scripts/levels/clean_up:clean_up_test PASSED in 2.2s //dmlab2d/lib/game_scripts/levels/commons_harvest:commons_harvest_test PASSED in 2.0s //dmlab2d/lib/game_scripts/levels/pushbox:pushbox_test PASSED in 2.1s //dmlab2d/lib/game_scripts/levels/running_with_scissors:running_with_scissors_test PASSED in 1.6s //dmlab2d/lib/lua:bind_test PASSED in 2.3s //dmlab2d/lib/lua:call_test PASSED in 2.2s //dmlab2d/lib/lua:class_test PASSED in 2.2s //dmlab2d/lib/lua:push_script_test PASSED in 0.7s //dmlab2d/lib/lua:push_test PASSED in 2.2s //dmlab2d/lib/lua:read_test PASSED in 1.6s //dmlab2d/lib/lua:ref_test PASSED in 2.1s //dmlab2d/lib/lua:stack_resetter_test PASSED in 2.4s //dmlab2d/lib/lua:table_ref_test PASSED in 1.7s //dmlab2d/lib/lua:vm_test PASSED in 2.4s //dmlab2d/lib/system/file_system/lua:file_system_test PASSED in 2.0s //dmlab2d/lib/system/file_system/lua:lua_file_system_test PASSED in 2.1s //dmlab2d/lib/system/generators/pushbox/lua:pushbox_test PASSED in 0.7s //dmlab2d/lib/system/generators/pushbox/python:pushbox_test PASSED in 0.3s //dmlab2d/lib/system/grid_world:grid_shape_test PASSED in 0.1s //dmlab2d/lib/system/grid_world:grid_test PASSED in 0.1s //dmlab2d/lib/system/grid_world:grid_view_test PASSED in 0.1s //dmlab2d/lib/system/grid_world:text_tools_test PASSED in 0.1s //dmlab2d/lib/system/grid_world:world_test PASSED in 0.1s //dmlab2d/lib/system/grid_world/collections:fixed_handle_map_test PASSED in 0.1s //dmlab2d/lib/system/grid_world/collections:handle_names_test PASSED in 0.1s //dmlab2d/lib/system/grid_world/collections:handle_test PASSED in 0.1s //dmlab2d/lib/system/grid_world/collections:object_pool_test PASSED in 0.2s //dmlab2d/lib/system/grid_world/collections:shuffled_membership_test PASSED in 0.1s //dmlab2d/lib/system/grid_world/collections:shuffled_set_test PASSED in 0.1s //dmlab2d/lib/system/grid_world/lua:grid_lua_test PASSED in 4.3s //dmlab2d/lib/system/grid_world/lua:lua_grid_view_test PASSED in 2.2s //dmlab2d/lib/system/grid_world/lua:lua_world_test PASSED in 4.1s //dmlab2d/lib/system/grid_world/lua:world_lua_test PASSED in 4.2s //dmlab2d/lib/system/image:image_test PASSED in 1.6s //dmlab2d/lib/system/image:lua_image_test PASSED in 1.3s //dmlab2d/lib/system/math:math2d_algorithms_test PASSED in 0.1s //dmlab2d/lib/system/math:math2d_test PASSED in 0.1s //dmlab2d/lib/system/math/lua:math2d_test PASSED in 2.1s //dmlab2d/lib/system/random/lua:random_test PASSED in 2.4s //dmlab2d/lib/system/tensor:tensor_view_test PASSED in 0.1s //dmlab2d/lib/system/tensor/lua:tensor_test PASSED in 1.0s //dmlab2d/lib/system/tile:pixel_test PASSED in 0.1s //dmlab2d/lib/system/tile:tile_renderer_test PASSED in 0.1s //dmlab2d/lib/system/tile:tile_set_test PASSED in 0.1s //dmlab2d/lib/system/tile/lua:tile_scene_test PASSED in 8.9s //dmlab2d/lib/system/tile/lua:tile_set_test PASSED in 2.1s //dmlab2d/lib/system/tile/lua:tile_test PASSED in 2.1s //dmlab2d/lib/util:file_reader_test PASSED in 0.1s //dmlab2d/lib/util:files_test PASSED in 0.1s //dmlab2d/lib/util:visit_set_difference_and_intersection_test PASSED in 0.1s //third_party/rl_api:env_c_api_example_test PASSED in 2.1s //dmlab2d/lib/system/tensor/lua:tensor_benchmark FAILED in 2.4s /private/var/tmp/_bazel_ashok/93028e41a3bbf195060aa75d4a59b30f/execroot/org_deepmind_lab2d/bazel-out/darwin_arm64-fastbuild/testlogs/dmlab2d/lib/system/tensor/lua/tensor_benchmark/test.log Executed 76 out of 76 tests: 75 tests pass and 1 fails locally. There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are. ```
test.log for the failed test ```bash exec ${PAGER:-/usr/bin/less} "$0" || exit 1 Executing tests from //dmlab2d/lib/system/tensor/lua:tensor_benchmark ----------------------------------------------------------------------------- /private/var/tmp/_bazel_ashok/93028e41a3bbf195060aa75d4a59b30f/sandbox/darwin-sandbox/252/execroot/org_deepmind_lab2d/bazel-out/darwin_arm64-fastbuild/bin/dmlab2d/lib/system/tensor/lua/tensor_benchmark.runfiles/org_deepmind_lab2d/dmlab2d/lib/system/tensor/lua/tensor_benchmark: error: unrecognized command-line flag: --logtostderr ```
tkoeppe commented 1 year ago

Ah OK, never mind, that's apparently not using Abseil's logging framework in the tests.

There isn't any output in the logs. I expect this is some DSO loading failure. Can you run the test directly?

bazel --bazelrc=.bazelrc build --config=lua5_1 --config=libc++  //dmlab2d/lib/system/image:image_test

This tells you where the binary is, then run that directly. It should be something like:

bazel-bin/dmlab2d/lib/system/image_image_test

See if that prints anything useful at all. Then also check its return value ($?).

On Linux I would recommend running with LD_DEBUG=all to show loader issues, but I don't know if there is some corresponding debug tool for the MacOS loader I'm afraid.

ashok-arora commented 1 year ago

Running the binary directly returned:

↪ ./bazel-bin/dmlab2d/lib/system/image/image_test            
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from LuaUnitTest
[ RUN      ] LuaUnitTest.RunsTest
dmlab2d/lib/testing/lua_unit_test.cc:34: Failure
Expected: (test_script) != (nullptr), actual: NULL vs (nullptr)
Must pass test_script as first arg!

[  FAILED  ] LuaUnitTest.RunsTest (0 ms)
[----------] 1 test from LuaUnitTest (0 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (0 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] LuaUnitTest.RunsTest

 1 FAILED TEST

with a return value of 1.

ashok-arora commented 1 year ago

Oh wait, do I have to pass some config file for test suite to image_test binary?

tkoeppe commented 1 year ago

Oh, yes, sorry, give me a second. I can test this on the Linux version :-)

ashok-arora commented 1 year ago

Hey @tkoeppe, I managed to get more verbose output using

↪ LD_DEBUG=all bazel --bazelrc=.bazelrc test  --test_arg=--logtostderr --test_timeout=3000 --config=lua5_1 --config=libc++ --test_output=all --cache_test_results=no //... --sandbox_debug > info.log 2> errors.log

errors.log info.log

Please find attached the info.log and errors.log files.

tkoeppe commented 1 year ago

To run locally, indeed the dmlab2d_lua_test macro adds two command line arguments that allow the test to locate the script and library directories: https://github.com/deepmind/lab2d/blob/94e37d189aee8d309bc1dfae7227676c77636e88/dmlab2d/lib/system/image/BUILD#L49

To do this manually, you can do (after building):

cd bazel-bin/dmlab2d/lib/system/image/image_test.runfiles/org_deepmind_lab2d
TEST_SRCDIR=$(pwd) ./dmlab2d/lib/system/image/image_test image_test dmlab2d/lib/system/image

(bazel test takes care of setting the TEST_SRCDIR environment, and the dmlab2d_lua_test macro computes the two command line arguments. You can see the full invocation printed as part of the logs of the bazel test run.)

tkoeppe commented 1 year ago

The LD_DEBUG environment had no effect. It's a Linux-only thing; the environment variable is consumed by Linux's ELF interpreter. There's no such thing on MacOS, and I don't know how to debug dynamic library loading on MacOS I'm afraid.

ashok-arora commented 1 year ago

Yes @tkoeppe, I tried to look for macOS equivalent of LD_DEBUG but couldn't find any. Although, adding the --test_output=all to the command did give a verbose output, was that any helpful?

Also, since the code seems to be building on mac now, can we proceed with that and leave the tests out?

tkoeppe commented 1 year ago

Does the image_test pass now? It seems so from your log? What changed?

tkoeppe commented 1 year ago

Yes, go ahead and make a pull request that updates the precommit script to add a case that uses the M1 architecture.

I'm not sure what to do with the libenv_c_api_example.so rule; we could maybe make the name conditional on some selectable condition. But I'd like to find out why the x86-MacOS version doesn't have a problem with this.

ashok-arora commented 1 year ago

Does the image_test pass now? It seems so from your log? What changed?

Yes, it did pass, although I am not sure what changed. Earlier, It passed because of the --test_timeout=3000 flag but now it seems to be passing without it.

ashok-arora commented 1 year ago

Hey @tkoeppe, did you see the latest comment in the PR?

ashok-arora commented 1 year ago

Hey @charlesbeattie, @tkoeppe seems to be busy, could someone else look into it? Additionally, is there any further work that I can contribute to this repository?

tkoeppe commented 1 year ago

Hi - I've been trying to figure out how to reliably set an action runner so that we can predictably get x86 and ARM machines, but without success so far. I think we'll want something robust in this regard.

One thing that we could really use is getting LuaJit to work on MacOS -- if that's something you could work on, that'd be great!

ashok-arora commented 1 year ago

Hi @tkoeppe,

Hi - I've been trying to figure out how to reliably set an action runner so that we can predictably get x86 and ARM machines, but without success so far. I think we'll want something robust in this regard.

The ARM machines are in beta right so does the action runner provide them for lab2d?

One thing that we could really use is getting LuaJit to work on MacOS -- if that's something you could work on, that'd be great!

Sure, I'll get to working on it, there's a homebrew package for it - https://formulae.brew.sh/formula/luajit so what exactly do I have to work on?

tkoeppe commented 1 year ago

We already have luajit downloaded from source, as an external dependency, and we build it from source on Linux. The task would be to get it to build on MacOS, too. I think I actually have some draft changes for that somewhere, but I haven't managed to get it to work yet. E.g. there's some machine code generation for luajit which needs a case added for Mach-O binaries, I think.

tkoeppe commented 1 year ago

To get back to the .so/.dylib issue: can you say again which build configuration you are ultimately using (e.g. what's in your .bazelrc regarding -fuse-ld, and did you install LLD locally in some way)?

ashok-arora commented 1 year ago

We already have luajit downloaded from source, as an external dependency, and we build it from source on Linux. The task would be to get it to build on MacOS, too. I think I actually have some draft changes for that somewhere, but I haven't managed to get it to work yet. E.g. there's some machine code generation for luajit which needs a case added for Mach-O binaries, I think.

Is there any advantage of building from source as compared to downloading the package from homebrew?

ashok-arora commented 1 year ago

To get back to the .so/.dylib issue: can you say again which build configuration you are ultimately using (e.g. what's in your .bazelrc regarding -fuse-ld, and did you install LLD locally in some way)?

I had installed llvm through brew and then the error dissapeared, and I didn't make any changes to the .bazelrc.

tkoeppe commented 1 year ago

To get back to the .so/.dylib issue: can you say again which build configuration you are ultimately using (e.g. what's in your .bazelrc regarding -fuse-ld, and did you install LLD locally in some way)?

I had installed llvm through brew and then the error dissapeared, and I didn't make any changes to the .bazelrc.

I believe the problem is that LLD doesn't work with the .so filename, whereas LD64 works fine. The configuration has been somewhat broken so far in that we never actually used LLD on MacOS. I wonder how you ever got to get the error clang: error: invalid linker name in argument '-fuse-ld=lld'; I don't see how the original configuration would have ever set this flag. I think what was originally in the .bazelrc file was actually entirely ineffective. Hmmm.

ashok-arora commented 1 year ago

We already have luajit downloaded from source, as an external dependency, and we build it from source on Linux. The task would be to get it to build on MacOS, too. I think I actually have some draft changes for that somewhere, but I haven't managed to get it to work yet. E.g. there's some machine code generation for luajit which needs a case added for Mach-O binaries, I think.

Is there any advantage of building from source as compared to downloading the package from homebrew?

and what about this @tkoeppe?

To get back to the .so/.dylib issue: can you say again which build configuration you are ultimately using (e.g. what's in your .bazelrc regarding -fuse-ld, and did you install LLD locally in some way)?

I had installed llvm through brew and then the error dissapeared, and I didn't make any changes to the .bazelrc.

I believe the problem is that LLD doesn't work with the .so filename, whereas LD64 works fine. The configuration has been somewhat broken so far in that we never actually used LLD on MacOS. I wonder how you ever got to get the error clang: error: invalid linker name in argument '-fuse-ld=lld'; I don't see how the original configuration would have ever set this flag. I think what was originally in the .bazelrc file was actually entirely ineffective. Hmmm.

So what should I replace it with?

My end goal is to make it easier to install melting-pot on m1 mac by installing them through pypi and I want to package them all for pypi.

tkoeppe commented 1 year ago

We already have luajit downloaded from source, as an external dependency, and we build it from source on Linux. The task would be to get it to build on MacOS, too. I think I actually have some draft changes for that somewhere, but I haven't managed to get it to work yet. E.g. there's some machine code generation for luajit which needs a case added for Mach-O binaries, I think.

Is there any advantage of building from source as compared to downloading the package from homebrew?

and what about this @tkoeppe?

I didn't have a good answer there, at least I don't know if you find it convincing. Bazel is designed around hermetic, self-contained builds, and that's how we made Lab2D internally, so it was just the natural way to do it. And it works well on Linux. So it's mainly that I don't really want to add a new way of doing things if we can just keep doing it that way. Minor benefits are more compact deployment and better opportunities for compiler and link-time optimisations, though that's probably not terribly persuasive.

To get back to the .so/.dylib issue: can you say again which build configuration you are ultimately using (e.g. what's in your .bazelrc regarding -fuse-ld, and did you install LLD locally in some way)?

I had installed llvm through brew and then the error dissapeared, and I didn't make any changes to the .bazelrc.

I believe the problem is that LLD doesn't work with the .so filename, whereas LD64 works fine. The configuration has been somewhat broken so far in that we never actually used LLD on MacOS. I wonder how you ever got to get the error clang: error: invalid linker name in argument '-fuse-ld=lld'; I don't see how the original configuration would have ever set this flag. I think what was originally in the .bazelrc file was actually entirely ineffective. Hmmm.

So what should I replace it with?

Just don't set any -fuse-ld flag, and the default behaviour should be that the native MacOS linker LD64 is used. That one can handle the DSO being named ".so".

tkoeppe commented 1 year ago

Hm, it could also be that you have to make sure to not have installed LLVM locally, since that might be found in preference to the system one. Or if that is indeed your setup, you could still make bazel use the system's clang binary by ponting the CC and CXX environment variables at it (e.g. see https://github.com/deepmind/lab2d/blob/34b4e923ae51dfdcf57782a39775c8debc364920/.github/workflows/precommit.yml#L35-L36).

In any case it should work fine on GitHub, so do revert the "so"-to-"dylib" change in your pull request in any case.

ashok-arora commented 1 year ago

I didn't have a good answer there, at least I don't know if you find it convincing. Bazel is designed around hermetic, self-contained builds, and that's how we made Lab2D internally, so it was just the natural way to do it. And it works well on Linux. So it's mainly that I don't really want to add a new way of doing things if we can just keep doing it that way. Minor benefits are more compact deployment and better opportunities for compiler and link-time optimisations, though that's probably not terribly persuasive.

Hey @tkoeppe, I understand. I will try my best to make it work on macOS too. I looked around and found this luajit.rb in the homebrew repo. Would this be helpful in porting it to a bazel build?

Or if that is indeed your setup, you could still make bazel use the system's clang binary by ponting the CC and CXX environment variables at it

Oh right, thats a good idea. I'll add it to the file.

tkoeppe commented 1 year ago

Please see https://github.com/deepmind/lab2d/pull/26#issuecomment-1557135662: I can't find any evidence that the GitHub runners use M1 hardware.

tkoeppe commented 1 year ago

Hey @tkoeppe, I understand. I will try my best to make it work on macOS too. I looked around and found this luajit.rb in the homebrew repo. Would this be helpful in porting it to a bazel build?

We already have the dependency on LuaJit in Bazel. We just need to make it compile on MacOS. Check out the Linux version, that one already uses it.

I can check in my existing draft changes to get closer to making LuaJit buildable if that helps.

tkoeppe commented 1 year ago

(Looks like we just fixed the ".so"/".dylib" issue by patching LLD: https://reviews.llvm.org/D151147)

ashok-arora commented 1 year ago

Hey @tkoeppe, sorry I've been busy in travelling this week and did not have access to my mac machine to test out the changes.

I'll get to it in a couple of days.

tkoeppe commented 1 year ago

I have ARM builds working in https://github.com/deepmind/lab2d/actions/runs/5093675265/jobs/9156577098 via cross-compilation: the runner is still using an x86 mac, but I told clang to compile for ARM.

For LuaJit, I also sort of got that working, but a major problem is that we've pinned it to the last released version (2.1.0v3-beta) from 2017, and it desparately needs an update to work (well) with new Macs.

ashok-arora commented 1 year ago

(Looks like we just fixed the ".so"/".dylib" issue by patching LLD: https://reviews.llvm.org/D151147)

Oh wow, thank you so much for reporting it upstream to bazel, your report here was so detailed and well written.

ashok-arora commented 1 year ago

I have ARM builds working in https://github.com/deepmind/lab2d/actions/runs/5093675265/jobs/9156577098 via cross-compilation: the runner is still using an x86 mac, but I told clang to compile for ARM.

Yeah, good idea @tkoeppe, cross compilation seems a good option, is there any drawback to it? In the mean time, I have filed a support ticket with GitHub support regarding actions for m1 ARM machines.

For LuaJit, I also sort of got that working, but a major problem is that we've pinned it to the last released version (2.1.0v3-beta) from 2017, and it desparately needs an update to work (well) with new Macs.

I'm back on my machine, should I try to build LuaJit on mine too?

tkoeppe commented 1 year ago

I set up a cross-compling wheel build. Have a look at the example output, and maybe give it a try and see if it actually works? https://github.com/deepmind/lab2d/releases/tag/test_arm64

I'll file a new issue about LuaJit.

ashok-arora commented 1 year ago

I downloaded the dmlab2d-1.0-cp310-cp310-macosx_13_0_arm64.whl file and ran this piece to code:

import dmlab2d
import dmlab2d.runfiles_helper

lab = dmlab2d.Lab2d(dmlab2d.runfiles_helper.find(), {"levelName": "chase_eat"})
env = dmlab2d.Environment(lab, ["WORLD.RGB"])
env.step({})

It ran successfully but there was no output, is there supposed to be some output?

ashok-arora commented 1 year ago

Hey @tkoeppe can you please look into the above code? Also, if cross-compiling is working and building the .whl files, can we also publish them on PyPI?

tkoeppe commented 1 year ago

I think there's a full example random agent shown in the documentation somewhere, can you try running that?

ashok-arora commented 1 year ago

I think there's a full example random agent shown in the documentation somewhere, can you try running that?

Oh okay, so I ran the dmlab2d/random_agent.py and here's the output:

https://github.com/deepmind/lab2d/assets/19496434/4296d7ef-80c9-468b-8111-9d2fc250e76f

Is this the expected output or is there a bug?

tkoeppe commented 1 year ago

I think if it behaves roughly the same as on Linux, then it should be fine!

It's good to know that the cross-compilation is working. I'll try to also get LuaJit to work on MacOS, which is something I've wanted to do for some time, and then we can hopefully enable both LuaJit and the ARM64 cross-compilation mode and make a new release.