Kyzarok / MScProject_AURORA_with_RNN

Extending Autonomous Skill Discovery with Recurrent Neural Networks

OriginalCode headers #1

Closed bossdm closed 3 years ago

bossdm commented 3 years ago

I want to work with the C++ version, but I can't seem to access the headers from GitLab for the airl_env repository on the tensorflow branch. I cannot log in, as it seems to be access-restricted.

Would it be possible to put the required headers directly on GitHub? Based on snippets in the current headers in the repository, I need (at least) the following headers to run the original code:

#include <sferes/fit/fit_qd.hpp>
#include <sferes/qd/container/archive.hpp>
#include <sferes/qd/container/kdtree_storage.hpp>
#include <sferes/qd/quality_diversity.hpp>
#include <sferes/qd/selector/value_selector.hpp>
#include <sferes/qd/selector/score_proportionate.hpp>

Kyzarok commented 3 years ago

When I contacted Dr Cully about this precise issue, he said that the README is outdated. Here is the relevant email he sent back to me:

"Well, the Readme is probably outdated, we are not using docker anymore, but singularity instead. Singularity is the equivalent of docker, but designed for academic usage, so it is compliant with HPCs and avoid some security issues. https://sylabs.io/guides/3.5/user-guide/

You don’t need access to our gitlab server to run the singularity container. Simply, install singularity and run the “start_container.sh” script in the singularity folder. This will compile a “sandbox” container and give you access to a shell inside the container. From there, you just have to do: cd /git/sferes2/ and then ./setup.sh. The setup.sh will do the ./waf configure and ./waf to compile the experiment, so you don’t have to bother with this. This compilation will [produce] an executable in the build folder. "

Hope this helps. I will put the headers you specified on GitHub if this does not work.
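A minimal sketch of a pre-flight check for the steps in the quoted email, assuming the container scripts sit in the checkout's `singularity` folder. The helper name `check_container_prereqs` and the paths in the usage comment are illustrative and not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper: verify the two things the email relies on before
# launching the container. Pass the path to the repo's "singularity" folder.
check_container_prereqs() {
  sing_dir="$1"
  rc=0
  if ! command -v singularity >/dev/null 2>&1; then
    echo "singularity is not on PATH" >&2
    rc=1
  fi
  if [ ! -f "$sing_dir/start_container.sh" ]; then
    echo "start_container.sh not found in $sing_dir" >&2
    rc=1
  fi
  return "$rc"
}

# Usage (the path is an assumption about where the repo was cloned):
#   check_container_prereqs ./singularity && (cd ./singularity && bash start_container.sh)
# Then, inside the container shell, as the email describes:
#   cd /git/sferes2/ && ./setup.sh
```

The check only reports what is missing; it does not install or build anything.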

bossdm commented 3 years ago

Hi, with singularity version 3.7.3+94-g32d599245 I am having some trouble with start_container.sh and get the following error:

Visualisation available after activating the visu_server.sh script at the http://localhost:6080/
example_tf_sferes.simg does not exist, building it now from singularity.def
FATAL: could not use fakeroot: no mapping entry found in /etc/subuid for root
FATAL: could not open image /home/david/MScProject_AURORA_with_RNN/OriginalCode/example_tf_sferes-master/singularity/example_tf_sferes.simg: failed to retrieve path for /home/david/MScProject_AURORA_with_RNN/OriginalCode/example_tf_sferes-master/singularity/example_tf_sferes.simg: lstat /home/david/MScProject_AURORA_with_RNN/OriginalCode/example_tf_sferes-master/singularity/example_tf_sferes.simg: no such file or directory

This only happens when using the --fakeroot options as in the file.

If I remove these options then the container works but I cannot perform the setup as I don't have permission:

Visualisation available after activating the visu_server.sh script at the http://localhost:6080/
example_tf_sferes.simg does not exist, building it now from singularity.def
INFO: Starting build...
INFO: Using cached image
INFO: Verifying bootstrap image /root/.singularity/cache/library/sha256.79de5f0f23218221fb984497b7fc35e5ecc6b806f378bbe67edea6c25fc59342
WARNING: integrity: signature not found for object group 1
WARNING: Bootstrap image could not be verified, but build will continue.
INFO: Copying ./resources/setup.sh to /home/david/MScProject_AURORA_with_RNN/OriginalCode/example_tf_sferes-master/singularity/build-temp-954496519/rootfs/git/sferes2
INFO: Running post scriptlet
+ export LD_LIBRARY_PATH=/workspace/lib:/workspace/lib:/workspace/lib:/workspace/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs
+ exit 0
INFO: Adding help info
INFO: Adding labels
INFO: Adding runscript
INFO: Creating sandbox directory...
INFO: Build complete: example_tf_sferes.simg
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
Singularity> ls
Vagrantfile Vagrantfile.bak build_final_image.sh example_tf_sferes.simg resources singularity.def start_container.sh
Singularity> ls example_tf_sferes.simg/
bazel bin boot dev environment etc git home lib lib64 media mnt opt proc root run sbin singularity srv sys tmp tmp_visu usr var workspace
Singularity> cd /git/sferes2/
Singularity> ls
COPYING COPYING.fr Dockerfile README.md build buildoptions.log ci examples exp logo modules scripts setup.sh sferes tests waf waf_tools wscript
Singularity> bash setup.sh
WARNING simplejson not found some function may not work
Traceback (most recent call last):
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Scripting.py", line 119, in waf_entry_point
    run_commands()
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Scripting.py", line 175, in run_commands
    parse_options()
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Scripting.py", line 158, in parse_options
    ctx.execute()
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Options.py", line 198, in execute
    super(OptionsContext,self).execute()
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Context.py", line 85, in execute
    self.recurse([os.path.dirname(g_module.root_path)])
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Context.py", line 126, in recurse
    user_function(self)
  File "/git/sferes2/wscript", line 94, in options
    opt.logger = Logs.make_logger(blddir + 'options.log', 'mylogger')
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Logs.py", line 179, in make_logger
    hdlr=logging.FileHandler(path,'w',encoding=encoding)
  File "/usr/lib/python3.6/logging/__init__.py", line 1032, in __init__
    StreamHandler.__init__(self, self._open())
  File "/usr/lib/python3.6/logging/__init__.py", line 1061, in _open
    return open(self.baseFilename, self.mode, encoding=self.encoding)
PermissionError: [Errno 13] Permission denied: '/git/sferes2/buildoptions.log'
WARNING simplejson not found some function may not work
Traceback (most recent call last):
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Scripting.py", line 119, in waf_entry_point
    run_commands()
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Scripting.py", line 175, in run_commands
    parse_options()
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Scripting.py", line 158, in parse_options
    ctx.execute()
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Options.py", line 198, in execute
    super(OptionsContext,self).execute()
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Context.py", line 85, in execute
    self.recurse([os.path.dirname(g_module.root_path)])
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Context.py", line 126, in recurse
    user_function(self)
  File "/git/sferes2/wscript", line 94, in options
    opt.logger = Logs.make_logger(blddir + 'options.log', 'mylogger')
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Logs.py", line 179, in make_logger
    hdlr=logging.FileHandler(path,'w',encoding=encoding)
  File "/usr/lib/python3.6/logging/__init__.py", line 1032, in __init__
    StreamHandler.__init__(self, self._open())
  File "/usr/lib/python3.6/logging/__init__.py", line 1061, in _open
    return open(self.baseFilename, self.mode, encoding=self.encoding)
PermissionError: [Errno 13] Permission denied: '/git/sferes2/buildoptions.log'

Edit: it seems I had toyed around with removing the final sudo in start_container.sh, so for now you can ignore this last error message. I still have some problems but will keep you updated.

Kyzarok commented 3 years ago

I have read your edit. Just as a heads-up, I couldn't solve some issues with Singularity until I was a root user on my device. I recommend making sure you have root access.
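The `FATAL: could not use fakeroot: no mapping entry found in /etc/subuid for root` message quoted earlier suggests that Singularity looked up a subordinate-UID range for `root`, which would be the case if the script was launched through sudo. Below is a small sketch for checking whether a given user has such an entry. The helper name `has_subuid_entry` is made up for illustration, and it only matches entries keyed by user name. As far as I know, the Singularity admin documentation describes `singularity config fakeroot --add <user>` (run as root) for creating the mapping, and a matching entry in /etc/subgid is needed as well; treat both as pointers to verify against the installed version rather than a tested fix for this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given user has a line in a
# subuid-style file (format "name:first_id:count").
# The file defaults to /etc/subuid.
has_subuid_entry() {
  user="$1"
  subuid_file="${2:-/etc/subuid}"
  [ -r "$subuid_file" ] && grep -q "^${user}:" "$subuid_file"
}

me="$(id -un)"
if has_subuid_entry "$me"; then
  echo "subuid mapping found for $me"
else
  echo "no subuid mapping for $me: --fakeroot is likely to fail for this user"
fi
```

Running the check as the ordinary user and again under sudo shows which account the mapping is missing for.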

bossdm commented 3 years ago

Hi,

Within the container, to perform the setup, I am encountering an error related to reading the folder and/or the configuration:

Visualisation available after activating the visu_server.sh script at the http://localhost:6080/
example_tf_sferes.simg exists
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
Singularity> cd /git/sferes2
Singularity> bash setup.sh
WARNING simplejson not found some function may not work
Command-line options for exp [exp/example_tf_sferes-master] : -> OK
Setting top to : /git/sferes2
Setting out to : /git/sferes2/build
Checking for 'g++' (C++ compiler) : /usr/bin/g++
Checking boost includes : 1_65_1
Checking boost libs : ok
Checking Intel TBB includes (optional) : /usr/include
Checking Intel TBB libs (optional) : /usr/lib/x86_64-linux-gnu
Checking for MPI include (optional) : ok
Checking for MPI libs (optional) : Not found
Checking for Eigen : ok
Checking for ssrc kdtree (KD-tree) : ok
Checking pthread : /usr/lib/x86_64-linux-gnu
Configuring for exp [example_tf_sferes]
--- Logging error ---
Traceback (most recent call last):
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Context.py", line 133, in recurse
    os.listdir(d)
FileNotFoundError: [Errno 2] No such file or directory: '/git/sferes2/exp/example_tf_sferes'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/git/sferes2/wscript", line 183, in configure
    conf.recurse('exp/' + i)
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Context.py", line 135, in recurse
    raise Errors.WafError('Cannot read the folder %r'%d)
waflib.Errors.WafError: Cannot read the folder '/git/sferes2/exp/example_tf_sferes'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/logging/__init__.py", line 994, in emit
    msg = self.format(record)
  File "/usr/lib/python3.6/logging/__init__.py", line 840, in format
    return fmt.format(record)
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Logs.py", line 134, in format
    return msg%rec.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "./waf", line 165, in <module>
    Scripting.waf_entry_point(cwd, VERSION, wafdir)
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Scripting.py", line 119, in waf_entry_point
    run_commands()
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Scripting.py", line 179, in run_commands
    ctx=run_command(cmd_name)
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Scripting.py", line 170, in run_command
    ctx.execute()
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Configure.py", line 85, in execute
    super(ConfigurationContext,self).execute()
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Context.py", line 85, in execute
    self.recurse([os.path.dirname(g_module.root_path)])
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Context.py", line 126, in recurse
    user_function(self)
  File "/git/sferes2/wscript", line 186, in configure
    Logs.warn('%s -> no configuration found' % i, 'YELLOW')
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Logs.py", line 160, in warn
    log.warn(*k,**kw)
  File "/usr/lib/python3.6/logging/__init__.py", line 1325, in warn
    self.warning(msg, *args, **kwargs)
  File "/usr/lib/python3.6/logging/__init__.py", line 1320, in warning
    self._log(WARNING, msg, args, **kwargs)
  File "/usr/lib/python3.6/logging/__init__.py", line 1444, in _log
    self.handle(record)
  File "/usr/lib/python3.6/logging/__init__.py", line 1454, in handle
    self.callHandlers(record)
  File "/usr/lib/python3.6/logging/__init__.py", line 1516, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.6/logging/__init__.py", line 865, in handle
    self.emit(record)
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Logs.py", line 84, in emit
    self.emit_override(record)
  File "/git/sferes2/.waf3-2.0.14-e67604cd8962dbdaf7c93e0d7470ef5b/waflib/Logs.py", line 108, in emit_override
    logging.StreamHandler.emit(self,record)
Message: 'example_tf_sferes -> no configuration found'
Arguments: ('YELLOW',)

--- configuration ---
compiler(s):
 * CXX: gcc
boost version: 1_65_1
mpi: False
Compilation flags :
 CXXFLAGS : -D_REENTRANT -Wall -fPIC -ftemplate-depth-1024 -Wno-sign-compare -Wno-deprecated -Wno-unused -DSFERES_ROOT="/git/sferes2" -std=c++11 -DEIGEN3_ENABLED -DUSE_KDTREE
 LINKFLAGS:
--- license ---
Sferes2 is distributed under the CECILL license (GPL-compatible)
Please check the accompanying COPYING file or http://www.cecill.info/
'configure' finished successfully (0.125s)
WARNING simplejson not found some function may not work
Command-line options for exp [exp/example_tf_sferes-master] : -> OK
Waf: Entering directory `/git/sferes2/build'
DEBUG is is disabled
Entering directory `/git/sferes2'
Building exp: example_tf_sferes
Cannot read the folder '/git/sferes2/exp/example_tf_sferes'

Any thoughts on this?

bossdm commented 3 years ago

If I look into the container, the corresponding folder does seem to exist:

ls /git/sferes2/exp/example_tf_sferes-master/
README.md cpp python resources sferes2 singularity wscript

Kyzarok commented 3 years ago

Does example_tf_sferes get successfully built?

bossdm commented 3 years ago

Yes, see the following log:

sudo bash start_container.sh
Visualisation available after activating the visu_server.sh script at the http://localhost:6080/
example_tf_sferes.simg does not exist, building it now from singularity.def
INFO: Starting build...
INFO: Using cached image
INFO: Verifying bootstrap image /root/.singularity/cache/library/sha256.79de5f0f23218221fb984497b7fc35e5ecc6b806f378bbe67edea6c25fc59342
WARNING: integrity: signature not found for object group 1
WARNING: Bootstrap image could not be verified, but build will continue.
INFO: Copying ./resources/setup.sh to /home/david/MScProject_AURORA_with_RNN/OriginalCode/example_tf_sferes-master/singularity/build-temp-052815518/rootfs/git/sferes2
INFO: Running post scriptlet
+ export LD_LIBRARY_PATH=/workspace/lib:/workspace/lib:/workspace/lib:/workspace/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs
+ exit 0
INFO: Adding help info
INFO: Adding labels
INFO: Adding runscript
INFO: Creating sandbox directory...
INFO: Build complete: example_tf_sferes.simg
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container

Do you think the failed setup may be related to removing the --fakeroot from the start_container.sh? Without removing it I get the first-mentioned errors.

Kyzarok commented 3 years ago

fakeroot is only necessary because Cully didn't have root access if I remember correctly. If you have full permissions on your device it shouldn't cause any problems.

If the .simg file is being built, that should mean that the "sudo singularity build" line was called. The only line after that in start_container.sh is the line that spawns the shell. It could be that example_tf_sferes.simg isn't being built in /git/sferes2/exp/. I think this could be the case because, while you've shown that it gets built:

example_tf_sferes.simg exists

setup.sh can't seem to find it:

FileNotFoundError: [Errno 2] No such file or directory: '/git/sferes2/exp/example_tf_sferes'

Can you check inside /git/sferes2/exp/ ?

bossdm commented 3 years ago

Yeah, it does exist, as I mentioned in one of my previous comments.

ls /git/sferes2/exp/example_tf_sferes-master/
README.md cpp python resources sferes2 singularity wscript

Is this how it is supposed to look?

bossdm commented 3 years ago

Oh, I see. There is the -master suffix. No, the one without the -master suffix is not there.

bossdm commented 3 years ago

I can see

ls /git/sferes2/exp/example_tf_sferes-master/singularity/example_tf_sferes.simg

Perhaps a symbolic link can be created from this to "/git/sferes2/exp/example_tf_sferes" ?

Kyzarok commented 3 years ago

Then that'll be the issue. If you change where example_tf_sferes.simg goes so that setup.sh can find it, then it should work. You should be able to do this by changing the destination in the last line of start_container.sh.

Kyzarok commented 3 years ago

A symbolic link could also work
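A minimal sketch of the symbolic-link idea, assuming that what waf needs at `exp/example_tf_sferes` is the experiment source directory (the one containing `wscript`) rather than the `.simg` image, which is what the later rename fix in this thread suggests. The helper name `link_exp_dir` is hypothetical, and whether the `exp` folder is writable inside the container is not confirmed here.

```shell
#!/bin/sh
# Hypothetical helper: make EXP_ROOT/example_tf_sferes point at the
# "-master" directory so that exp/example_tf_sferes resolves.
link_exp_dir() {
  exp_root="$1"
  src="$exp_root/example_tf_sferes-master"
  dst="$exp_root/example_tf_sferes"
  if [ ! -d "$src" ]; then
    echo "source directory $src not found" >&2
    return 1
  fi
  if [ -e "$dst" ] || [ -L "$dst" ]; then
    echo "$dst already exists, leaving it untouched"
    return 0
  fi
  # A relative target keeps the link valid if exp_root is mounted elsewhere.
  ln -s "example_tf_sferes-master" "$dst"
}

# Usage inside the container (needs write access to the exp folder):
#   link_exp_dir /git/sferes2/exp
```

The function refuses to overwrite an existing path, so it is safe to run more than once.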

bossdm commented 3 years ago

I made some progress by instead adding the suffix inside the setup.sh file:

./waf configure --exp example_tf_sferes-master --kdtree /workspace/include
./waf --exp example_tf_sferes-master

Compilation starts but I still get an error:

bash setup.sh
WARNING simplejson not found some function may not work
Command-line options for exp [exp/example_tf_sferes-master] : -> OK
Command-line options for exp [exp/example_tf_sferes] : -> no option found
Setting top to : /git/sferes2
Setting out to : /git/sferes2/build
Checking for 'g++' (C++ compiler) : /usr/bin/g++
Checking boost includes : 1_65_1
Checking boost libs : ok
Checking Intel TBB includes (optional) : /usr/include
Checking Intel TBB libs (optional) : /usr/lib/x86_64-linux-gnu
Checking for MPI include (optional) : ok
Checking for MPI libs (optional) : Not found
Checking for Eigen : ok
Checking for ssrc kdtree (KD-tree) : ok
Checking pthread : /usr/lib/x86_64-linux-gnu
Configuring for exp [example_tf_sferes-master]
done
example_tf_sferes-master -> ok

--- configuration ---
compiler(s):
 * CXX: gcc
boost version: 1_65_1
mpi: False
Compilation flags :
 CXXFLAGS : -D_REENTRANT -Wall -fPIC -ftemplate-depth-1024 -Wno-sign-compare -Wno-deprecated -Wno-unused -DSFERES_ROOT="/git/sferes2" -std=c++11 -DEIGEN3_ENABLED -DUSE_KDTREE
 LINKFLAGS:
--- license ---
Sferes2 is distributed under the CECILL license (GPL-compatible)
Please check the accompanying COPYING file or http://www.cecill.info/
'configure' finished successfully (0.114s)
WARNING simplejson not found some function may not work
Command-line options for exp [exp/example_tf_sferes-master] : -> OK
Command-line options for exp [exp/example_tf_sferes] : -> no option found
Waf: Entering directory `/git/sferes2/build'
DEBUG is is disabled
Entering directory `/git/sferes2'
Building exp: example_tf_sferes-master
[ 5/19] Linking build/examples/ex_ea
[ 9/19] Linking build/examples/ex_nsga2
[11/19] Linking build/examples/ex_diversity
[13/19] Linking build/examples/ex_eps_moea
[15/19] Linking build/examples/ex_map_elites
[17/19] Linking build/examples/ex_qd
[18/19] Compiling exp/example_tf_sferes-master/cpp/tf_exp.cpp
In file included from ../exp/example_tf_sferes-master/cpp/tf_exp.cpp:37:0:
../exp/example_tf_sferes-master/cpp/sferes/eval/parallel.hpp:41:10: fatal error: parallel.hpp: No such file or directory
 #include <parallel.hpp>
          ^~~~~~~~~~~~~~
compilation terminated.

Waf: Leaving directory `/git/sferes2/build'
Build failed
 -> task in 'example' failed with exit status 1 (run with -v to display more information)

I suppose this refers to cpp/sferes/parallel.hpp. Specifying the --include /git/sferes2/sferes option in the setup does not work. It does not seem to be recognised, even though I can see the directory when doing

ls /git/sferes2/sferes

Kyzarok commented 3 years ago

So effectively it is failing at the first line when it is compiling the executable file tf_exp.cpp. The first line of this is #include <iostream> and that passes, but it fails at #include <parallel.hpp>? In the version of tf_exp.cpp in the repo I have it as #include <sferes/eval/parallel.hpp>. Is this the same in the version you are running?

bossdm commented 3 years ago

Hi, maybe a bit confusing, but the issue is in line 41 of sferes/eval/parallel.hpp, which is #include <parallel.hpp>, which should be the header in the main sferes directory.

Kyzarok commented 3 years ago

Wait, is it trying to include itself? Then taking out the #include <parallel.hpp> line should just make it work. I don't think I came across that issue before. Which version of sferes are you using?

bossdm commented 3 years ago

No, there are two distinct parallel.hpp files, one in sferes/eval/ and one in sferes/.

bossdm commented 3 years ago

Maybe prepending sferes/ could work, but then again it could still be another file. Maybe worth trying.

bossdm commented 3 years ago

Hm, it does continue after that, but then the next #include fails in both cases (with and without the sferes/ prefix).

bossdm commented 3 years ago

Oh, it successfully installed now with sferes/parallel.hpp and sferes/eval/eval.hpp.

Kyzarok commented 3 years ago

That makes little sense to me, but it's good to know that it's closer to being compiled.

bossdm commented 3 years ago

Perhaps you missed the last post, but it is compiled now. Somehow the right include directories were not found, so specifying them manually was needed.

So in short, my fakeroot didn't work and I made the following changes:

- add sudo to both lines in start_container.sh
- add the -master suffix to the exp argument in setup.sh
- specify the full paths in the includes of sferes/eval/parallel.hpp:

 #include <sferes/parallel.hpp>
 #include <cmath>
 #include <sferes/eval/eval.hpp>

Kyzarok commented 3 years ago

Thanks for summarising. If that's everything, then I will change the OriginalCode README to be more up to date. Out of curiosity, what OS were you using for this?

bossdm commented 3 years ago

Hi, before you change this, let me try a few more things. I want to make sure the change is as simple as possible.

Kyzarok commented 3 years ago

Sure, thanks for raising this issue!

bossdm commented 3 years ago

Ok indeed, the code compiles with a much simpler change (and now also without tensorflow issues after compilation): just change the name of the directory to

MScProject_AURORA_with_RNN/OriginalCode/example_tf_sferes

instead of

MScProject_AURORA_with_RNN/OriginalCode/example_tf_sferes-master

My OS is:

Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Release: 18.04
Codename: bionic

The --fakeroot issue (one of the first comments) is not solved though. I think it will become problematic when using this on a cluster as there I won't have sudo privileges. I don't have experience with using --fakeroot so it could be something simple that I am not doing. Let me know if you have any idea about this.

Thanks for the help!
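The directory rename described in this comment can be sketched as a guarded `mv`. The helper name `rename_exp_dir` is made up for illustration, and the usage path assumes the repository was cloned into the current directory. A later comment in the thread reports that the rename alone was not sufficient on a fresh clone, so this covers only that one step.

```shell
#!/bin/sh
# Hypothetical helper: rename PARENT/example_tf_sferes-master to
# PARENT/example_tf_sferes, refusing to clobber an existing target.
rename_exp_dir() {
  parent="$1"
  old="$parent/example_tf_sferes-master"
  new="$parent/example_tf_sferes"
  if [ ! -d "$old" ]; then
    echo "nothing to rename: $old not found" >&2
    return 1
  fi
  if [ -e "$new" ]; then
    echo "refusing to overwrite existing $new" >&2
    return 1
  fi
  mv "$old" "$new"
}

# Usage (path assumed relative to where the repository was cloned):
#   rename_exp_dir MScProject_AURORA_with_RNN/OriginalCode
```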

Kyzarok commented 3 years ago

No problem, thanks for the fix!

bossdm commented 3 years ago

It seems that when cloning the repository and rebuilding the container, I ran into the same sort of problems, so I pushed the additional fixes I had made, as the folder name was seemingly not the only problem. Perhaps you can check on your system as well.

Kyzarok commented 3 years ago

For me, using Singularity stopped working once I updated my OS. I attempted rerunning this with VagrantBox, but other issues arose because of it. I am setting up a new PC soon and will test then.