charmplusplus / charm

The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.
Apache License 2.0

charmrun.C compile fails on PSC Blacklight with icc 15.0.1 and gcc/g++ = 4.3.x #740

Closed jcphill closed 9 years ago

jcphill commented 9 years ago

Original issue: https://charm.cs.illinois.edu/redmine/issues/740


Ironically, this is a multicore build so there is no need to compile charmrun.C. This is an SGI Ultraviolet that is supposedly retired now. The error is with the Intel 15.0.1 compiler and is likely an old header file issue.

../../bin/charmc -optimize -production   -Wno-error -lm -I.. -c -seq -DCMK_NOT_USE_CONVERSE=1 -DNOTIFY charmrun.C
/usr/include/c++/4.3/ext/new_allocator.h(114): error: a value of type "long" cannot be used to initialize an entity of type "const char *"
        { ::new((void *)__p) _Tp(std::forward<_Args>(__args)...); }
                                 ^
          detected during:
            instantiation of "void __gnu_cxx::new_allocator<_Tp>::construct(__gnu_cxx::new_allocator<_Tp>::pointer, _Args &&...) [with _Tp=const char *, _Args=<long>]" at line 704 of "/usr/include/c++/4.3/bits/stl_vector.h"
            instantiation of "void std::vector<_Tp, _Alloc>::push_back(_Args &&...) [with _Tp=const char *, _Alloc=std::allocator<const char *>, _Args=<long>]" at line 4188 of "charmrun.C"

compilation aborted for charmrun.C (code 2)
Fatal Error by charmc in directory /usr/tmp/charm-6.7.0-build-2015-May-15-96807-multicore-linux64-iccstatic/charm-6.7.0-pre/multicore-linux64-iccstatic/tmp/charmrun-src
   Command icpc -DCMK_SEQUENTIAL=1 -I../../bin/../include -D__CHARMC__=1 -I.. -DCMK_NOT_USE_CONVERSE=1 -DNOTIFY -O2 -U_FORTIFY_SOURCE -std=c++0x -c charmrun.C -o charmrun.o returned error code 2
charmc exiting...
gmake[1]: Leaving directory `/var/tmp/charm-6.7.0-build-2015-May-15-96807-multicore-linux64-iccstatic/charm-6.7.0-pre/multicore-linux64-iccstatic/tmp/charmrun-src'
gmake[1]: *** [charmrun-notify] Error 1
gmake: *** [charmrun] Error 2
trquinn commented 5 years ago

Original date: 2015-05-19 21:46:25


I'm seeing the same thing with gcc 4.3. I think this is a bug introduced in commit b100a28. Please upgrade the priority. Here is a proposed patch:

diff --git a/src/util/charmrun-src/charmrun.C b/src/util/charmrun-src/charmrun.C
index f72ca5d..200eff2 100644
--- a/src/util/charmrun-src/charmrun.C
+++ b/src/util/charmrun-src/charmrun.C
@@ -4185,7 +4185,7 @@ int rsh_fork(int nodeno, const char *startScript)
   rshargv.push_back("-l");
   rshargv.push_back(nodetab_login(nodeno));
   rshargv.push_back("/bin/bash -f");
-  rshargv.push_back(NULL);
+  rshargv.push_back((const char *) NULL);

   std::string cmd_str = rshargv[0];
   for (int n = 1; n < rshargv.size()-1; ++n)
@@ -4577,7 +4577,7 @@ void read_global_segments_size()
   tmp = (char *) malloc(sizeof(char) * 9 + strlen(arg_nodeprog_r));
   sprintf(tmp, "size -A %s", arg_nodeprog_r);
   rshargv.push_back(tmp);
-  rshargv.push_back(NULL);
+  rshargv.push_back((const char *)NULL);

   childPid = fork();
   if (childPid < 0) {
@@ -4620,7 +4620,7 @@ void open_gdb_info()
   tmp = (char *) malloc(sizeof(char) * 8 + strlen(arg_nodeprog_r));
   sprintf(tmp, "gdb -q %s", arg_nodeprog_r);
   rshargv.push_back(tmp);
-  rshargv.push_back(NULL);
+  rshargv.push_back((const char *)NULL);

   pipe(fdin);
   pipe(fdout);
@@ -4813,7 +4813,7 @@ int rsh_fork_one(const char *startScript)
   sprintf(npes, "%d", nodetab_rank0_size);
   rshargv.push_back(npes);
   rshargv.push_back((char *) startScript);
-  rshargv.push_back(NULL);
+  rshargv.push_back((const char *)NULL);
   if (arg_verbose)
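
For readers who want to see the cast in isolation, here is a minimal sketch, not taken from charmrun.C. The helper name `make_rsh_argv` and its arguments are hypothetical. Judging by the compiler output in this issue, the libstdc++ 4.3 headers in `-std=c++0x` mode declare `push_back` as a forwarding template, so a bare `NULL` is deduced as `long` and cannot initialize a `const char *`. Casting the terminator first gives deduction a pointer type to work with.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from charmrun.C): builds a NULL-terminated
// argument list of the kind usually handed to an exec-style call.
std::vector<const char *> make_rsh_argv(const char *login)
{
  std::vector<const char *> argv;
  argv.push_back("rsh");
  argv.push_back("-l");
  argv.push_back(login);

  // A bare NULL is an integer-typed constant on the failing toolchain, so a
  // forwarding push_back deduces 'long'. The explicit cast makes the argument
  // a 'const char *' before any template deduction takes place.
  argv.push_back(static_cast<const char *>(NULL));

  assert(argv.back() == NULL);  // the list must end with the terminator
  return argv;
}
```

Calling `make_rsh_argv("someuser")` should yield four entries with a null pointer last. The cast is harmless on newer compilers, which is why it appears to be a low-risk fix.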
PhilMiller commented 5 years ago

Original date: 2015-05-19 22:14:29


Tom, could you please post the full error text you encountered?

PhilMiller commented 5 years ago

Original date: 2015-05-19 23:06:12


Using gcc/g++ 4.3.4 on PPL server 'prudence', I am able to confirm Tom's report, with an error reading "invalid conversion from 'long int' to 'const char*'". It does not occur with gcc 4.4.3. Between the age of the failing compiler (released 2008, last updated 2011), and the much better feature support (particularly, standard atomic operations) that starts appearing in 4.4, this may be a reason to commit to a move forward.

Ralf will post about oddities with Intel's compiler shortly.

pplimport commented 5 years ago

Original author: Ralf Gunter Corrêa Carvalho Original date: 2015-05-19 23:15:58


I'm not able to reproduce this on either golub (Intel 15.0.0) or stampede (Intel 15.0.2) with the following build command:

./build charm++ multicore-linux64 iccstatic -j4 -g

The specific charmc command given in the bug description also works on both machines.

trquinn commented 5 years ago

Original date: 2015-05-20 00:20:17


Here is the error that gcc 4.3 gives me. It is similar to the Intel error.

/usr/include/c++/4.3/ext/new_allocator.h: In member function 'void __gnu_cxx::new_allocator<_Tp>::construct(_Tp*, _Args&& ...) [with _Args = long int, _Tp = const char*]':
/usr/include/c++/4.3/bits/stl_vector.h:703:   instantiated from 'void std::vector<_Tp, _Alloc>::push_back(_Args&& ...) [with _Args = long int, _Tp = const char*, _Alloc = std::allocator<const char*>]'
charmrun.C:4188:   instantiated from here
/usr/include/c++/4.3/ext/new_allocator.h:114: error: invalid conversion from 'long int' to 'const char*'
Fatal Error by charmc in directory /u/trquinn/src/charm/verbs-linux-x86_64-smp/tmp/charmrun-src
   Command g++ -m64 -m64 -fPIC -DCMK_SEQUENTIAL=1 -I../../bin/../include -D__CHARMC__=1 -I.. -DCMK_NOT_USE_CONVERSE=1 -DNOTIFY -g -O2 -fno-stack-protector -std=c++0x -c charmrun.C -o charmrun.o returned error code 1
charmc exiting...
PhilMiller commented 5 years ago

Original date: 2015-05-21 17:59:08


I just looked into OS releases vis-à-vis gcc 4.3. Debian oldoldstable (two releases back) defaulted to 4.4, and Ubuntu 10.04 (two LTS releases back) also defaulted to 4.4. Similarly, for the Red Hat family, 6.6 contains 4.4, and the current 7.1 release is much newer.

Are there systems of interest where it's necessary to use gcc 4.3 for some reason?

trquinn commented 5 years ago

Original date: 2015-05-21 18:11:08


On the NASA Pleiades system, the default compiler is gcc 4.3.4. If you choose not to support this compiler, you will need to add some code to detect this and issue an informative message.

Wouldn't my patch be easier?
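
As a rough illustration of what "detect this and issue an informative message" could look like, the following sketch uses the predefined `__GNUC__` and `__GNUC_MINOR__` macros. The 4.4 threshold is an assumption based on the versions discussed in this thread, and the function name `gcc_too_old` is hypothetical. Nothing here reflects what the Charm++ build system actually does.

```cpp
#include <cassert>

// Hypothetical helper: true when a gcc (major, minor) pair is older than 4.4.
// The 4.4 cutoff is an assumption taken from this discussion.
bool gcc_too_old(int major, int minor)
{
  return major < 4 || (major == 4 && minor < 4);
}

// Compile-time form of the same comparison. Clang is excluded because it
// also defines __GNUC__ (reporting an old gcc version for compatibility).
#if defined(__GNUC__) && defined(__GNUC_MINOR__) && !defined(__clang__)
#  if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 4)
#    error "gcc/g++ older than 4.4 detected; please select a newer compiler"
#  endif
#endif
```

With this in a commonly included header, a build using g++ 4.3.4 would stop at the `#error` line with a readable message instead of a template instantiation trace. The Intel compiler typically defines the same macros to match the host gcc it borrows headers from, so the check may fire there too, though that behavior should be verified per installation.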

PhilMiller commented 5 years ago

Original date: 2015-05-21 18:24:55


There's no doubt that it's easy to apply the patch in question. I'm essentially looking at this particular report as a foil for a broader discussion of what compiler versions we should expect, and be able to use features from.

trquinn commented 5 years ago

Original date: 2015-05-21 18:43:41


Pleiades is running SUSE Linux Enterprise Server 11 SP3. This is still a supported OS (version 12 was released less than a year ago). Supporting the default compiler in this OS might be a good idea.

PhilMiller commented 5 years ago

Original date: 2015-05-29 21:12:06


http://charm.cs.uiuc.edu/gerrit/727

PhilMiller commented 5 years ago

Original date: 2015-11-11 03:05:19


Set status back to Merged so that we can distinguish whether the code was changed in some way, or the fix was elsewhere. Both are non-open states from Redmine's perspective.