i-chaochen opened this issue 10 years ago (status: Open)
That document looks out of date. You don't want to use the really old SVN repo. You want to use this GitHub one.
Yes, I tried the source from GitHub, but it still failed to build:
    git clone git://github.com/apavlo/h-store.git
    ant build

    ee-build:
        [exec] make: Entering directory `/home/ubuntu/h-store/obj/release'
        [exec] g++ -Wall -Wextra -Werror -Woverloaded-virtual -Wconversion -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Winit-self -Wno-sign-compare -Wno-unused-parameter -pthread -DSTDC_CONSTANT_MACROS -DSTDC_LIMIT_MACROS -DNOCLOCK -fno-omit-frame-pointer -fvisibility=hidden -DBOOST_SP_DISABLE_THREADS -Wno-ignored-qualifiers -fno-strict-aliasing -Wno-attributes -DLINUX -fPIC -Wno-unused-but-set-variable -DANTICACHE -DANTICACHE_REVERSIBLE_LRU -isystem ../../third_party/cpp -isystem ../../obj/release/berkeleydb -I../../src/ee -c -g3 -O3 -mmmx -msse -msse2 -msse3 -DNDEBUG -DVOLT_LOG_LEVEL=500 -o objects//voltdbjni.co ../../src/ee//voltdbjni.cpp

    BUILD FAILED
    /home/ubuntu/h-store/build.xml:860: exec returned: 137

    Total time: 9 minutes 36 seconds
thanks
Is there an error from gcc? It's weird that it just fails like that.
I think I finally figured out this problem: the build runs out of memory at this compile step:

    -DANTICACHE_REVERSIBLE_LRU -isystem ../../third_party/cpp -isystem ../../obj/release/berkeleydb -I../../src/ee -c -g3 -O3 -mmmx -msse -msse2 -msse3 -DNDEBUG -DVOLT_LOG_LEVEL=500 -o objects//voltdbjni.co ../../src/ee//voltdbjni.cpp

I used a micro EC2 instance, which only has about 0.6 GB of memory.
I tried a medium instance and the build succeeded.
To anyone who wants to try H-Store on AWS: please use at least a medium-size EC2 instance.
thanks
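For reference, the 137 in "exec returned: 137" is consistent with this diagnosis. By shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), and 137 - 128 = 9 is SIGKILL, the signal the Linux out-of-memory killer sends. Below is a minimal Python sketch of that arithmetic. The function name is made up for illustration and is not part of H-Store or Ant.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Explain a shell-style exit status such as the 137 reported by Ant."""
    # Statuses in (128, 128 + NSIG) conventionally encode "killed by signal".
    if 128 < status < 128 + signal.NSIG:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not named on this platform.
            name = f"signal {signum}"
        return f"exit status {status}: terminated by {name}"
    return f"exit status {status}: process exited on its own"


print(describe_exit_status(137))
```

If the kernel did kill the compiler, `dmesg` on the build machine would normally contain an out-of-memory message naming the killed process.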
Now I can build it, but I am still unable to run the benchmark on an AWS NFS cluster.
My 2 NFS cluster nodes are within the same security group:

    TCP
    Port (Service)    Source
    22 (SSH)          0.0.0.0/0
    111               0.0.0.0/0
    2049              0.0.0.0/0
    44182             0.0.0.0/0
    54508             0.0.0.0/0

    UDP
    Port (Service)    Source
    111               0.0.0.0/0
    2049              0.0.0.0/0
    32768             0.0.0.0/0
    32770 - 32800     0.0.0.0/0
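Whether the TCP rules listed here actually let one node reach the other can be checked from inside the cluster. The following is a rough Python sketch, not an H-Store tool: `tcp_port_open` is a hypothetical helper, and it only covers TCP (a plain connect cannot verify the UDP rules).

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def report(host: str, ports=(22, 111, 2049)) -> None:
    """Print one line per port; the default ports are taken from the rules above."""
    for port in ports:
        state = "open" if tcp_port_open(host, port) else "unreachable"
        print(f"{host}:{port} {state}")
```

Calling `report()` with the other node's internal address, from each node in turn, shows whether SSH (22) and the NFS-related ports (111, 2049) are reachable in both directions.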
I configured the SSH environment:

    sudo apt-get --yes install openssh-server
    ssh-keygen -t dsa # Do not enter in a password
    cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
    $ ssh -o StrictHostKeyChecking=no localhost "date"
    Wed Jan 29 00:58:12 UTC 2014
    $ ssh localhost date
    Wed Jan 29 01:00:14 UTC 2014

I copied my hstore.pem to the NFS server node with scp, then ran:

    cp hstore.pem ~/.ssh/ && chmod 400 ~/.ssh/hstore.pem

I changed the global.sshoptions parameter in $HSTORE_HOME/properties/default.properties to:

    global.sshoptions = -i /home/ubuntu/.ssh/hstore.pem

I created a cluster.txt as follows:

    host0.ip-172-31-xx-xxx.eu-west-1.compute.internal:0:0-1
    host1.ip-172-31-xx-xx.eu-west-1.compute.internal:1:2-3

This step ran without problems:

    ant hstore-prepare -Dproject=tpcc -Dhosts=/home/ubuntu/cluster.txt
    $ ant hstore-benchmark -Dproject=tpcc
    Buildfile: /home/ubuntu/h-store/build.xml

    hstore-benchmark:

    benchmark:
        [java] 00:58:59,774 INFO - ------------------------- BENCHMARK INITIALIZE :: TPCC -------------------------
        [java] 00:58:59,854 INFO - Starting HStoreSite H00 on host0.ip-172-31-33-172.eu-west-1.compute.internal
        [java] 00:58:59,907 INFO - Starting HStoreSite H01 on host1.ip-172-31-24-5.eu-west-1.compute.internal
        [java] 00:58:59,980 INFO - Waiting for 2 HStoreSites with 4 partitions to finish initialization
        [java] 00:59:04,910 ERROR - Failed to poll 'site-00-host0.ip-172-31-33-172.eu-west-1.compute.internal' [exitValue=255]
        [java] 00:59:04,910 FATAL - Process 'site-00-host0.ip-172-31-33-172.eu-west-1.compute.internal' failed. Halting benchmark!
        [java] 00:59:06,413 FATAL - Failed to complete benchmark
        [java] java.lang.RuntimeException: Failed to start all HStoreSites. Halting benchmark
        [java] at edu.brown.api.BenchmarkController.startSites(BenchmarkController.java:633)
        [java] at edu.brown.api.BenchmarkController.setupBenchmark(BenchmarkController.java:504)
        [java] at edu.brown.api.BenchmarkController.main(BenchmarkController.java:2216)

    BUILD FAILED
    /home/ubuntu/h-store/build.xml:2517: The following error occurred while executing this line:
    /home/ubuntu/h-store/build.xml:1693: Java returned: 1

    Total time: 15 seconds
I didn't see anything useful in the logs from these 2 nodes:

    ~/h-store/obj/logs/sites$ cat site-00-host0.ip-172-31-xx-xxx.eu-west-1.compute.internal.log
    ~/h-store/obj/logs/sites$ cat site-01-host1.ip-172-31-xx-xxx.eu-west-1.compute.internal.log

Any advice?
thanks!
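One detail in the output above may narrow this down: OpenSSH's `ssh` exits with status 255 when ssh itself hits an error (for example, it cannot resolve, reach or authenticate to the host), and otherwise returns the remote command's own status. So `[exitValue=255]` together with an empty site log would be consistent with the SSH connection failing before anything ran remotely. That is an assumption, since a remote command could also exit with 255. Below is a small Python sketch that probes each host with the same key file as `global.sshoptions`; the helper names are hypothetical and this is not how H-Store itself launches sites.

```python
import subprocess


def ssh_probe_command(host, identity_file=None):
    """Build a non-interactive ssh command that runs `date` on the host."""
    cmd = ["ssh", "-o", "BatchMode=yes", "-o", "StrictHostKeyChecking=no"]
    if identity_file:
        cmd += ["-i", identity_file]
    return cmd + [host, "date"]


def probe(host, identity_file=None, timeout=15):
    """Run the probe and classify the result by exit status."""
    result = subprocess.run(
        ssh_probe_command(host, identity_file),
        capture_output=True, text=True, timeout=timeout,
    )
    if result.returncode == 255:
        # ssh reports its own failures (DNS, network, authentication) as 255.
        return f"{host}: ssh failed: {result.stderr.strip()}"
    if result.returncode != 0:
        return f"{host}: remote command exited with {result.returncode}"
    return f"{host}: ok ({result.stdout.strip()})"
```

Running `probe()` for every host name in cluster.txt, with `/home/ubuntu/.ssh/hstore.pem` as the identity file, shows whether the exact names in the file are reachable over passwordless SSH. `BatchMode=yes` makes ssh fail instead of prompting for a password.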
Use the internal IP addresses instead of the public ones.
Yes, I am using the AWS internal DNS names, as you can see in my cluster.txt:

    host0.ip-172-31-xx-xxx.eu-west-1.compute.internal:0:0-1
    host1.ip-172-31-xx-xx.eu-west-1.compute.internal:1:2-3

and the internal IPs for the NFS cluster, but it just can't execute.
Do you mean I should use the internal IP address instead of the internal DNS name in cluster.txt?
So like this?

    host0.172.31.xx.xxx :0:0-1
    host1.172-31.xx.xx:1:2-3
thanks
Enable DEBUG for 'org/voltdb/processtools/ProcessSetManager.java' in log4j.properties
Andy Pavlo pavlo@cs.cmu.edu
Sorry, I am not sure I completely follow you. I changed the voltdb entries to DEBUG in log4j.properties:

    log4j.logger.org.voltdb.VoltProcedure=DEBUG
    log4j.logger.org.voltdb.VoltSystemProcedure=DEBUG
    log4j.logger.org.voltdb.client=DEBUG
    log4j.logger.org.voltdb.compiler=DEBUG
    log4j.logger.org.voltdb.planner=DEBUG

After running ant hstore-prepare -Dproject=tpcc -Dhosts=/home/ubuntu/cluster.txt, I haven't seen anything related to an SSH command.
Still:

    $ ant hstore-benchmark -Dproject=tpcc
    Buildfile: /home/ubuntu/h-store/build.xml

    hstore-benchmark:

    benchmark:
        [java] 03:16:24,604 INFO - ------------------------- BENCHMARK INITIALIZE :: TPCC -------------------------
        [java] 03:16:24,673 INFO - Starting HStoreSite H00 on host0.ip-172-31-xx-xx.eu-west-1.compute.internal
        [java] 03:16:24,726 INFO - Starting HStoreSite H01 on host1.ip-172-31-xx-xx.eu-west-1.compute.internal
        [java] 03:16:24,782 INFO - Starting HStoreSite H02 on host2.ip-172-31-xx-xx.eu-west-1.compute.internal
        [java] 03:16:24,863 INFO - Waiting for 3 HStoreSites with 6 partitions to finish initialization
        [java] 03:16:29,729 ERROR - Failed to poll 'site-01-host1.ip-172-31-xx-xx.eu-west-1.compute.internal' [exitValue=255]
        [java] 03:16:29,729 FATAL - Process 'site-01-host1.ip-172-31-xx-xx.eu-west-1.compute.internal' failed. Halting benchmark!
        [java] 03:16:31,232 FATAL - Failed to complete benchmark
        [java] java.lang.RuntimeException: Failed to start all HStoreSites. Halting benchmark
        [java] at edu.brown.api.BenchmarkController.startSites(BenchmarkController.java:633)
        [java] at edu.brown.api.BenchmarkController.setupBenchmark(BenchmarkController.java:504)
        [java] at edu.brown.api.BenchmarkController.main(BenchmarkController.java:2216)

    BUILD FAILED
    /home/ubuntu/h-store/build.xml:2517: The following error occurred while executing this line:
    /home/ubuntu/h-store/build.xml:1693: Java returned: 1

    Total time: 11 seconds
I checked the log, and it still has no useful info:

    $ cat site-01-host1.ip-172-31-xx-xx.eu-west-1.compute.internal.log
thanks
hi, andy
I checked ProcessSetManager.java.
Does it use the "ping" command to create the process?
    public static void main(String[] args) {
        ProcessSetManager psm = new ProcessSetManager();
        psm.startProcess("ping4c", new String[] { "ping", "volt4c" });
        psm.startProcess("ping3c", new String[] { "ping", "volt3c" });
        while (true) {
            OutputLine line = psm.nextBlocking();
            System.out.printf("(%s:%s): %s\n", line.processName, line.stream.name(), line.value);
        }
    }
I opened ICMP in the security group, but I am still unable to run the benchmark.
Then I opened ALL traffic ports to all IPs in this security group, so whatever commands H-Store uses, there should be no problem within the security group.
But it still fails to run the benchmark:

        [java] 22:23:22,433 INFO - Starting HStoreSite H00 on host0.ip-172-31-xx-x.eu-west-1.compute.internal
        [java] 22:23:22,572 INFO - Starting HStoreSite H01 on host1.ip-172-31-xx-x.eu-west-1.compute.internal
        [java] 22:23:22,709 INFO - Starting HStoreSite H02 on host2.ip-172-31-xx-x.eu-west-1.compute.internal
        [java] 22:23:22,837 INFO - Waiting for 3 HStoreSites with 6 partitions to finish initialization
        [java] 22:23:27,595 ERROR - Failed to poll 'site-01-host1.ip-172-31-xx-x.eu-west-1.compute.internal' [exitValue=255]
        [java] 22:23:27,596 FATAL - Process 'site-01-host1.ip-172-31-xx-x.eu-west-1.compute.internal' failed. Halting benchmark!
        [java] 22:23:29,100 FATAL - Failed to complete benchmark
        [java] java.lang.RuntimeException: Failed to start all HStoreSites. Halting benchmark
        [java] at edu.brown.api.BenchmarkController.startSites(BenchmarkController.java:633)
        [java] at edu.brown.api.BenchmarkController.setupBenchmark(BenchmarkController.java:504)
        [java] at edu.brown.api.BenchmarkController.main(BenchmarkController.java:2216)

    BUILD FAILED
    /home/ubuntu/h-store/build.xml:2517: The following error occurred while executing this line:
    /home/ubuntu/h-store/build.xml:1693: Java returned: 1

    Total time: 50 seconds
There is nothing in these two logs except the date:

    ~/h-store/obj/logs/sites$ cat site-01-host1.172.31.xx.x.eu-west-1.compute.internal.log
    ~/h-store/obj/logs/sites$ cat site-01-host1.ip-172-31-xx-x.eu-west-1.compute.internal.log

I am quite suspicious of cluster.txt. Is it in the right format?

    $ cat cluster.txt
    host0.ip-172-31-xx-x.eu-west-1.compute.internal:0:0-1
    host1.ip-172-31-xx-x.eu-west-1.compute.internal:1:2-3
    host2.ip-172-31-xx-x.eu-west-1.compute.internal:2:4-5

Any further advice will be appreciated.
thanks
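On the format question, here is a hedged Python sketch of a sanity check for a file like this. It assumes each line is `host:site-id:partition-range`, which is only inferred from the examples in this thread, and it is deliberately strict about whitespace (H-Store's own parser may be more forgiving). `SiteEntry`, `parse_cluster_line` and `parse_cluster` are illustrative names, not H-Store APIs.

```python
from typing import List, NamedTuple


class SiteEntry(NamedTuple):
    host: str
    site_id: int
    partitions: List[int]


def parse_cluster_line(line: str) -> SiteEntry:
    """Parse one 'host:site:first-last' line (assumed format)."""
    text = line.strip()
    # Split from the right so the last two fields are site and partitions.
    fields = text.rsplit(":", 2)
    if len(fields) != 3:
        raise ValueError(f"expected host:site:partitions, got {line!r}")
    host, site, parts = fields
    if not host or any(ch.isspace() for ch in text):
        raise ValueError(f"empty host or stray whitespace in {line!r}")
    first, _, last = parts.partition("-")
    low = int(first)
    high = int(last) if last else low
    if high < low:
        raise ValueError(f"descending partition range in {line!r}")
    return SiteEntry(host, int(site), list(range(low, high + 1)))


def parse_cluster(lines) -> List[SiteEntry]:
    """Parse all lines and reject duplicate site ids or partition ids."""
    entries = [parse_cluster_line(l) for l in lines if l.strip()]
    seen_sites, seen_parts = set(), set()
    for entry in entries:
        if entry.site_id in seen_sites:
            raise ValueError(f"duplicate site id {entry.site_id}")
        seen_sites.add(entry.site_id)
        overlap = seen_parts.intersection(entry.partitions)
        if overlap:
            raise ValueError(f"partitions assigned twice: {sorted(overlap)}")
        seen_parts.update(entry.partitions)
    return entries
```

Under these assumptions the three lines shown parse as sites 0, 1 and 2 with partitions 0-1, 2-3 and 4-5. The sketch does not check that the host names actually resolve or accept SSH connections.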
Add this to the bottom of log4j.properties:
log4j.logger.org.voltdb.processtools.ProcessSetManager=DEBUG
Run the benchmark with this turned on, then check the site log to look for the SSH command that it's trying to send over the wire. Then copy and paste that command in a terminal to check whether it works.
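The property line above follows directly from the source path mentioned earlier in the thread: in a log4j properties file, a logger is configured as `log4j.logger.<logger name>=<level>`, and the logger name is conventionally the fully qualified class name. A tiny Python sketch of that mapping follows; `log4j_debug_line` is a hypothetical helper, and it assumes the class logs under its own fully qualified name.

```python
def log4j_debug_line(source_path: str, level: str = "DEBUG") -> str:
    """Turn a Java source path into a log4j.properties logger line."""
    name = source_path.strip().strip("'\"")
    if name.endswith(".java"):
        name = name[: -len(".java")]
    # Package directories become dot-separated package names.
    name = name.strip("/").replace("/", ".")
    return f"log4j.logger.{name}={level}"


print(log4j_debug_line("org/voltdb/processtools/ProcessSetManager.java"))
```

Setting `org.voltdb.VoltProcedure`, `org.voltdb.client` and similar loggers to DEBUG does not affect this class, because `org.voltdb.processtools` is not beneath any of those logger names.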
Yes, I added it, copied the SSH command, and ran it by hand. It reports that it failed to connect to the remote site.
I checked the source code for connecting to remote sites, and two things confuse me.
Does the SSH login username affect the connection? I changed host0, host1 and host2 to ubuntu in cluster.txt, since that is the default user name on EC2, but execution still failed.
The autofs configuration is set so that it automatically syncs all folders and files under /home/.
But when I set up the SSH environment on the NFS server and on each client with

    ssh-keygen -t dsa
    cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

autofs automatically syncs each key to all the others, which means I can only run ssh localhost date on one EC2 instance.
So, should I rewrite my auto.home file so that it does not sync all files under /home/&?
I ask because the document specifically says that the directory needs to end with a '/' followed by a '&',
but that seems to conflict with the SSH environment configuration. Could you give me some clues, please?
thanks
hi andy
I changed every EC2 instance's hostname to match cluster.txt, and this time I mounted only the h-store folder instead of /home/& within the NFS cluster. I also added this line to log4j.properties:

    log4j.logger.org.voltdb.processtools.ProcessSetManager=DEBUG

When I run the SSH command by hand, it returns "Unable to set CPU affinity.." and "Insufficient number of cores", so the transaction pre/post processing threads are disabled, and the connection and execution fail.
But I can run the H-Store benchmark on a single large EC2 instance without any problem.
I built this NFS cluster on AWS from 3 identical large EC2 instances, yet it reports an insufficient number of cores.
Is H-Store a sharded NoSQL system in which each node is isolated from the others? Shouldn't it need fewer system resources per node if I run this benchmark on a cluster instead of a single machine?
Why can I run it on a single large EC2 instance but not on 3 instances of the same size, where it reports an insufficient number of cores? Should I use a larger, more expensive EC2 instance type to build the cluster for this benchmark? Or did I do something else wrong, such as mounting only the h-store folder within the NFS cluster?
Could you give me some clues, please?
thanks!
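One thing that can be compared on each node is the number of cores the process may use against the number of partitions cluster.txt assigns to that host. The Python sketch below does only that. The idea that every partition wants its own core, with spare cores left for helper threads, is an assumption made for illustration; the thread does not state H-Store's actual rule, and both function names are hypothetical.

```python
import os


def available_cores() -> int:
    """Cores this process is allowed to run on (falls back to the total count)."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: respects CPU affinity masks and cgroup-style restrictions.
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def spare_cores(partitions_on_host: int, cores: int) -> int:
    """Cores left over if each partition is given one core (assumed model)."""
    return cores - partitions_on_host


if __name__ == "__main__":
    cores = available_cores()
    # 2 partitions per host, as in the cluster.txt shown in this thread.
    print(f"cores={cores} spare={spare_cores(2, cores)}")
```

A result of zero or fewer spare cores on a node would at least be consistent with the "Insufficient number of cores" message, though it does not by itself explain why the connection fails.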
hi, andy
I followed the steps in the document about running on EC2, as follows, but ant build failed:

    sudo vim /etc/apt/sources.list
    deb http://archive.canonical.com/ubuntu lucid partner
    deb-src http://archive.canonical.com/ubuntu lucid partner

    sudo apt-get update

Package sun-java6-jdk is not available, so I changed it to openjdk-6-jdk:

    sudo apt-get --yes install subversion gcc g++ make openjdk-6-jdk valgrind ant

    svn co https://database.cs.brown.edu/svn/hstore/trunk/ $HSTORE_HOME

    cp hstore.pem ~/.ssh/ && chmod 400 ~/.ssh/hstore.pem

    vim trunk/properties/default.properties

    global.sshoptions = -i /home/ubuntu/.ssh/hstore.pem

    ant build

    ee:

    BUILD FAILED
    /home/ubuntu/trunk/build.xml:715: exec returned: 137

Because ant build failed with the SVN checkout, I removed it and tried the source from git:

    sudo rm -r trunk/
    sudo apt-get install git
    git clone git://github.com/apavlo/h-store.git
    ant build
    ee-build:
        [exec] make: Entering directory `/home/ubuntu/h-store/obj/release'
        [exec] g++ -Wall -Wextra -Werror -Woverloaded-virtual -Wconversion -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Winit-self -Wno-sign-compare -Wno-unused-parameter -pthread -DSTDC_CONSTANT_MACROS -DSTDC_LIMIT_MACROS -DNOCLOCK -fno-omit-frame-pointer -fvisibility=hidden -DBOOST_SP_DISABLE_THREADS -Wno-ignored-qualifiers -fno-strict-aliasing -Wno-attributes -DLINUX -fPIC -Wno-unused-but-set-variable -DANTICACHE -DANTICACHE_REVERSIBLE_LRU -isystem ../../third_party/cpp -isystem ../../obj/release/berkeleydb -I../../src/ee -c -g3 -O3 -mmmx -msse -msse2 -msse3 -DNDEBUG -DVOLT_LOG_LEVEL=500 -o objects//voltdbjni.co ../../src/ee//voltdbjni.cpp

    BUILD FAILED
    /home/ubuntu/h-store/build.xml:860: exec returned: 137

    Total time: 9 minutes 36 seconds
Any help will be greatly appreciated!
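Since this build failure was traced elsewhere in the thread to a micro instance with about 0.6 GB of memory, a quick look at total memory before running `ant build` can save a ten-minute wait. Here is a minimal Python sketch that reads `MemTotal` from `/proc/meminfo` on Linux. The 0.6 GB figure is the only data point the thread gives; it is a known-failing size, not a documented minimum, and the function names are made up for illustration.

```python
KNOWN_TOO_SMALL_MIB = 0.6 * 1024  # the micro instance reported in this thread


def mem_total_mib(meminfo_text: str) -> float:
    """Extract MemTotal from /proc/meminfo text and convert kB to MiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line looks like: "MemTotal:  1048576 kB"
            return int(line.split()[1]) / 1024
    raise ValueError("MemTotal not found")


def read_mem_total_mib(path: str = "/proc/meminfo") -> float:
    """Read total memory of the current Linux machine in MiB."""
    with open(path) as handle:
        return mem_total_mib(handle.read())


def likely_too_small(total_mib: float) -> bool:
    """True if the machine is no bigger than the instance that failed here."""
    return total_mib <= KNOWN_TOO_SMALL_MIB
```

On a machine where `likely_too_small(read_mem_total_mib())` is true, the compile step for voltdbjni.cpp can be expected to hit the same problem.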