can't use 9p for shared folder
There was an error talking to Libvirt. The error message is shown
below:
Call to virDomainCreateWithFlags failed: internal error: process exited while connecting to monitor: 2017-03-13T16:22:05.452554Z qemu-system-x86_64: -device virtio-9p-pci,id=fs0,fsdev=fsdev-fs0,mount_tag=b0211f19c2b24becc176a46c2524d9f,bus=pci.0,addr=0x6: 9pfs Failed to initialize fs-driver with id:fsdev-fs0 and export path:/home/at15/workspace/src/github.com/at15/hadoop-spark-perf/provision/base
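Not something the error states outright, but one common cause worth ruling out is that the system QEMU process can't traverse the export path under $HOME. A quick check, plus a possible workaround (assuming the filesystem supports ACLs and QEMU runs as the qemu user):

```sh
# Show owner/permissions for every component of the 9p export path;
# the user QEMU runs as needs execute (traverse) rights on each directory:
namei -l /home/at15/workspace/src/github.com/at15/hadoop-spark-perf/provision/base

# Possible workaround: grant the qemu user traversal into the home directory:
sudo setfacl -m u:qemu:x /home/at15
```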
Got an error when packaging the box:
base: Require set read access to /var/lib/libvirt/images/base_base.img. sudo chmod a+r /var/lib/libvirt/images/base_base.img
And another:
/home/at15/.vagrant.d/gems/gems/vagrant-libvirt-0.0.37/lib/vagrant-libvirt/action/package_domain.rb:41:in ``': No such file or directory - virt-sysprep (Errno::ENOENT)
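The Errno::ENOENT just means the virt-sysprep binary isn't installed; vagrant-libvirt shells out to it when packaging. A sketch of the fix, assuming a Fedora host (the package name may differ on other distros):

```sh
# virt-sysprep ships with the libguestfs tools on Fedora:
sudo dnf install -y libguestfs-tools-c
```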
The re-packaged box got stuck on waiting for SSH to become available ....
==> single: Waiting for domain to get an IP address...
==> single: Waiting for SSH to become available...
Found a similar one at https://github.com/vagrant-libvirt/vagrant-libvirt/issues/452. I guess this is related to packaging the box, so the solution is simple ... I just don't package the box ... run the install script on every node ....
Followed http://ask.xmodulo.com/change-default-location-libvirt-vm-images.html to change the pool, but got a permission error:
Call to virDomainCreateWithFlags failed: Cannot access storage file '/home/at15/tmp/libvirt/cluster_slave2.img' (as uid:107, gid:107): Permission denied
drwxr-xr-x. 2 root root 4096 Mar 13 11:02 images
drwxrwxr-x 2 at15 at15 4096 Mar 13 11:07 libvirt
https://github.com/adrahon/vagrant-kvm/issues/163 mentions changing the user to root in /etc/libvirt/qemu.conf; that may need a reboot or logout?
In particular note that if using the "system" instance and attempting to store disk images in a user home directory, the default permissions on $HOME are typically too restrictive to allow access.
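To confirm that's what is going on here (assuming uid/gid 107 are the qemu user and group, as is typical on Fedora):

```sh
# What do uid/gid 107 map to?
getent passwd 107
getent group 107
# That user needs traverse rights on $HOME for images stored under it:
ls -ld /home/at15
```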
solution
Change user = "at15" in /etc/libvirt/qemu.conf. The group is root; I don't know if it will still work if I comment out the group ... but ...
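For the record, the change boils down to editing the system instance's config and restarting libvirtd (the sed one-liner is my own shorthand, not from the original notes):

```sh
# Run system QEMU processes as my user so they can read images under $HOME;
# the group line is left alone for now:
sudo sed -i 's/^#\?user = .*/user = "at15"/' /etc/libvirt/qemu.conf
sudo systemctl restart libvirtd
```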
Hadoop
Stop HDFS and Yarn
Stopping namenodes on [master.perf.at15]
master.perf.at15: Warning: Permanently added 'master.perf.at15,192.168.233.18' (ECDSA) to the list of known hosts.
master.perf.at15: no namenode to stop
slave2.perf.at15: Warning: Permanently added 'slave2.perf.at15,192.168.233.20' (ECDSA) to the list of known hosts.
slave1.perf.at15: Warning: Permanently added 'slave1.perf.at15,192.168.233.19' (ECDSA) to the list of known hosts.
master.perf.at15: Warning: Permanently added 'master.perf.at15,192.168.233.18' (ECDSA) to the list of known hosts.
slave2.perf.at15: no datanode to stop
slave1.perf.at15: stopping datanode
master.perf.at15: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
no resourcemanager to stop
master.perf.at15: Warning: Permanently added 'master.perf.at15,192.168.233.18' (ECDSA) to the list of known hosts.
slave1.perf.at15: Warning: Permanently added 'slave1.perf.at15,192.168.233.19' (ECDSA) to the list of known hosts.
slave2.perf.at15: Warning: Permanently added 'slave2.perf.at15,192.168.233.20' (ECDSA) to the list of known hosts.
slave1.perf.at15: stopping nodemanager
master.perf.at15: stopping nodemanager
slave2.perf.at15: stopping nodemanager
slave2.perf.at15: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop
Finished stopping HDFS and YARN.
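The output above presumably came from the stock Hadoop stop scripts, i.e. something like (assuming HADOOP_HOME points at the install):

```sh
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/stop-yarn.sh
```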
Spark
Seems none of the nodes started ... not even the master itself. And the master always takes a long time to start, which is quite strange ...
The error message for Spark is:
17/03/13 19:13:03 INFO master.Master: I have been elected leader! New state: ALIVE
17/03/13 19:14:24 INFO master.Master: 192.168.233.1:33110 got disassociated, removing it.
17/03/13 19:14:31 INFO master.Master: 192.168.233.18:59868 got disassociated, removing it.
17/03/13 19:14:53 INFO master.Master: 192.168.233.18:59870 got disassociated, removing it.
17/03/13 19:15:13 INFO master.Master: 192.168.233.18:59874 got disassociated, removing it.
Might have to do with SELinux ... set SELINUX=disabled in /etc/selinux/config. Nope, that doesn't work either.
The Hadoop datanode fails to start (http://stackoverflow.com/questions/22316187/datanode-not-starts-correctly): Fedora does not clean /tmp, so stale datanode data survives and I can't just format the namenode every time ...
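A sketch of the cleanup that makes a re-format stick, assuming HDFS data lives under the default /tmp/hadoop-* location (check hdfs-site.xml to be sure):

```sh
# After a namenode re-format, datanodes keep the old clusterID and refuse to
# start; wipe their data dirs first (run on every datanode):
rm -rf /tmp/hadoop-*/dfs/data
# then re-format on the master:
hdfs namenode -format
```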
Got an exception from Spark ...
Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
at com.intel.hibench.sparkbench.micro.ScalaSort$.main(ScalaSort.scala:47)
at com.intel.hibench.sparkbench.micro.ScalaSort.main(ScalaSort.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I guess that's because of how I built HiBench? Yeah ... rebuilding it in the master box works ....
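A NoSuchMethodError on scala.reflect like this usually means a Scala version mismatch between the HiBench build and the Spark runtime, which is why rebuilding on the box that runs it helps. If I read HiBench's build docs right (treat the exact property names as an assumption), the versions can also be pinned at build time:

```sh
# Build HiBench against the Spark/Scala versions the cluster actually runs:
mvn -Dspark=2.1 -Dscala=2.11 clean package
```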
OK ... got the perf numbers for the Hadoop cluster, from the YarnChild process on a slave node:
2551.303147 task-clock:u (msec) # 0.030 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
2,734 page-faults:u # 0.001 M/sec
7,952,773,016 cycles:u # 3.117 GHz
12,035,713,476 instructions:u # 1.51 insn per cycle
1,913,355,376 branches:u # 749.952 M/sec
44,277,795 branch-misses:u # 2.31% of all branches
Spark CoarseGrainedExecutorBackend, cluster:
15610.605287 task-clock:u (msec) # 0.605 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
428,833 page-faults:u # 0.027 M/sec
48,751,727,760 cycles:u # 3.123 GHz
63,078,193,487 instructions:u # 1.29 insn per cycle
9,912,664,768 branches:u # 634.996 M/sec
154,409,990 branch-misses:u # 1.56% of all branches
Spark CoarseGrainedExecutorBackend, single node, small sort:
10812.559971 task-clock:u (msec) # 0.619 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
81,832 page-faults:u # 0.008 M/sec
36,533,260,893 cycles:u # 3.379 GHz
60,894,812,484 instructions:u # 1.67 insn per cycle
9,876,637,449 branches:u # 913.441 M/sec
217,182,645 branch-misses:u # 2.20% of all branches
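For reproducibility: these numbers look like plain perf stat attached to the running JVMs, roughly along these lines (my reconstruction, not the exact command from the notes):

```sh
# Attach to the newest YarnChild (or CoarseGrainedExecutorBackend) process;
# stop with Ctrl-C when the task finishes. The :u suffixes in the output
# mean only user-space events were counted:
perf stat -p "$(pgrep -n -f YarnChild)"
```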
Due to #13, we need to use libvirt. Currently only the Fedora box is working; the base script may need some modification. Or I could simply execute the install script on every machine ... only HiBench takes a long time to compile, and I could use the precompiled one.