@karthick-rn please help
Not totally sure about your context, but assuming you have a set of existing nodes where you want to set up Accumulo and related components, the details about those nodes go into two places:
1. The [nodes] section in muchos.props, which contains the list of host names with the service roles assigned to each node. Follow the example in muchos.props.example.
2. fluo-muchos/conf/hosts/<cluster name>, where <cluster name> is replaced with the actual name of the cluster you are working with. For example, if you run muchos setup -c myclust1, there needs to be a file called myclust1 under the fluo-muchos/conf/hosts folder. The format of this file is similar to the sample hosts file.
Also, I'm open to your suggestions on how we can improve the README to make this clearer.
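For illustration, a minimal sketch of the two files, using hypothetical host names and IPs (the role lists mirror the examples later in this thread; the hosts-file column layout is an assumption, so check it against the sample hosts file):

# muchos.props -- [nodes] section (hypothetical hosts and roles)
[nodes]
leader1 = namenode,resourcemanager,accumulomaster,zookeeper
worker1 = worker
worker2 = worker

# conf/hosts/myclust1 -- one line per host; "hostname ip" layout assumed
leader1 10.0.0.4
worker1 10.0.0.5
worker2 10.0.0.6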
OK, it installed fine and seems to need no manual intervention. Could you clarify one thing: I somehow ended up with 12 tservers per server, and I can't find where that value is specified on the end system. How do I set it back to 1? I also set up systemd scripts for starting the services, and it seems they are started via /etc/systemd/system/accumulo-tserver@.service
Did you use the Muchos-provided option to launch systemd services, or did you implement your own? If you used Muchos' implementation, then the num_tservers setting controls how many tservers run per node.
@Viv1986 By default, Muchos will set up 1 tablet server per worker host; it's not clear how you ended up with 12 tablet servers per host! Assuming this is a dev/test cluster and the data can be regenerated, I'd suggest making sure the following values are set in the muchos.props file: num_tservers = 1 and use_systemd = True. Then either create a new cluster, or run bin/muchos wipe -c <existingclustername> and re-run the setup. Let us know how it goes.
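For concreteness, the settings and commands described above would look like this (myclust1 is a placeholder cluster name):

# muchos.props -- run a single tablet server per worker, managed by systemd
num_tservers = 1
use_systemd = True

# wipe the existing cluster, then redo the setup
bin/muchos wipe -c myclust1
bin/muchos setup -c myclust1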
@karthick-rn @arvindshmicrosoft OK, wipe doesn't seem to wipe everything; it only kills part of the cluster. In any case, my current config is 3 masters and 12 workers, it's a test cluster, and what I need is HDFS and Accumulo in HA mode. When I set
hdfs_ha = True
I always get the error "HA is not enabled for this namenode." during zkfc init. How do I fix it? Here is my current config:
leader1 = namenode,resourcemanager,accumulomaster,zookeeper,zkfc,journalnode
leader2 = metrics,zookeeper,resourcemanager,zkfc,journalnode
leader3 = zookeeper,resourcemanager,zkfc,journalnode
worker1 = worker,swarmmanager
worker2 = worker
worker3 = worker
worker4 = worker
worker5 = worker
worker6 = worker
worker7 = worker
worker8 = worker
worker9 = worker
worker10 = worker
worker11 = worker
worker12 = worker
@Viv1986 If you're referring to the Accumulo systemd units that were not wiped, I can see why and I'll fix this, but the other services should be removed successfully; let us know if not.
In regards to HA, you have configured only 1 namenode. Try setting it as shown below; ideally, you should have namenode and zkfc on the same hosts.
leader1 = namenode,resourcemanager,accumulomaster,zookeeper,journalnode,zkfc
leader2 = zookeeper,journalnode,namenode,zkfc,accumulomaster,resourcemanager
leader3 = journalnode,zookeeper
worker1 = worker
...
...
worker12 = worker
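Once both namenodes and their zkfc processes are up, the HA state can be verified with standard HDFS tooling. The service IDs nn1/nn2 below are the conventional defaults and are an assumption here; substitute the IDs from your hdfs-site.xml:

# check which namenode is active and which is standby (nn1/nn2 are assumed IDs)
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2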
OK, it's working well now. The only remaining bug is with tserver start: the tservers do not start after stopping and starting Accumulo through accumulo-cluster start or accumulo-cluster tserver-start, so I start them directly through /etc/systemd/system/accumulo-tserver@.service. Also, ZooKeeper on the masters has to be started manually on each master; it should have central control like Accumulo's. And the last thing left: does Muchos have an option to use an existing storage account for Hadoop/Accumulo?
Actually, the systemd commands were added to the accumulo-cluster script as a convenience to handle start/stop of services in the cluster; they were not part of the original source. As the problem is only with tserver restarts, maybe something is missing in the script; I'll look into that. For now, you can use something like sudo systemctl <start/stop> accumulo-tserver@1.service to start/stop a tserver. On the ZK point, I understand; we are trying to minimise changes to the original scripts as that adds maintenance overhead. Instead of manually ssh'ing into each node, you can do the below:
for host in leader1 leader2 leader3; do
  echo "$host"
  ssh "$host" 'sh -c "zkServer.sh start"'
done
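On the tserver workaround above, if you need to start or stop every tserver unit on a worker by hand, a loop over the systemd template instances should work; a sketch assuming 12 instances, matching the per-node count reported in this thread:

# stop all accumulo-tserver template instances on this node (run per worker)
for i in $(seq 1 12); do
  sudo systemctl stop "accumulo-tserver@$i.service"
done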
does Muchos have an option to use an existing storage account for Hadoop/Accumulo?
If you're referring to an ADLS Gen2 storage account, then you'll have to update the required ADLS Gen2 fields in muchos.props with the details of the existing storage account rather than leaving them at their defaults.
@karthick-rn It started 12 per instance again for no reason; the problem seems to be in the scripts, in this line (that grep counts the non-comment, non-blank entries in the tservers file, which is 12 workers here):
NUM_TSERVERS=$(grep -E -c -v '(^#|^\s*$)' "$TSERVERS")
accumulo-cluster start
Starting tablet servers ............... done
accumulo-tserver@1.service loaded active running TServer Service for Accumulo
accumulo-tserver@10.service loaded active running TServer Service for Accumulo
accumulo-tserver@11.service loaded active running TServer Service for Accumulo
accumulo-tserver@12.service loaded active running TServer Service for Accumulo
accumulo-tserver@2.service loaded activating auto-restart TServer Service for Accumulo
accumulo-tserver@3.service loaded active running TServer Service for Accumulo
accumulo-tserver@4.service loaded active running TServer Service for Accumulo
accumulo-tserver@5.service loaded active running TServer Service for Accumulo
accumulo-tserver@6.service loaded active running TServer Service for Accumulo
accumulo-tserver@7.service loaded active running TServer Service for Accumulo
accumulo-tserver@8.service loaded active running TServer Service for Accumulo
accumulo-tserver@9.service loaded active running TServer Service for Accumulo
(the same 12-unit listing repeats for the remaining worker hosts)
and accumulo-cluster can't stop them either; the only way is manually via sudo systemctl <start/stop> accumulo-tserver@1.service
[evoamsadm@worker12 ~]$ ps aux | grep tser
evoamsa+ 23347 4.0 0.2 5934088 328632 ? Ssl 10:50 0:05 /usr/lib/jvm/java/bin/java -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError=kill -9 %p -XX:-OmitStackTraceInFastThrow -Djava.net.preferIPv4Stack=true -Daccumulo.native.lib.path=/home/evoamsadm/install/accumulo-2.0.1/lib/native -Xmx4G -Xms4G -Daccumulo.log.dir=/data1/logs/accumulo -Daccumulo.application=tserver5_worker12 -Dlog4j.configuration=log4j-service.properties org.apache.accumulo.start.Main tserver
(11 more identical java processes follow, tserver1_worker12 through tserver12_worker12, differing only in -Daccumulo.application)
NUM_TSERVERS=$(grep -E -c -v '(^#|^\s*$)' "$TSERVERS")
From which script did you get this line? I don't see it in the accumulo-cluster script at all. Also, I have a similar systemd setup with 1 tserver per worker (4 in total) and don't see this problem:
[user1@host1 ~]$ accumulo-cluster start
Starting tablet servers ....... done
accumulo-tserver@1.service loaded active running TServer Service for Accumulo
accumulo-tserver@1.service loaded active running TServer Service for Accumulo
accumulo-tserver@1.service loaded active running TServer Service for Accumulo
accumulo-tserver@1.service loaded active running TServer Service for Accumulo
[user1@host3 ~]$ ps aux | grep tserver
user1+ 29585 6.8 1.0 3738804 349168 ? Ssl 11:36 0:04 /usr/lib/jvm/java/bin/java -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError=kill -9 %p -XX:-OmitStackTraceInFastThrow -Djava.net.preferIPv4Stack=true -Daccumulo.native.lib.path=/home/user1/install/accumulo-2.0.1/lib/native -Xmx2G -Xms2G -Daccumulo.log.dir=/var/data1/logs/accumulo -Daccumulo.application=tserver1_host3 -Dlog4j.configuration=log4j-service.properties org.apache.accumulo.start.Main tserver
install/accumulo-2.0.1/bin/accumulo-util
The accumulo-util script is different; I don't think it is involved in the start/stop of services.
The NUM_TSERVERS initialization comes from that file, so I think it is involved, because if I hard-code NUM_TSERVERS=1 there, it fixes the problem.
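In other words, the workaround described here is to replace the computed value in accumulo-util with a constant; a sketch of that local edit (not an upstream fix):

# before: derives the count from the tservers file, one entry per worker host
# NUM_TSERVERS=$(grep -E -c -v '(^#|^\s*$)' "$TSERVERS")
# after: force a single tserver per host
NUM_TSERVERS=1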
@karthick-rn @arvindshmicrosoft OK, where should I put the account key for the storage account?
instance_volumes_input = abfss://xxxx-test@xxxxstorage.blob.core.windows.net
instance_volumes_adls =
adls_storage_type = Standard_LRS
user_assigned_identity =
azure_tenant_id =
azure_client_id =
principal_id =
instance_volumes_input = abfss://xxxx-test@xxxxstorage.blob.core.windows.net
The correct endpoint for an ADLS Gen2 URI is dfs.core.windows.net, not blob.core.windows.net. If this is a blob storage account, then I doubt it will work, as Muchos currently supports only ADLS Gen2.
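For example, the corrected value would use the dfs endpoint with the same placeholder account name:

instance_volumes_input = abfss://xxxx-test@xxxxstorage.dfs.core.windows.net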
OK, but the account is locked down with an access key; where should I put that key?
@karthick-rn @arvindshmicrosoft can you help me with access key? where I should put it?
In Muchos, authentication for ADLS Gen2 is done via a user-assigned managed identity. If you have an access key, we currently don't support that; however, you can create a user-assigned managed identity, add it to the storage account with the Storage Blob Data Owner role, and update the value of user_assigned_identity found in the muchos.props file. The launch step will then take care of adding the identity to all the hosts in the VM scale sets. The link below will help you with the creation of the managed identity.
https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/how-to-manage-ua-identity-portal
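Based on the fields shown in the config above, the identity-related entries to fill in would look something like this. All values are placeholders, and the exact format expected for user_assigned_identity (name vs. full resource ID) is not stated here, so check it against muchos.props.example:

# muchos.props -- ADLS Gen2 via a user-assigned managed identity (placeholders)
instance_volumes_input = abfss://mycontainer@mystorage.dfs.core.windows.net
user_assigned_identity = my-muchos-identity
azure_tenant_id = <tenant-guid>
azure_client_id = <identity-client-id-guid>
principal_id = <identity-principal-id-guid>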
@karthick-rn Actually, it works if I follow https://accumulo.apache.org/blog/2019/10/15/accumulo-adlsgen2-notes.html and https://hadoop.apache.org/docs/current/hadoop-azure/abfs.html#Default:_Shared_Key
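For readers taking the same shared-key route: per the linked hadoop-azure documentation, the relevant properties are set in Hadoop's core-site.xml; shown here as key = value pairs, with the account name and key as placeholders. Note this bypasses Muchos' managed-identity flow:

# ABFS shared-key auth, from the hadoop-azure docs linked above
fs.azure.account.auth.type.mystorage.dfs.core.windows.net = SharedKey
fs.azure.account.key.mystorage.dfs.core.windows.net = <account-key>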
@Viv1986 Good to hear you got it working. The point I was emphasising is more from a Muchos standpoint and what it currently supports.
@Viv1986 - would you mind if we close this? It looks like you were able to address the original issue in this thread. Correct?
yeap
Hello, could you specify how to set up the config files for an existing cluster (e.g. cluster_type = existing)? Where should I specify the IPs and their roles?