apache / accumulo-testing

Apache Accumulo Testing
https://accumulo.apache.org
Apache License 2.0

Add Compactors and ScanServers to agitator #234

Closed dlmarion closed 2 years ago

dlmarion commented 2 years ago

FWIW, it does not look like the agitator supports multiple tservers / sservers per host.

dlmarion commented 2 years ago

This does not work on CentOS Linux release 7.9.2009 (Core) with GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu). Running `agitator start` yields:

./agitator: line 207: readarray: -d: invalid option
readarray: usage: readarray [-n count] [-O origin] [-s count] [-t] [-u fd] [-C callback] [-c quantum] [array]
./agitator: line 215: readarray: -d: invalid option
readarray: usage: readarray [-n count] [-O origin] [-s count] [-t] [-u fd] [-C callback] [-c quantum] [array]

I copied the readarray command from line 44 ...
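
For context, the `-d` option of `readarray` was added in bash 4.4, which is why bash 4.2 rejects it. Below is a minimal sketch of a fallback that uses only `read -d`, assuming the goal is to split a space-delimited string into an array. The variable names and sample input are hypothetical and are not taken from the agitator script.

```shell
#!/usr/bin/env bash
# Hypothetical input: a space-delimited host list (not taken from the agitator script).
input="host1 host2 host3"

hosts=()
# read -d stops at each space. The "|| [[ -n $host ]]" part keeps the final
# item, which ends at EOF rather than at a delimiter.
while IFS= read -r -d ' ' host || [[ -n $host ]]; do
  # Skip empty items produced by consecutive delimiters.
  [[ -n $host ]] && hosts+=("$host")
done < <(printf '%s' "$input")

printf 'parsed %d hosts: %s\n' "${#hosts[@]}" "${hosts[*]}"
```

The input is fed with `printf '%s'` rather than `echo` so that no trailing newline ends up in the last element.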

DomGarguilo commented 2 years ago

When running ./bin/agitator start I see the following:

Using /home/dgarguilo/github/accumulo-testing/conf/env.sh for setup
Reading cluster config from /home/dgarguilo/github/fluo-uno/install/accumulo-2.1.0-SNAPSHOT/conf/cluster.yaml
Starting manager and tserver agitation as dgarguilo
Running datanode agitator as dgarguilo

which does not print that agitation has started for the sservers or compactors. However, I do see log files for both, which indicates that they have started.

DomGarguilo commented 2 years ago

The readme should be updated to indicate that compactors and scan servers are included in agitation. As of now it reads:

The agitator will periodically kill the Accumulo manager, tablet server, and Hadoop data node processes on random nodes

dlmarion commented 2 years ago

@DomGarguilo - which OS and version are you using? Which version of bash are you using?

DomGarguilo commented 2 years ago

> @DomGarguilo - which OS and version are you using? Which version of bash are you using?

GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu), on Pop!_OS 22.04 LTS

DomGarguilo commented 2 years ago

I ran the agitator on this branch and it doesn't look like the compactors or sservers were killed. According to the monitor the compactors were still up.

It might have to do with how the names of these services are gathered and used, as they don't print in the logs as cleanly as the others:

Compactor agitator logs

```
20221006 15:10:56 Starting compactor agitation. Kill every 20 minutes, restart every 10 minutes.
20221006 15:10:56 Will randomly kill between 1 and 1 of the following: COMPACTOR_HOSTS_q1 COMPACTOR_HOSTS_q2
20221006 15:10:56 Sleeping for 20 minutes
20221006 15:30:56 Killing compactor at COMPACTOR_HOSTS_q1 COMPACTOR_HOSTS_q2
ssh: Could not resolve hostname compactor_hosts_q1 compactor_hosts_q2 : Name or service not known
20221006 15:30:56 Sleeping for 10 minutes.
20221006 15:40:56 Restarting compactor at COMPACTOR_HOSTS_q1 COMPACTOR_HOSTS_q2
ssh: Could not resolve hostname compactor_hosts_q1 compactor_hosts_q2 : Name or service not known
20221006 15:40:56 Sleeping for 20 minutes
20221006 16:00:56 Killing compactor at COMPACTOR_HOSTS_q1 COMPACTOR_HOSTS_q2
ssh: Could not resolve hostname compactor_hosts_q1 compactor_hosts_q2 : Name or service not known
20221006 16:00:56 Sleeping for 10 minutes.
20221006 16:10:57 Restarting compactor at COMPACTOR_HOSTS_q1 COMPACTOR_HOSTS_q2
ssh: Could not resolve hostname compactor_hosts_q1 compactor_hosts_q2 : Name or service not known
20221006 16:10:57 Sleeping for 20 minutes
```

sserver agitator logs

```
20221006 15:10:56 Starting sserver agitation. Kill every 20 minutes, restart every 10 minutes.
20221006 15:10:56 Will randomly kill between 1 and 1 of the following: SSERVER_HOSTS_default
20221006 15:10:56 Sleeping for 20 minutes
20221006 15:30:56 Killing sserver at SSERVER_HOSTS_default
ssh: Could not resolve hostname sserver_hosts_default : Name or service not known
20221006 15:30:56 Sleeping for 10 minutes.
20221006 15:40:56 Restarting sserver at SSERVER_HOSTS_default
ssh: Could not resolve hostname sserver_hosts_default : Name or service not known
20221006 15:40:56 Sleeping for 20 minutes
20221006 16:00:56 Killing sserver at SSERVER_HOSTS_default
ssh: Could not resolve hostname sserver_hosts_default : Name or service not known
20221006 16:00:56 Sleeping for 10 minutes.
20221006 16:10:57 Restarting sserver at SSERVER_HOSTS_default
ssh: Could not resolve hostname sserver_hosts_default : Name or service not known
20221006 16:10:57 Sleeping for 20 minutes
```

The following is the portion of the cluster.yaml file used to start these services:

sserver:
  - default:
    - localhost

compaction:
  coordinator:
    - localhost
  compactor:
    - q1:
        - localhost
    - q2:
        - localhost
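
The logs above show names such as `COMPACTOR_HOSTS_q1` being handed to ssh as if they were hostnames, which suggests that the variable names were used where their values were wanted. The following is a minimal sketch of resolving such names with bash indirect expansion. It assumes the cluster configuration ends up in shell variables with the names seen in the logs. The values assigned here are hypothetical and only mirror the YAML excerpt.

```shell
#!/usr/bin/env bash
# Hypothetical variables, named after the identifiers seen in the logs.
COMPACTOR_HOSTS_q1="localhost"
COMPACTOR_HOSTS_q2="localhost"

resolved_hosts=()
# "${!COMPACTOR_HOSTS_@}" expands to the NAMES of all variables with that prefix.
for var_name in "${!COMPACTOR_HOSTS_@}"; do
  # ${!var_name} expands to the VALUE of the variable named by var_name.
  # It is left unquoted on purpose so a multi-host value splits on whitespace.
  for host in ${!var_name}; do
    resolved_hosts+=("$host")
  done
done

printf 'compactor hosts: %s\n' "${resolved_hosts[*]}"
```

With this approach the kill and restart steps would receive `localhost` rather than the variable name.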

dlmarion commented 2 years ago

I have a fix for that, will push it in the morning

dlmarion commented 2 years ago

f5b0c83 includes @ctubbsii's changes from #238. It seems to parse the configuration correctly, but it requires a little more testing with multiple hosts.

dlmarion commented 2 years ago

After looking at this more, I think we should punt on adding these to the agitator. The agitator doesn't handle multiple servers on the same host, which isn't a deal breaker. But it also doesn't handle passing arguments to the start command. For example, the compactor needs the `-q <queueName>` argument when starting the process, and the agitator doesn't currently support that.
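
To illustrate the gap, here is a rough sketch of what carrying a per-queue argument through to a restart step might involve. Only the `-q <queueName>` argument comes from this discussion. The `queue:host` pair format, the variable names, and the printed command strings are hypothetical, and nothing here reflects how the agitator is actually structured.

```shell
#!/usr/bin/env bash
# Hypothetical "queue:host" pairs mirroring the cluster.yaml excerpt in this thread.
compactor_targets=("q1:localhost" "q2:localhost")

start_commands=()
for target in "${compactor_targets[@]}"; do
  queue=${target%%:*}   # text before the first colon
  host=${target#*:}     # text after the first colon
  # Illustrative only: record the host together with the queue argument
  # that the start command would need.
  start_commands+=("${host}: compactor -q ${queue}")
done

# A real agitator would run these remotely; this sketch only prints them.
printf '%s\n' "${start_commands[@]}"
```

Keeping the queue name paired with the host is what would let a restart step supply the right argument for each compactor.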