@ivakegg If using a snapshot, can you provide the commit used? If not sure, the built jars probably have the commit in their manifest.
Looks like this is the loop that it's stuck in, waiting for the scans to finish. And the log message being printed in ThriftScanner unfortunately isn't logging the full stack trace.
@ivakegg - It seems like the message portion of the exception is null. Since you are using a SNAPSHOT build (and if you can reproduce the issue), I would update the logging here to log the full stack trace and to log the TException being caught.
@cshannon The interrupted exception on line https://github.com/apache/accumulo/blob/b41427dc18c9fa36c9e619ebf928a02aebeb22fb/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java#LL987C33-L987C33 is also poorly formatted. It should be log.error("{}", e, e); instead of log.error(e.toString());
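For reference, a minimal sketch of the difference with SLF4J (which Accumulo uses); the class and exception here are just illustrative:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoggingExample {
  private static final Logger log = LoggerFactory.getLogger(LoggingExample.class);

  void handle(Exception e) {
    // Only logs the exception's toString(); the stack trace and cause are lost.
    log.error(e.toString());

    // Passing the exception as the final argument tells SLF4J to append the
    // full stack trace; the "{}" placeholder is filled with e.toString().
    log.error("{}", e, e);
  }
}
```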
Regardless of what is discovered from troubleshooting this, that logging should be fixed.
There may be cases where we omit the full stack trace for some rational reason, but those cases should have comments in the code explaining the justification.
I'm marking this as a blocker to fix the logging for 2.1.1 at a minimum.
Good catch on the logging issue, I will at least fix that now.
With the updated logging, I believe we can close this issue for now pending reproducing this error. I have only seen it once thus far and we have bounced the system several times since.
We appear to be able to reproduce this fairly easily on one of our test systems. Reopening and will add details.
Can you confirm that the same host name / fqdn is used consistently in all places?
@EdColeman Yes, we will check.
EDIT: The nodes are using short hostnames. There is evidence in the cluster that FQDN hostnames have been used on t-servers in the past two weeks; however, all are currently using short hostnames.
When attempting to stop a single t-server (i.e. bin/accumulo-cluster stop-here) on a test system (Cluster-1), the command-line stop hung and the t-server continued to receive assignments. The monitor showed the t-server's assignment count cycling up and down. From the logs, the manager appeared to assign tablets back to the t-server being stopped. We attempted to reproduce the issue on a second test cluster (Cluster-2) and there the t-server shut down as expected. However, we were not sure whether the assignment on Cluster-2 that occurs after the t-server shutdown is expected or unexpected (expanded below).
For the purposes of the logs below, Cluster-1 refers to the t-servers with the shutdown issue and Cluster-2 is the other cluster.
So far we have noticed two configuration differences between the systems:
The following was observed on Cluster-1 (t-server hanging during shutdown):
- Seeding FATE[...] Shutdown tserver <shutdown-host:port> [...]
- Tablet Server shutdown requested for <shutdown-host:port> [...]
- tablet <tid;begin;end> was unloaded from <shutdown-host:port> [...]
- tablet ... was unloaded on <shutdown-host:port> [...]
- tablet ... was unloaded on <shutdown-host:port> [...]
- tablet ... was unloaded on <shutdown-host:port> [...]
- ...
- Sending 1 tablets to balancer for table accumulo.metadata for assignment within t-servers [..., <shutdown-host:port>, ...]
- Assigning 1 tablets
- Assigned !0,~del... to <shutdown-host:port> [....]
- tablet ... was unloaded on <shutdown-host:port> [...]
- tablet ... was unloaded on <shutdown-host:port> [...]
- tablet ... was unloaded on <shutdown-host:port> [...]
- tablet <!0,~del> was loaded on <shutdown-host:port> [...]
- tablet ... was unloaded on <shutdown-host:port> [...]
- Sending 14 tablets to balancer for table <application-table> for assignment within t-servers [..., <shutdown-host:port>, ...]
- ...
- Assigning XXXX tablets
- ...
The Cluster-2 logs reflect the following:
- tablet ... was unloaded on <shutdown-host:port> [...]
- ...
- tablet ... was unloaded on <shutdown-host:port> [...]
- tablet server hosts no tablets <shutdown-host:port> [...]
- unable to get tablet server status <shutdown-host:port> [...]
- IOException: No connection to <shutdown-host:port> [...]
- not balancing because the balance information is out of date
- not balancing just yet, as collection of live tservers is in flux
- Sending X tablets to balancer for table <user-table> for assignment within t-servers [..., <shutdown-host:port>, ...]
- Assigned <user-tablet-1> to <shutdown-host:port>
- ...
- Assigned <user-tablet-n> to <shutdown-host:port> [...]
- Could not connect to server <shutdown-host:port> [...]
- ....
- Could not connect to server <shutdown-host:port> [...]
- [Normal Tablets]: 10237 tablets are SUSPENDED
- Detected change in current tserver set, re-running state machine.
- not balancing because there are unhosted tablets: 10237
- 9597 assigned to dead servers [<user-tablet-1@(Location [server=<shutdown-host:port>, type=FUTURE],null,Location[server=<shutdown-host:port>,type=LAST]),.....
- Suspended <user-tablet-1> to <shutdown-host:port> at <> with 1 walogs
- Sending X tablets to balancer for <user-table> for assignment within tservers [...]
- balancer assigned <user-tablet-> to <host-a:port> which is not the suggested location of <shutdown-host:port>
The Cluster-2 logs above show the t-server shutting down after the tablet unloads. However, we were not sure whether the assignment after the unload is expected or unexpected (and would be similar to Cluster-1). The assigned tablets are sent to the shutdown t-server (and the list of candidate t-servers contains the shutdown t-server). The manager then observes that the server is unreachable, and after tablet suspension the candidate t-server list for the next assignment no longer includes the shutdown t-server.
Attempted to replicate with an integration test using a MiniAccumuloCluster. Additionally reviewed the source related to tablet assignment after a t-server shutdown is initiated. There may be a change between 1.10 and 2.1 in the TabletGroupWatcher that impacted the t-servers sent to the balancer for assignment.
The list of servers being shut down is used by the TabletGroupWatcher to create a list of destinations:
https://github.com/apache/accumulo/blob/2.1/server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java#L159
The flushChanges() method attempts to get assignments from the balancer. The t-server set passed in is the current list (which includes the t-servers being shut down).
https://github.com/apache/accumulo/blob/2.1/server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java#L910
In previous versions (i.e. 1.10), the list passed to flushChanges() appears to have been the destinations, which may have excluded the t-servers being shut down.
https://github.com/apache/accumulo/blob/1.10/server/master/src/main/java/org/apache/accumulo/master/TabletGroupWatcher.java#L214
https://github.com/apache/accumulo/blob/1.10/server/master/src/main/java/org/apache/accumulo/master/TabletGroupWatcher.java#L361
https://github.com/apache/accumulo/blob/1.10/server/master/src/main/java/org/apache/accumulo/master/TabletGroupWatcher.java#L875
Continuing to investigate.
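To illustrate the suspected difference described above, a rough sketch of the idea only; this is not the actual TabletGroupWatcher code, and currentTServers / serversToShutdown / chooseDestinations are illustrative names:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative only: shows why handing the full current tserver set to the
// balancer can re-assign tablets to a server that is being shut down.
public class AssignmentSketch {

  // 1.10-style behavior (as we read it): build a "destinations" map that
  // excludes servers in the shutdown set and hand only that to the balancer.
  static <T> Map<String, T> chooseDestinations(Map<String, T> currentTServers,
      Set<String> serversToShutdown) {
    Map<String, T> destinations = new HashMap<>(currentTServers);
    destinations.keySet().removeAll(serversToShutdown);
    return destinations;
  }

  public static void main(String[] args) {
    Map<String, String> current = new HashMap<>();
    current.put("host-a:9997", "status");
    current.put("shutdown-host:9997", "status");
    Set<String> shuttingDown = Set.of("shutdown-host:9997");

    // 2.1-style (as suspected): the balancer sees the unfiltered set, so the
    // shutting-down server is still a candidate for new assignments.
    System.out.println("candidates (unfiltered): " + current.keySet());

    // Filtering first removes it from consideration.
    System.out.println("candidates (filtered):   "
        + chooseDestinations(current, shuttingDown).keySet());
  }
}
```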
EDIT: The nodes are using short hostnames. There is evidence in the cluster that FQDN hostnames have been used on t-servers in the past two weeks; however, all are currently using short hostnames.
Did you also look in zookeeper?
I had not previously looked in zookeeper. I checked and zookeeper reports short-hostnames, with one exception. The replication/tservers location references FQDN names.
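For anyone else who wants to check, a minimal sketch of listing the registered tserver entries straight from ZooKeeper (the connect string and instance id are placeholders, and the path assumes the standard /accumulo/<instance-id>/tservers layout):

```java
import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class ListTServerZNodes {
  public static void main(String[] args) throws Exception {
    // Placeholders: substitute your ZooKeeper quorum and Accumulo instance id.
    String connectString = "zk-host:2181";
    String instanceId = "<instance-id>";

    ZooKeeper zk = new ZooKeeper(connectString, 30_000, event -> {});
    try {
      // Each child node is named for the host:port the tserver registered with,
      // so FQDN vs short-hostname mismatches show up here.
      List<String> tservers = zk.getChildren("/accumulo/" + instanceId + "/tservers", false);
      tservers.forEach(System.out::println);
    } finally {
      zk.close();
    }
  }
}
```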
So, it sounds like the Manager is re-assigning the metadata table tablets immediately, even to a TabletServer that is shutting down. What is manager.metadata.suspendable set to?
I'm going to bump this to 2.1.2, in anticipation of a 2.1.1 soon. I don't think this needs to be a blocker anymore. But if we get a change ready to address this before 2.1.1 release candidates are created, we can include it in 2.1.1 instead of pushing off until 2.1.2.
The value of manager.metadata.suspendable is set to "false" on the systems we are using for testing.
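For reference, a minimal sketch of checking the property through the public client API (the properties file path is a placeholder; the shell's config command shows the same thing):

```java
import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;

public class CheckSuspendable {
  public static void main(String[] args) throws Exception {
    // Placeholder: point this at your client properties file.
    try (AccumuloClient client = Accumulo.newClient()
        .from("/path/to/accumulo-client.properties").build()) {
      // Reads the live system configuration and pulls out the one property.
      String value = client.instanceOperations().getSystemConfiguration()
          .get("manager.metadata.suspendable");
      System.out.println("manager.metadata.suspendable = " + value);
    }
  }
}
```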
We have seen this issue again; we don't know if it has the same cause. These are tservers that exclusively handle accumulo.metadata tablets. They were stuck spamming the following messages:
SessionManager INFO : Closing idle session from user=!SYSTEM, client=<manager node>
Tablet DEBUG: Waiting to completeClose for ..., 0 writes 1 scans
This was with Accumulo 2.1.1-RC2.
We have jstacks and logs captured for anybody that wants to come by and look (test system, manager node)
A change was already released to address this in 2.1.1 (what was 2.1.1-RC2). Rather than introduce a bunch of confusion over the target/fix versions, I'm (re-)closing this, and creating a new issue, #3512
Doing a cluster shutdown, there was one tserver that would not complete. Its log was spamming the following:
- [tablet.Tablet] DEBUG: Waiting to completeClose for !0;.... , 0 writes 21 scans
- [clientImpl.ThriftScanner] DEBUG: Error getting transport to node:9997 : null
This was simply repeating, and the <node> in the error message was itself!
Accumulo 2.1.1-SNAPSHOT
CentOS 7.3