obi11235 opened this issue 7 years ago (status: Open)
This is not a known issue, however:
`panic: runtime error: invalid memory address or nil pointer dereference`
is likely caused by `conn` being `nil` in https://github.com/github/gh-ost/blob/3313cb3933933f4a1c7f20ca62f1084f9f2184ad/go/logic/server.go#L101.
I will fix that, and there will be no crash.
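The likely shape of the problem is the common accept-loop pattern where the error from `Accept()` is logged but the (nil) connection is still handed to a handler. A minimal sketch of the guard, with illustrative names rather than gh-ost's actual code:

```go
package main

import (
	"log"
	"net"
)

// serveSocket accepts connections on a unix socket and hands each one to
// handler. The important part is the `continue`: when Accept fails (for
// example with "too many open files"), conn is nil and must not be passed
// on, otherwise the handler dereferences a nil connection and panics.
func serveSocket(listener net.Listener, handler func(net.Conn)) {
	for {
		conn, err := listener.Accept()
		if err != nil {
			log.Printf("accept error: %v", err)
			continue // skip the nil conn instead of crashing
		}
		go handler(conn)
	}
}

func main() {
	// Illustrative socket path; gh-ost uses its own per-migration path.
	listener, err := net.Listen("unix", "/tmp/example.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer listener.Close()

	serveSocket(listener, func(c net.Conn) {
		defer c.Close()
		c.Write([]byte("ok\n"))
	})
}
```

When `Accept()` fails with `too many open files`, the `continue` skips the nil connection instead of letting a handler dereference it.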
But that won't solve the fact that you have too many open files. I wonder in what manner you query gh-ost via the socket file. From the log I see 24 entries for "too many open files" within the same second. Is that the rate at which you query the socket? Perhaps your client does not close the connection?
I was querying the socket by hand. This particular alter I ultimately performed a different way, but I have more of this size to do in the future. Is there any chance gh-ost is not closing connections correctly when it gets disconnected from the slaves it is monitoring?
Next time I'm running a big alter like this on a huge table I'll monitor gh-ost's open files to see if it yields any useful information.
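A minimal Linux-only sketch of that kind of monitoring, sampling the descriptor count of a given pid from `/proc/<pid>/fd` (the pid and the interval are illustrative):

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// openFiles counts a process's file descriptors by listing /proc/<pid>/fd.
func openFiles(pid string) (int, error) {
	entries, err := os.ReadDir("/proc/" + pid + "/fd")
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: fdwatch <pid>")
		os.Exit(1)
	}
	pid := os.Args[1]

	// Sample every 5 seconds; a spike here should line up with messages
	// such as "Invalid Connection" or "too many open files" in the log.
	for {
		n, err := openFiles(pid)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("%s open files: %d\n", time.Now().Format(time.RFC3339), n)
		time.Sleep(5 * time.Second)
	}
}
```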
> Is there any chance gh-ost is not closing connections correctly when it gets disconnected from the slaves it is monitoring?
Connections to replicas are handled by the go-mysql-driver, underlying the `database/sql` interface. Connections are routinely reused and are unlikely to time out, since gh-ost queries the replicas frequently.
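For illustration, a generic sketch of how a `database/sql` pool is bounded and reused; the DSN, limits, and query are placeholders, not gh-ost's actual configuration:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// DSN is a placeholder only.
	db, err := sql.Open("mysql", "user:pass@tcp(10.30.2.10:3306)/")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// database/sql keeps a pool behind the scenes: idle connections are
	// reused across queries, so frequent checks do not open a new TCP
	// socket (and a new file descriptor) every time.
	db.SetMaxOpenConns(3)
	db.SetMaxIdleConns(3)
	db.SetConnMaxLifetime(5 * time.Minute)

	// A placeholder query standing in for the frequent replica checks.
	var one int
	if err := db.QueryRow("SELECT 1").Scan(&one); err != nil {
		log.Fatal(err)
	}
	log.Printf("query ok: %d", one)
}
```

With a bounded pool like this, frequent checks reuse idle connections rather than opening a new descriptor each time.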
What you're having is a `too many open files` error on the replica where gh-ost runs.
I'm not sure I understand how you're running these by hand and getting 24 responses within the same second. Are there otherwise too many open files on your system?
The system was performing correctly at the time; the only problem was talking to gh-ost. I think querying the socket just triggered the crash. It was already having an issue connecting to the master before I tried to work with the socket:
`2017-04-10 02:51:50 ERROR dial tcp 10.30.2.10:3306: socket: too many open files`
The command I issued to the socket was to throttle it, since it was having an issue, hoping that would help it recover. I wonder if it hit a per-process open-file limit.
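One way to check, assuming Linux: compare the process's "Max open files" line in `/proc/<pid>/limits` against its current descriptor count. A minimal sketch, not part of gh-ost:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: fdlimit <pid>")
		os.Exit(1)
	}
	pid := os.Args[1]

	// Current number of open descriptors for the process.
	fds, err := os.ReadDir("/proc/" + pid + "/fd")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("open descriptors: %d\n", len(fds))

	// The per-process limit as the kernel reports it.
	f, err := os.Open("/proc/" + pid + "/limits")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		if strings.HasPrefix(scanner.Text(), "Max open files") {
			fmt.Println(scanner.Text())
		}
	}
}
```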
I have started another long alter and am monitoring the open files for the gh-ost process. It looks like every time I get a message about an invalid connection:
`[MySQL] 2017/04/21 18:24:19 statement.go:27: Invalid Connection`
the number of open files spikes for a while. For this operation I'm not monitoring any slaves. I wonder, if any of the slaves have an issue that causes the Invalid Connections, whether that causes a large spike in the open files for the process as it retries to connect, opening more connections.
On a long-running alter using the below setup, after 20+ hours it gets an error with too many open files, and if I try to query gh-ost via its socket it will crash. Is this a known issue? I'm using the precompiled version 1.0.36.
./gh-ost --max-load=Threads_running=50,Threads_connected=700 --critical-load=Threads_running=3000,Threads_connected=5000 --chunk-size=1000 --throttle-control-replicas="10.20.2.4,10.30.2.12:3307,10.30.2.13,10.30.2.2,10.30.2.7,127.0.0.1:9901,127.0.0.1:9904,127.0.0.1:9905" --max-lag-millis=20000 --conf=admin.my --host=10.30.2.10 --database="database" --table="guest" --verbose --alter="engine=innodb" --cut-over=default --exact-rowcount --concurrent-rowcount --default-retries=240 --panic-flag-file=/tmp/gh-ost.panic.flag --postpone-cut-over-flag-file=/tmp/ghost.postpone.flag --execute