Open dangelcz opened 6 years ago
I am suspicious of that problem occures when there is some connection problem like ip unreachable, timeout or unable to login. Thats where some inner exception occures and after that, close method does not work properly and threads stuck on reading from empty stream.
I'm trying to work this out but am having a hard time reproducing the error.
I think possible causes should be a form of timeout after a successful login, or a command that fails in an unexpected way (and basically 'hangs' the connection until it times out).
Questions:
Can you please provide me with the following:
cmd
in your example above?Hi, sorry for long reply, I am taking vacation at the moment. With VisualVM I tried to take threads snapshot at the end of execution. There are methods where Processor and Readers stucks (hoping that I understand it well).
Processors
Readers
Answer for your question
Commands, that are being used in current job are /system/resource/print /system/routerboard/print
Exception messages that occures at line 68
Stack traces:
me.legrange.mikrotik.MikrotikApiException: cannot log in
at me.legrange.mikrotik.impl.ApiConnectionImpl$SyncListener.getResults(ApiConnectionImpl.java:472)
at me.legrange.mikrotik.impl.ApiConnectionImpl$SyncListener.access$100(ApiConnectionImpl.java:426)
at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:110)
at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:68)
at me.legrange.mikrotik.impl.ApiConnectionImpl.login(ApiConnectionImpl.java:63)
at libs.mikrotik.MikroticApi.prepareConnection(MikroticApi.java:85)
at libs.mikrotik.MikroticApi.commandBatch(MikroticApi.java:64)
at libs.mikrotik.MikroticApi.processCmd(MikroticApi.java:27)
at libs.mikrotik.MikroticApi.processCmd(MikroticApi.java:33)
at main.logic.JobExecutor.loadDevice(JobExecutor.java:66)
at execution.ScanUserJobExecutor.scanDevice(ScanUserJobExecutor.java:74)
at execution.ScanUserJobExecutor.executeJob(ScanUserJobExecutor.java:98)
at execution.JobTypes.executeJob(JobTypes.java:28)
at main.logic.JobExecutor.executeJob(JobExecutor.java:55)
at main.Main.executeJob(Main.java:144)
at main.MainThreadPool$ThreadWorker.run(MainThreadPool.java:135)
me.legrange.mikrotik.ApiConnectionException: Remote host closed connection during handshake
at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:122)
at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:109)
at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:68)
at me.legrange.mikrotik.impl.ApiConnectionImpl.login(ApiConnectionImpl.java:58)
at libs.mikrotik.MikroticApi.prepareConnection(MikroticApi.java:85)
at libs.mikrotik.MikroticApi.commandBatch(MikroticApi.java:64)
at libs.mikrotik.MikroticApi.processCmd(MikroticApi.java:27)
at libs.mikrotik.MikroticApi.processCmd(MikroticApi.java:33)
at main.logic.JobExecutor.loadDevice(JobExecutor.java:66)
at execution.ScanUserJobExecutor.scanDevice(ScanUserJobExecutor.java:74)
at execution.ScanUserJobExecutor.executeJob(ScanUserJobExecutor.java:98)
at execution.JobTypes.executeJob(JobTypes.java:28)
at main.logic.JobExecutor.executeJob(JobExecutor.java:55)
at main.Main.executeJob(Main.java:144)
at main.MainThreadPool$ThreadWorker.run(MainThreadPool.java:135)
Caused by: javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(Unknown Source)
at sun.security.ssl.SSLSocketImpl.writeRecord(Unknown Source)
at sun.security.ssl.AppOutputStream.write(Unknown Source)
at sun.security.ssl.AppOutputStream.write(Unknown Source)
at java.io.DataOutputStream.write(Unknown Source)
at me.legrange.mikrotik.impl.Util.encode(Util.java:140)
at me.legrange.mikrotik.impl.Util.write(Util.java:25)
at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:118)
... 14 more
Caused by: java.io.EOFException: SSL peer shut down incorrectly
at sun.security.ssl.InputRecord.read(Unknown Source)
... 23 more
me.legrange.mikrotik.MikrotikApiException: Command timed out after 10000 ms
at me.legrange.mikrotik.impl.ApiConnectionImpl$SyncListener.getResults(ApiConnectionImpl.java:472)
at me.legrange.mikrotik.impl.ApiConnectionImpl$SyncListener.access$100(ApiConnectionImpl.java:426)
at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:110)
at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:68)
at me.legrange.mikrotik.impl.ApiConnectionImpl.login(ApiConnectionImpl.java:63)
at libs.mikrotik.MikroticApi.prepareConnection(MikroticApi.java:85)
at libs.mikrotik.MikroticApi.commandBatch(MikroticApi.java:64)
at libs.mikrotik.MikroticApi.processCmd(MikroticApi.java:27)
at libs.mikrotik.MikroticApi.processCmd(MikroticApi.java:33)
at main.logic.JobExecutor.loadDevice(JobExecutor.java:66)
at execution.ScanUserJobExecutor.scanDevice(ScanUserJobExecutor.java:74)
at execution.ScanUserJobExecutor.executeJob(ScanUserJobExecutor.java:98)
at execution.JobTypes.executeJob(JobTypes.java:28)
at main.logic.JobExecutor.executeJob(JobExecutor.java:55)
at main.Main.executeJob(Main.java:144)
at main.MainThreadPool$ThreadWorker.run(MainThreadPool.java:135)
Caused by: me.legrange.mikrotik.ApiConnectionException: Command timed out after 10000 ms
at me.legrange.mikrotik.impl.ApiConnectionImpl$SyncListener.getResults(ApiConnectionImpl.java:464)
... 15 more
I am using 8 threads that takes list of ips and for each one they call this method (with some additional logic of loading and saving Device object from database).
public static void loadDevice(Device device, String ip, String login, String pwd)
{
String rscCmd = MikrotikCommands.resources();
Map<String, String> response = MikroticApi.processCmd(rscCmd, ip, login, pwd);
device.ros_version = response.get("version");
device.architecture = response.get("architecture-name");
String rtbCmd = MikrotikCommands.routerboard();
response = MikroticApi.processCmd(rtbCmd, ip, login, pwd);
device.current_firmware = response.get("current-firmware");
device.serial_number = response.get("serial-number");
device.upgrade_firmware = response.get("upgrade-firmware");
}
I’ve been super busy at work so I have not been able to look at this.
I will try to find some time on the weekend.
Gideon
On 20 Jul 2018, at 16:54, dangelcz notifications@github.com wrote:
I am using 8 threads that takes list of ips and for each one they call this method (with some additional logic of loading and saving Device object from database).
public static void loadDevice(Device device, String ip, String login, String pwd) { String rscCmd = MikrotikCommands.resources(); Map<String, String> response = MikroticApi.processCmd(rscCmd, ip, login, pwd);
device.ros_version = response.get("version"); device.architecture = response.get("architecture-name");
String rtbCmd = MikrotikCommands.routerboard(); response = MikroticApi.processCmd(rtbCmd, ip, login, pwd);
device.current_firmware = response.get("current-firmware"); device.serial_number = response.get("serial-number"); device.upgrade_firmware = response.get("upgrade-firmware"); } — You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/GideonLeGrange/mikrotik-java/issues/55#issuecomment-406625397, or mute the thread https://github.com/notifications/unsubscribe-auth/ABFCFMiLrYUyT5aKuzKwvmCwUQLc19NUks5uIe8TgaJpZM4VPkh2.
I know what is wrong and will fix it. Not an easy fix though, may require some internal re-write.
Hi, I managed to repair problems myself at the weekend. Its working well, but it looks more like hack than nice architecture repair. I’m still busy and have no time to tell you full details. I’ll try to create pull request tommorow so you can check my changes.
So, main problem seems to be, that Reader thread stucks on in.read() in Util.decode() method and later becomes zombie thread, which cannot be killed or stopped. I still don’t know how is it possible, because in Connection.close() I addeed reader.join() which goes through without any problems, so reader should have ended, but in some cases don't. This behaviour causes two next problems:
Main changes I made:
reader.join(); reader = null;
There are some more changes, but those are mentionable. Maybe, you' ll find some clue for your analysis and patch.
You are correct, the problem is that in.read()
blocks. What really should happen is that in.read()
should fail.
I've committed a very simple patch that sets the soTimeout
on the socket for the TCP connection. This should have been done, but it wasn't, so the setTimeout()
call on the API was not as effective as it could have been.
Can you please try the following:
issue-55
ApiConnection.setTimeout()
call to your code. I suggest a short but sensible timeout, something like 20000 msMaybe it helps you. If it works better, I will push that fix to master.
Gideon
Hi, I made some tests with this branch, set timeout to 20 seconds, but there is no difference. Threads stuck at the same places as before even if I wait more than 2 minutes after main procedure completes (I removed setDeamon(), so VM won't cancel)
I think, that reading stucks due to half closed socket, which should throw exception, but won't. Consider, whether using Reader, Processor and SyncListener as inner classes cannot cause problems with theirs specific tight references. Meaning, they prevent to be harvested by GC somehow. When I am debugging stucked Reader, it shows me reference to ApiConnectionImpl.connected equals true even close method was called some time ago, which does not make sense.
Thread count graph
Stack traces (15 seconds because I took image early, but it contined)
I was not able to prepare pull request, because applying changes without my refactor did not work well. So I uploaded whole project as is to googledrive at least, if you want to see. This version works well for me. https://drive.google.com/open?id=1gbfV2bkbYwXZbv1pldQRiIiT_I1dgr8j
that is the best way
class Connection extends AsyncTask<String,Integer,ApiConnection>{ protected ApiConnection doInBackgeound(String... strings){ ApiConnection apiConnection; try{ apiConnection = ApiConnection.connect(strings[0]; apiConnection.login(strings[1],strings[2]); return apiConnection; }catch(Exception e){} return null; }}
on MainActivity
Connection conn = new Connection(); conn.execute("IP","ADMIN","PASSWORD");
to get ApiConnection you will do (get()) after )
guards
for all you can dependecy sy.iyad.MikroSy:mikrosy:1.7.24
Hi, first of all I want to thank you for this great api implementation, I really appreciate it.
I'm using your api for managing devices over small city network (about 1500 devices). For that purpouse, I wrote small java program, that provides batch changes. Problem I got is about closing/terminating your Reader and Processor threads. After running one of my jobs, which checks configuration of 739 devices, there are 216 threads alive just before program ends, which is problem, because my monitoring server runs out of resources. Here you can see screen captured from VisualVM on my development PC For sending commands through your api, I am using following code
I am suspicious of that problem occures when there is some connection problem like ip unreachable, timeout or unable to login. Thats where some inner exception occures and after that, close method does not work properly and threads stuck on reading from empty stream.
Thanks in advance for your help.