GideonLeGrange / mikrotik-java

A Java client implementation for the Mikrotik RouterOS API.
Apache License 2.0
180 stars 100 forks source link

Reader + Processor threads #55

Open dangelcz opened 6 years ago

dangelcz commented 6 years ago

Hi, first of all I want to thank you for this great api implementation, I really appreciate it.

I'm using your api for managing devices over small city network (about 1500 devices). For that purpouse, I wrote small java program, that provides batch changes. Problem I got is about closing/terminating your Reader and Processor threads. After running one of my jobs, which checks configuration of 739 devices, there are 216 threads alive just before program ends, which is problem, because my monitoring server runs out of resources. Here you can see screen captured from VisualVM on my development PC image For sending commands through your api, I am using following code image

I am suspicious of that problem occures when there is some connection problem like ip unreachable, timeout or unable to login. Thats where some inner exception occures and after that, close method does not work properly and threads stuck on reading from empty stream.

Thanks in advance for your help.

GideonLeGrange commented 6 years ago

I am suspicious of that problem occures when there is some connection problem like ip unreachable, timeout or unable to login. Thats where some inner exception occures and after that, close method does not work properly and threads stuck on reading from empty stream.

I'm trying to work this out but am having a hard time reproducing the error.

I think possible causes should be a form of timeout after a successful login, or a command that fails in an unexpected way (and basically 'hangs' the connection until it times out).

Questions:

Can you please provide me with the following:

dangelcz commented 6 years ago

Hi, sorry for long reply, I am taking vacation at the moment. With VisualVM I tried to take threads snapshot at the end of execution. There are methods where Processor and Readers stucks (hoping that I understand it well).

Processors processor

Readers reader

Answer for your question

Commands, that are being used in current job are /system/resource/print /system/routerboard/print

Exception messages that occures at line 68

Stack traces:

me.legrange.mikrotik.MikrotikApiException: cannot log in
    at me.legrange.mikrotik.impl.ApiConnectionImpl$SyncListener.getResults(ApiConnectionImpl.java:472)
    at me.legrange.mikrotik.impl.ApiConnectionImpl$SyncListener.access$100(ApiConnectionImpl.java:426)
    at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:110)
    at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:68)
    at me.legrange.mikrotik.impl.ApiConnectionImpl.login(ApiConnectionImpl.java:63)
    at libs.mikrotik.MikroticApi.prepareConnection(MikroticApi.java:85)
    at libs.mikrotik.MikroticApi.commandBatch(MikroticApi.java:64)
    at libs.mikrotik.MikroticApi.processCmd(MikroticApi.java:27)
    at libs.mikrotik.MikroticApi.processCmd(MikroticApi.java:33)
    at main.logic.JobExecutor.loadDevice(JobExecutor.java:66)
    at execution.ScanUserJobExecutor.scanDevice(ScanUserJobExecutor.java:74)
    at execution.ScanUserJobExecutor.executeJob(ScanUserJobExecutor.java:98)
    at execution.JobTypes.executeJob(JobTypes.java:28)
    at main.logic.JobExecutor.executeJob(JobExecutor.java:55)
    at main.Main.executeJob(Main.java:144)
    at main.MainThreadPool$ThreadWorker.run(MainThreadPool.java:135)

me.legrange.mikrotik.ApiConnectionException: Remote host closed connection during handshake
    at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:122)
    at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:109)
    at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:68)
    at me.legrange.mikrotik.impl.ApiConnectionImpl.login(ApiConnectionImpl.java:58)
    at libs.mikrotik.MikroticApi.prepareConnection(MikroticApi.java:85)
    at libs.mikrotik.MikroticApi.commandBatch(MikroticApi.java:64)
    at libs.mikrotik.MikroticApi.processCmd(MikroticApi.java:27)
    at libs.mikrotik.MikroticApi.processCmd(MikroticApi.java:33)
    at main.logic.JobExecutor.loadDevice(JobExecutor.java:66)
    at execution.ScanUserJobExecutor.scanDevice(ScanUserJobExecutor.java:74)
    at execution.ScanUserJobExecutor.executeJob(ScanUserJobExecutor.java:98)
    at execution.JobTypes.executeJob(JobTypes.java:28)
    at main.logic.JobExecutor.executeJob(JobExecutor.java:55)
    at main.Main.executeJob(Main.java:144)
    at main.MainThreadPool$ThreadWorker.run(MainThreadPool.java:135)
Caused by: javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
    at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.writeRecord(Unknown Source)
    at sun.security.ssl.AppOutputStream.write(Unknown Source)
    at sun.security.ssl.AppOutputStream.write(Unknown Source)
    at java.io.DataOutputStream.write(Unknown Source)
    at me.legrange.mikrotik.impl.Util.encode(Util.java:140)
    at me.legrange.mikrotik.impl.Util.write(Util.java:25)
    at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:118)
    ... 14 more
Caused by: java.io.EOFException: SSL peer shut down incorrectly
    at sun.security.ssl.InputRecord.read(Unknown Source)
    ... 23 more

me.legrange.mikrotik.MikrotikApiException: Command timed out after 10000 ms
    at me.legrange.mikrotik.impl.ApiConnectionImpl$SyncListener.getResults(ApiConnectionImpl.java:472)
    at me.legrange.mikrotik.impl.ApiConnectionImpl$SyncListener.access$100(ApiConnectionImpl.java:426)
    at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:110)
    at me.legrange.mikrotik.impl.ApiConnectionImpl.execute(ApiConnectionImpl.java:68)
    at me.legrange.mikrotik.impl.ApiConnectionImpl.login(ApiConnectionImpl.java:63)
    at libs.mikrotik.MikroticApi.prepareConnection(MikroticApi.java:85)
    at libs.mikrotik.MikroticApi.commandBatch(MikroticApi.java:64)
    at libs.mikrotik.MikroticApi.processCmd(MikroticApi.java:27)
    at libs.mikrotik.MikroticApi.processCmd(MikroticApi.java:33)
    at main.logic.JobExecutor.loadDevice(JobExecutor.java:66)
    at execution.ScanUserJobExecutor.scanDevice(ScanUserJobExecutor.java:74)
    at execution.ScanUserJobExecutor.executeJob(ScanUserJobExecutor.java:98)
    at execution.JobTypes.executeJob(JobTypes.java:28)
    at main.logic.JobExecutor.executeJob(JobExecutor.java:55)
    at main.Main.executeJob(Main.java:144)
    at main.MainThreadPool$ThreadWorker.run(MainThreadPool.java:135)
Caused by: me.legrange.mikrotik.ApiConnectionException: Command timed out after 10000 ms
    at me.legrange.mikrotik.impl.ApiConnectionImpl$SyncListener.getResults(ApiConnectionImpl.java:464)
    ... 15 more
dangelcz commented 6 years ago

I am using 8 threads that takes list of ips and for each one they call this method (with some additional logic of loading and saving Device object from database).

public static void loadDevice(Device device, String ip, String login, String pwd)
{
    String rscCmd = MikrotikCommands.resources();
    Map<String, String> response = MikroticApi.processCmd(rscCmd, ip, login, pwd);

    device.ros_version = response.get("version");
    device.architecture = response.get("architecture-name");

    String rtbCmd = MikrotikCommands.routerboard();
    response = MikroticApi.processCmd(rtbCmd, ip, login, pwd);

    device.current_firmware = response.get("current-firmware");
    device.serial_number = response.get("serial-number");
    device.upgrade_firmware = response.get("upgrade-firmware");
}
GideonLeGrange commented 6 years ago

I’ve been super busy at work so I have not been able to look at this.

I will try to find some time on the weekend.

Gideon

On 20 Jul 2018, at 16:54, dangelcz notifications@github.com wrote:

I am using 8 threads that takes list of ips and for each one they call this method (with some additional logic of loading and saving Device object from database).

public static void loadDevice(Device device, String ip, String login, String pwd) { String rscCmd = MikrotikCommands.resources(); Map<String, String> response = MikroticApi.processCmd(rscCmd, ip, login, pwd);

device.ros_version = response.get("version"); device.architecture = response.get("architecture-name");

String rtbCmd = MikrotikCommands.routerboard(); response = MikroticApi.processCmd(rtbCmd, ip, login, pwd);

device.current_firmware = response.get("current-firmware"); device.serial_number = response.get("serial-number"); device.upgrade_firmware = response.get("upgrade-firmware"); } — You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/GideonLeGrange/mikrotik-java/issues/55#issuecomment-406625397, or mute the thread https://github.com/notifications/unsubscribe-auth/ABFCFMiLrYUyT5aKuzKwvmCwUQLc19NUks5uIe8TgaJpZM4VPkh2.

GideonLeGrange commented 6 years ago

I know what is wrong and will fix it. Not an easy fix though, may require some internal re-write.

dangelcz commented 6 years ago

Hi, I managed to repair problems myself at the weekend. Its working well, but it looks more like hack than nice architecture repair. I’m still busy and have no time to tell you full details. I’ll try to create pull request tommorow so you can check my changes.

So, main problem seems to be, that Reader thread stucks on in.read() in Util.decode() method and later becomes zombie thread, which cannot be killed or stopped. I still don’t know how is it possible, because in Connection.close() I addeed reader.join() which goes through without any problems, so reader should have ended, but in some cases don't. This behaviour causes two next problems:

  1. eternal loop in Processor.unpack() which returns empty string that throws Exception which is being caught and unpack is called again in parent while cycle.
  2. stucking Processor on take method of blocking queue (because Reader wont put there data anymore)

Main changes I made:

  1. Merged processor and reader to be executed in one thread
  2. With that, I was able to replace blocking queue by standard linked list, because there is no need to block now.
  3. In reader after decode added condition to end, if received string has zero length.
  4. Last one is a bit experimenting and maybe not all changes are nessesary. In Connection.close(), replaced thread.interrupt by join, closed streams and socket separately (to be sure, that are really closed) and after that assigned them to null, to be sure, that there is no reference so garbage collector can clear them. Same made with reader reference reader.join(); reader = null;

There are some more changes, but those are mentionable. Maybe, you' ll find some clue for your analysis and patch.

GideonLeGrange commented 6 years ago

You are correct, the problem is that in.read() blocks. What really should happen is that in.read() should fail.

I've committed a very simple patch that sets the soTimeout on the socket for the TCP connection. This should have been done, but it wasn't, so the setTimeout() call on the API was not as effective as it could have been.

Can you please try the following:

Maybe it helps you. If it works better, I will push that fix to master.

Gideon

dangelcz commented 6 years ago

Hi, I made some tests with this branch, set timeout to 20 seconds, but there is no difference. Threads stuck at the same places as before even if I wait more than 2 minutes after main procedure completes (I removed setDeamon(), so VM won't cancel)

I think, that reading stucks due to half closed socket, which should throw exception, but won't. Consider, whether using Reader, Processor and SyncListener as inner classes cannot cause problems with theirs specific tight references. Meaning, they prevent to be harvested by GC somehow. When I am debugging stucked Reader, it shows me reference to ApiConnectionImpl.connected equals true even close method was called some time ago, which does not make sense.

Thread count graph image

Stack traces (15 seconds because I took image early, but it contined) image

I was not able to prepare pull request, because applying changes without my refactor did not work well. So I uploaded whole project as is to googledrive at least, if you want to see. This version works well for me. https://drive.google.com/open?id=1gbfV2bkbYwXZbv1pldQRiIiT_I1dgr8j

iyadfawwaz commented 4 years ago

that is the best way class Connection extends AsyncTask<String,Integer,ApiConnection>{ protected ApiConnection doInBackgeound(String... strings){ ApiConnection apiConnection; try{ apiConnection = ApiConnection.connect(strings[0]; apiConnection.login(strings[1],strings[2]); return apiConnection; }catch(Exception e){} return null; }} on MainActivity Connection conn = new Connection(); conn.execute("IP","ADMIN","PASSWORD"); to get ApiConnection you will do (get()) after ) guards

iyadfawwaz commented 4 years ago

for all you can dependecy sy.iyad.MikroSy:mikrosy:1.7.24

18uu commented 3 years ago