drapostolos / rdp4j

Remote Directory Poller for Java
MIT License
46 stars 25 forks source link

Directory Poller Stops When IO Error Raised #28

Open ercanebiler opened 2 years ago

ercanebiler commented 2 years ago

Hi,

First of all thanks for this awesome library.

I created a SFTP poller which similar to OhadR's: https://github.com/OhadR/rdp4j-client/

When i start my application it works very well. But after some hours I'm getting an IO error in listFiles method.

10/05/2022 18:48:41 (com.company.MyPlugin.sftp.SFtpDirectory-listFiles-109) pool-6-thread-1 ERROR: IO exception while checking file list: 
4: 
   at com.jcraft.jsch.ChannelSftp.ls(ChannelSftp.java:1747)
   at com.jcraft.jsch.ChannelSftp.ls(ChannelSftp.java:1553)
   at com.company.MyPlugin.sftp.SFtpDirectory.listFiles(SFtpDirectory.java:104)
   at com.github.drapostolos.rdp4j.Poller.collectCurrentFilesAndNotifyListenersIfIoErrorRaisedOrCeased(Poller.java:111)
   at com.github.drapostolos.rdp4j.Poller.call(Poller.java:54)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: inputstream is closed
   at com.jcraft.jsch.ChannelSftp.fill(ChannelSftp.java:2911)
   at com.jcraft.jsch.ChannelSftp.header(ChannelSftp.java:2935)
   at com.jcraft.jsch.ChannelSftp.ls(ChannelSftp.java:1643)
   ... 8 more
10/05/2022 18:48:41 (com.company.MyPlugin.listener.SFtpListener-ioErrorRaised-26) pool-6-thread-1 ERROR: Listener I/O error raised!: 
java.io.IOException: 4: 
   at com.company.MyPlugin.sftp.SFtpDirectory.listFiles(SFtpDirectory.java:110)
   at com.github.drapostolos.rdp4j.Poller.collectCurrentFilesAndNotifyListenersIfIoErrorRaisedOrCeased(Poller.java:111)
   at com.github.drapostolos.rdp4j.Poller.call(Poller.java:54)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:748)
Caused by: 4: 
   at com.jcraft.jsch.ChannelSftp.ls(ChannelSftp.java:1747)
   at com.jcraft.jsch.ChannelSftp.ls(ChannelSftp.java:1553)
   at com.company.MyPlugin.sftp.SFtpDirectory.listFiles(SFtpDirectory.java:104)
   ... 6 more
Caused by: java.io.IOException: inputstream is closed
   at com.jcraft.jsch.ChannelSftp.fill(ChannelSftp.java:2911)
   at com.jcraft.jsch.ChannelSftp.header(ChannelSftp.java:2935)
   at com.jcraft.jsch.ChannelSftp.ls(ChannelSftp.java:1643)
   ... 8 more
10/05/2022 18:48:45 (com.company.MyPlugin.listener.SFtpListener-ioErrorCeased-30) pool-6-thread-1 ERROR: I/O error  ceased.com.github.drapostolos.rdp4j.IoErrorCeasedEvent@762828be

The directory poller's polling interval is 500 milisecond. But I see this error just once. After that probably directory poller stops.

listFiles method in SftpDirectory.java

public Set<FileElement> listFiles() throws IOException {
    Set<FileElement> result = new LinkedHashSet<FileElement>();

    if (!channelSftp.isConnected()) {
        closeSessionAndChannel();
        createSessionAndChannel();
    }
    try {
        Vector<LsEntry> filesList = channelSftp.ls(workingDirectory);
        for (LsEntry file : filesList) {
            result.add(new SFtpFile(file));
        }
    } catch (SftpException e) {
        logger.error("IO exception while checking file list: ", e);
        throw new IOException(e);
    } catch (Exception e) {
        logger.error("Unexpected exception: ", e);
    }

    return result;
}

closeSessionAndChannel method

public static void closeSessionAndChannel() {
    channelSftp.exit();
    session.disconnect();
}

createSessionAndChannel method

public static void createSessionAndChannel() {
    JSch jsch = new JSch();
    try {
        session = jsch.getSession(username, host, port);
        session.setConfig(STRICT_HOST_CHECKING, STRICT_HOST_CHECKING_NO);
        session.setPassword(password);
        session.connect();

        Channel channel = session.openChannel(CHANNEL_TYPE);
        channel.connect();
        channelSftp = (ChannelSftp) channel;
    } catch (JSchException e) {
        logger.error("JSchException while creating session and channel: ", e);
    }
}

I'm closing and recreating the session and channel if they're not connected on listing files. So what could be the problem? @drapostolos

drapostolos commented 2 years ago

hi

The normal behaviour is that rdp4j will fire an IoErrorRaisedEvent event when an IO exception is thrown from listFiles(). Any consecutive IO exceptions will be silent. But once the IO exceptions goes away (i.e. first successful file listing after io error) there will an IoErrorCeasedEvent event triggered.

From your log above it looks like you first get an IO error raised, then 4 seconds later you get an IO error ceased.

10/05/2022 18:48:41 (com.company.MyPlugin.listener.SFtpListener-ioErrorRaised-26) pool-6-thread-1 ERROR: Listener I/O error raised!: 
...
10/05/2022 18:48:45 (com.company.MyPlugin.listener.SFtpListener-ioErrorCeased-30) pool-6-thread-1 ERROR: I/O error  ceased.com.github.drapostolos.rdp4j.IoErrorCeasedEvent@762828be

Are you sure the Poller stops after that?

I'm not sure if it is related or not, but the createSessionAndChannel() method swollows (and only logs) the JSchException when it happens and then continues executing. I'm not sure how the rest of the code (in listFiles()) will react to that?

Do you have any idea what the 4 means in below error message?

java.io.IOException: 4:
ercanebiler commented 2 years ago

Hi,

Thanks for quick reply. I tried to reproduce the issue.

Application works in production. I thought poller stops because I don't get that error message more than once.

P: Prod Centos S: Prod SFTP server T: Test Server Centos L: Local

Application works in P. It's connecting S. The error coming just one as I see in the error log. I wrote create and close methods for recovery of the session and channel. I'm expecting to see that error more than once. Because it polls every 500 miliseconds. When we get the error, It's not getting newly added files. If I restart the application it works. After sometime we are getting error again.

I tried to reproduce the error in my L and debugged. I connected to the T(with vpn) like an sftp server. After connection I shutdown the vpn and I got an IO error like connection reset by peer. In this case poller tries to list files with specified interval. Then i get connection timeout error. But it still trying to listFiles.

After that i reconnected to vpn. Session recovered and got the files. I couldn't reproduce inputstream closed but idea was the same.

I actually don't know what "4" means. Tomorrow I'll try to fix this, when I get the error in list files I'll stop directory poller. Then recreate it. If you have any idea you can share with me.

Thank you.

ercanebiler commented 2 years ago

Hi,

When I call the directory poller stop method, It's just hanging on "this.latch.await();" in the awaitTermination() method, not stopping. Do you have any idea?

@drapostolos

ercanebiler commented 2 years ago

I'm commenting too much but there's a prod environment issue.

I tried to mock the issue like this;

I'm connecting to a server which it have the folder that I'm listening. This connection has a vpn configuration. There are 1000 files. After a minute, I disconnect the vpn. Then an IO error raised, notified to the listener. Directory poller tries listing files again and again. Every time it gets connection timeout/pipe closed. I connect to vpn again. List files still working, session and channel recovering. It returns the result image

Then this method works;

image

After it puts all temp files to the real one, Poller.class call() method works.

image

It's doing actions for remaning poll cycles, then returns null. After this it try to list files again.

As I see Directory Poller gets the 1k file info and notifity listener one by one. When a connection lost, It still notifies the listener.

public static String getFileContent(String fileName) {
    try {
        if (downloadEnabled.equalsIgnoreCase("Y")) {

            channelSftp.get(workingDirectory + fileName, downloadPath + fileName);
        }
        InputStream stream = channelSftp.get(workingDirectory + fileName);
        BufferedReader br = new BufferedReader(new InputStreamReader(stream));
        String fileContent = br.lines().collect(Collectors.joining());
        logger.info(fileContent);
        stream.close();
        return fileContent;
    } catch (SftpException e) {
        logger.error("SftpException while getting file content: ", e);
    } catch (Exception e) {
        logger.error("Unexpected exception while getting file content: ", e);
    }
    return null;
}

I'm trying to get file content for each fileAdded notification, so I'm getting input stream/pipe closed for every listed files.

When the second poll cycle starts its not notifiying the listener because there is no new added files.

Because of this, (Best thing I can think) When there's an IO error raised, i have to stop the directory poller and restart it. I tried to dp.stop() method but as i said its hanging on the awaitTermination.

Seeking for help about stopping directory poller issue. Thanks. @drapostolos

drapostolos commented 2 years ago

Hi

Hi,

When I call the directory poller stop method, It's just hanging on "this.latch.await();" in the awaitTermination() method, not stopping. Do you have any idea?

@drapostolos

What the javadoc says for the awaitTermination() method:

     * Blocks until the last poll-cycle has finished and all {@link AfterStopEvent} has been
     * processed.

Are you listening to any AfterStopEvents? Could there be something blocking there? Or anything blocking/stuck in the last poll cycle?

I also notice from your getFileContent() method, that the stream is never closed if there is an exception. Which means there is a potential resource leak going on which could lead to undefined behavior.

Also, try to catch Throwable in your listFiles() implementation and log it if it happens. Could be there is a Throwable (Error) thrown (maybe due to resource leak?). Anyway, this could give some leads to where the issue is.