non-line buffering

Observed on the following version:

    $ git status
    HEAD detached from v1.7.1

Perhaps this is an abuse of clush, but its line buffering causes segments of binary output to go missing. I first hit the problem while collecting logs (using tar) from several nodes through the Python API, and again when trying to do a database dump. As a simple test, the following command also failed when tarring up directories on a single node (an ssh version of the same command against that node produces a correct tar file):

    $ clush -N -w test.localhost.lan tar -cz /path_to_binary_data_on_rhost/ -f - > /tmp/clush_test.tar
    $ file /tmp/clush_test.tar
    /tmp/clush_test.tar: data
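file only inspects the leading bytes of its input, so "data" suggests that even the gzip header at the start of the stream was damaged. To check the whole archive rather than just its header, a small helper such as the hypothetical gzip_intact below can decompress to the end of the stream; reading to EOF is what makes Python's gzip module verify the trailing CRC and length. This is a sketch for comparing the clush and ssh outputs, not part of clush itself.

```python
import gzip
import zlib


def gzip_intact(path):
    """Return True if path holds a complete gzip stream whose CRC checks out."""
    try:
        with gzip.open(path, "rb") as f:
            # Read to EOF so the gzip trailer (CRC32 and length) gets verified.
            while f.read(1 << 16):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers a bad header or CRC mismatch, EOFError a truncated
        # stream, and zlib.error corrupted deflate data.
        return False
```

For example, gzip_intact("/tmp/clush_test.tar") would be expected to return False for the broken clush output and True for the archive produced over plain ssh.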

As far as I could tell (and from the comments in the code, thank you!), the line buffering drops some of the data during the transfer in EngineClient._readlines() in EngineClient.py:

    def _readlines(self, sname):
        """Utility method to read client lines."""
        # read a chunk of data, may raise eof
        readbuf = self._read(sname)
        assert len(readbuf) > 0, "assertion failed: len(readbuf) > 0"

        # Current version implements line-buffered reads. If needed, we could
        # easily provide direct, non-buffered, data reads in the future.

        rfile = self.streams[sname]

        buf = rfile.rbuf + readbuf
        lines = buf.splitlines(True)
        rfile.rbuf = ""
        for line in lines:
            if line.endswith('\n'):
                if line.endswith('\r\n'):
                    yield line[:-2] # trim CRLF
                else:
                    # trim LF
                    yield line[:-1] # trim LF
            else:
                # keep partial line in buffer
                rfile.rbuf = line  # EngineClient.py:388, breaking here
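The loss can be reproduced outside clush. The sketch below is a standalone copy of the loop above; the function name original_readlines is hypothetical, and Python 3 bytes stand in for the Python 2 byte strings the quoted code handles. The key detail is that bytes.splitlines() also treats a bare \r as a line boundary, so binary data containing \r bytes produces several pieces that do not end in \n, and each one overwrites the previous piece kept in the buffer.

```python
def original_readlines(rbuf, readbuf):
    """Standalone copy of the quoted loop; returns (emitted_lines, new_rbuf)."""
    buf = rbuf + readbuf
    lines = buf.splitlines(True)
    new_rbuf = b""
    out = []
    for line in lines:
        if line.endswith(b"\n"):
            if line.endswith(b"\r\n"):
                out.append(line[:-2])  # trim CRLF
            else:
                out.append(line[:-1])  # trim LF
        else:
            # Any piece without a trailing LF replaces whatever was kept before.
            new_rbuf = line
    return out, new_rbuf


# bytes.splitlines() splits on a bare b"\r" too, which is common in binary data.
chunk = b"a\rb\rc\n"
print(chunk.splitlines(True))          # [b'a\r', b'b\r', b'c\n']
print(original_readlines(b"", chunk))  # ([b'c'], b'b\r')
```

Here b"a\r" is discarded outright, and b"b\r" is left in the buffer to be prepended to the next read, i.e. after b"c" has already been emitted.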

I was able to transfer smaller binary files by changing line 388 of EngineClient.py to read:

    rfile.rbuf = rfile.rbuf + line

This helps because, when the for-loop sees more than one piece without a trailing line feed, rfile.rbuf is overwritten on each such iteration and the earlier pieces are lost. However, this workaround did not help with larger files.
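A sketch of why the workaround may still not be enough, again using standalone, hypothetical functions rather than ClusterShell code: accumulating the pieces stops them from being dropped, but they stay stuck in the buffer while later LF-terminated lines are emitted, so the byte order of the output changes; and the CR of a CRLF ending is still trimmed. A lossless line splitter would split on b"\n" only and keep the terminators; lf_split below is one such sketch.

```python
def workaround_readlines(rbuf, readbuf):
    """The quoted loop with the line-388 change: partial pieces are appended."""
    lines = (rbuf + readbuf).splitlines(True)
    new_rbuf = b""
    out = []
    for line in lines:
        if line.endswith(b"\r\n"):
            out.append(line[:-2])  # trim CRLF
        elif line.endswith(b"\n"):
            out.append(line[:-1])  # trim LF
        else:
            new_rbuf = new_rbuf + line  # workaround: append instead of overwrite
    return out, new_rbuf


def lf_split(rbuf, readbuf):
    """Hypothetical lossless splitter: split on LF only and keep terminators."""
    parts = (rbuf + readbuf).split(b"\n")
    tail = parts.pop()  # bytes after the last LF, carried over to the next read
    return [p + b"\n" for p in parts], tail


# Two successive reads from the same stream.
reads = [b"a\rb\rc\n", b"d\r\n"]

rbuf, emitted = b"", []
for chunk in reads:
    out, rbuf = workaround_readlines(rbuf, chunk)
    emitted += out
print(emitted, rbuf)  # [b'c', b'd'] b'a\rb\r'

rbuf, emitted = b"", []
for chunk in reads:
    out, rbuf = lf_split(rbuf, chunk)
    emitted += out
print(b"".join(emitted) + rbuf == b"".join(reads))  # True
```

With the workaround, b"a\rb\r" is re-split and re-buffered on every read, while b"c" and b"d" (minus its CRLF) go out first. lf_split preserves every byte, but it is still line-oriented, which is why a raw, non-line-buffered mode may be the better fit for binary data.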

I've dug into this issue as much as I can for now, but I wanted to note it in case the project has plans to allow non-line-buffered reads, or some kind of switch for fixed-size (blob) buffering.
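For illustration, a non-line-buffered mode could be as simple as passing each read through untouched. The sketch below uses a blocking os.read() loop on a pipe for brevity, whereas clush reads from its event-driven engine; read_blobs and blob_size are hypothetical names, not an existing ClusterShell API or option.

```python
import os


def read_blobs(fd, blob_size=4096):
    """Yield raw chunks of at most blob_size bytes until EOF, unmodified."""
    while True:
        blob = os.read(fd, blob_size)
        if not blob:  # b"" means the writer closed its end (EOF)
            return
        yield blob


# Binary payload with gzip magic, bare CR, CRLF, LF and NUL bytes; it is small
# enough to fit in the pipe buffer, so one blocking write cannot deadlock.
payload = b"\x1f\x8b\x08\x00a\rb\r\nc\n\x00" * 100
r, w = os.pipe()
os.write(w, payload)
os.close(w)
received = b"".join(read_blobs(r, blob_size=64))
os.close(r)
print(received == payload)  # True
```

Because no splitting or trimming happens, the concatenated chunks are byte-for-byte identical to what the remote command wrote, regardless of which line-ending bytes the data contains.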

cea-hpc / clustershell

non-line buffering #309