Perhaps this is an abuse of clush, but line buffering causes segments of binary output to go missing. I first hit this while collecting logs (via tar) from several nodes using the Python API, and again when attempting a database dump. As a simple test, the following command also fails when tarring up a directory on a single node. (Running the same command over plain ssh to that node produces a correct tar file.)
$ clush -N -w test.localhost.lan tar -cz /path_to_binary_data_on_rhost/ -f - > /tmp/clush_test.tar
$ file /tmp/clush_test.tar
/tmp/clush_test.tar: data
As far as I could tell (helped by the comments in the code, thank you!), the line buffering drops some of the data during the transfer in EngineClient.py, in EngineClient._readlines(), shown below.
def _readlines(self, sname):
    """Utility method to read client lines."""
    # read a chunk of data, may raise eof
    readbuf = self._read(sname)
    assert len(readbuf) > 0, "assertion failed: len(readbuf) > 0"
    # Current version implements line-buffered reads. If needed, we could
    # easily provide direct, non-buffered, data reads in the future.
    rfile = self.streams[sname]
    buf = rfile.rbuf + readbuf
    lines = buf.splitlines(True)
    rfile.rbuf = ""
    for line in lines:
        if line.endswith('\n'):
            if line.endswith('\r\n'):
                yield line[:-2]  # trim CRLF
            else:
                # trim LF
                yield line[:-1]  # trim LF
        else:
            # keep partial line in buffer
            rfile.rbuf = line
            # breaking here
I was able to transfer smaller binary files by altering EngineClient.py:388 to read:
rfile.rbuf = rfile.rbuf + line
If this for loop iterates more than once without finding a line ending, rfile.rbuf gets overwritten on the second iteration, so the earlier partial line is lost. However, this work-around still didn't help with larger files.
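For what it's worth, the multiple-iterations-without-a-newline case seems to come from str.splitlines() treating a bare '\r' as a line boundary, so binary data containing stray '\r' bytes produces several "lines" with no trailing '\n'. A quick standalone demo (plain Python, independent of ClusterShell) of how the original assignment loses data:

```python
# splitlines(True) splits on bare '\r' as well as '\n'/'\r\n',
# so binary data with stray '\r' bytes yields several elements
# that do not end in '\n'.
buf = "abc\rdef\rghi"
lines = buf.splitlines(True)
print(lines)  # ['abc\r', 'def\r', 'ghi']

# Simulate the _readlines() loop: every element falls into the
# "keep partial line in buffer" branch, so rbuf is overwritten
# each time and only the last fragment survives.
rbuf = ""
for line in lines:
    if not line.endswith('\n'):
        rbuf = line
print(repr(rbuf))  # 'ghi' -- 'abc\r' and 'def\r' are silently dropped
```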
I've dug into this issue as much as I can for now, but I wanted to make note of it in case the project has plans to allow for non-line buffering or some kind of fixed blob buffering switch.
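Until then, a blob-buffering mode could be as simple as bypassing the splitlines() path entirely and yielding raw chunks. A rough standalone sketch (the function name and buffer size are hypothetical, not ClusterShell API):

```python
import io

def read_chunks(read_func, bufsize=4096):
    """Yield raw chunks from read_func(bufsize) until it returns
    an empty value (EOF). No line splitting, so binary output
    passes through byte-for-byte."""
    while True:
        chunk = read_func(bufsize)
        if not chunk:
            return
        yield chunk

# Usage: reassembling a byte stream containing '\r' and '\r\n'
# loses nothing, unlike the line-buffered path.
payload = b"\x00\x01\r\x02\r\n\x03" * 1000
src = io.BytesIO(payload)
out = b"".join(read_chunks(src.read))
assert out == payload
```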