wrong number of byte requested by custom input port

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. create a custom input port
2. read 5 bytes with GET-BYTEVECTOR-N or GET-STRING-N

What is the expected output? What do you see instead?
The custom read function should be asked for 5 bytes (its
COUNT argument), but it is asked for 4096 on binary ports
and 1024 on textual ports.
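
A minimal sketch of the reproduction steps, assuming an R6RS implementation. The in-memory `source` bytevector and the `requested` log are hypothetical stand-ins added here for illustration; they are not part of the original report. The log records every COUNT value the implementation passes to the custom read function.

```scheme
(import (rnrs))

;; Hypothetical in-memory "device": 16 bytes, all set to 65 (#\A).
(define source (make-bytevector 16 65))
(define pos 0)

;; COUNT values seen by read!, most recent first.
(define requested '())

;; Custom read function: record COUNT, then copy at most COUNT bytes.
(define (read! bv start count)
  (set! requested (cons count requested))
  (let ((n (min count (- (bytevector-length source) pos))))
    (bytevector-copy! source pos bv start n)
    (set! pos (+ pos n))
    n))

;; Step 1: create a custom input port (no position or close procedures).
(define port
  (make-custom-binary-input-port "demo" read! #f #f #f))

;; Step 2: read 5 bytes, then show what the read function was asked for.
(get-bytevector-n port 5)
(display (reverse requested))
(newline)
```

On an implementation that buffers custom ports, the displayed list would show the buffer size rather than 5.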

What version of the product are you using? On what operating system?
ypsilon-322

Please provide any additional information below.
If the underlying device is in blocking mode, requesting more
bytes will make it block even when the right number of bytes
are available.

Original issue reported on code.google.com by mrc....@gmail.com on 21 Dec 2008 at 8:59

GoogleCodeExporter commented 9 years ago

Thank you for your message.

Regarding the custom read function, my understanding is that the <count> parameter
indicates the maximum amount of data the system can receive in the <bytevector>, and
the custom read function tries to fill that buffer as much as possible so that the
custom port works efficiently. If the underlying device is in blocking mode and the
custom read function has no way to read more than one item without blocking, it may
read one item from the device, put it into the buffer, then return 1 to the system
(or return 0 if the device hit EOF). If the underlying device is in blocking mode but
supports a POSIX-like read function (a socket, for example), the custom read function
may try to read <count> items from the device, then return the number it actually read.
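
The one-item-at-a-time case described above could be sketched as follows. This assumes an R6RS implementation; `device-read-byte` and the `data` bytevector are hypothetical stand-ins for a blocking device that can only deliver one byte per call.

```scheme
(import (rnrs))

;; Hypothetical device contents.
(define data (string->utf8 "hello"))
(define pos 0)

;; Hypothetical blocking read of exactly one byte.
;; Returns the eof object once the device is exhausted.
(define (device-read-byte)
  (if (= pos (bytevector-length data))
      (eof-object)
      (let ((b (bytevector-u8-ref data pos)))
        (set! pos (+ pos 1))
        b)))

;; Custom read function: ignore how large COUNT is, store one byte
;; at START and return 1, or return 0 at EOF.
(define (read! bv start count)
  (let ((b (device-read-byte)))
    (if (eof-object? b)
        0
        (begin
          (bytevector-u8-set! bv start b)
          1))))

(define port
  (make-custom-binary-input-port "one-byte" read! #f #f #f))

(display (utf8->string (get-bytevector-n port 5)))
(newline)
```

Because the read function never returns more than it can fetch without waiting for further input, it does not matter whether the implementation asks for 5 bytes or 4096.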

Reference: [r6rs-lib 8.2.7 Input ports]
http://www.r6rs.org/final/html/r6rs-lib/r6rs-lib-Z-H-9.html#node_sec_8.2.7

Your comments are welcome. :)
-- fujita

Original comment by y.fujita...@gmail.com on 23 Dec 2008 at 3:58

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

Your interpretation of R6RS is correct, and in many cases the current Ypsilon
implementation can be made to work. I mean that usually it is possible to configure
the underlying device in some sort of non-blocking mode (true or simulated).

But IMHO they have called them "custom" ports to let the customer (that is me) decide
how to handle the device, and this should include buffering. For example, if I want to
use the scatter/gather fd interface (writev and readv, available with the GNU C
Library and also described in the Single Unix Specification), what is the advantage
for the Scheme implementation in putting its buffers on top of mine?

Other examples are the buffering done by foreign libraries like libevent and libev.

Everybody should mind his own buffers.

Original comment by mrc....@gmail.com on 23 Dec 2008 at 10:56

GoogleCodeExporter commented 9 years ago

Thank you for your message.
I got your point and it is very interesting!

I think the problem is that R6RS custom ports have no way to specify a buffer mode for
best performance. (*1)
Do you think an extension that allows making unbuffered ("none" buffer mode) custom
input/output ports would solve this problem? (*2)

note:
*1 I think unbuffered mode should not be the default because it may hurt performance
significantly if a Scheme script repeatedly reads small chunks (on-the-fly file
parsing/conversion, for example).
*2 With unbuffered mode, the custom read! procedure will receive the buffer position and
read count from the Scheme script directly.

Your comments are very welcome! :)

-- fujita

Original comment by y.fujita...@gmail.com on 24 Dec 2008 at 5:54

GoogleCodeExporter commented 9 years ago

It all comes down to how you want to organise your API, and
what is the cost of adding unbuffered custom ports given
the current code base. There could be:

make-custom-binary-input-port
make-custom-binary-output-port
make-custom-binary-input/output-port
...
make-custom-binary-input-port/unbuffered
make-custom-binary-output-port/unbuffered
make-custom-binary-input/output-port/unbuffered
...

or:

make-custom-binary-input-port
make-custom-binary-output-port
make-custom-binary-input/output-port
...
make-custom-binary-input-port/buffered
make-custom-binary-output-port/buffered
make-custom-binary-input/output-port/buffered

with "the other" procedures in an importable library.

Unless I have missed it in the report, there is no way to
put back/unread chars in a reading port, so I do not
understand why the custom read procedure of a "none"
buffered port should get the buffer position as argument.
The device cursor position is selected with the position
procedures (if seeking is possible for the device), and that
is all.

Personally, I care about interfaces to foreign libraries (I
am not Schemey), and many of the better libraries I would
like to interface do their own buffering. With this bias, I
say that there is a higher probability for the device under a
custom port to handle its own buffers.

More examples: reading and writing zipped files through Zlib
and Bzlib2 does not require another buffering layer.

Even with atypical "devices", implementation buffering gets
in the way. I am thinking of interfacing Nettle and Gcrypt.
In this scenario an input/output custom port is a simple way
to wrap a hash sum context (tiger, haval, sha, md); if
reading from such a port returns the hash sum computed so
far, the output is a bytevector of well-known size:
implementation buffering does no harm, but is useless.

In the same scenario there is an atypical application for
which the implementation's buffering may seem useful but is
not: when wrapping a symmetric block cipher filter, output
comes in blocks of known size and it seems useful to store
them in the implementation's buffer. But what if the filter
produces more output bytes than requested by the
implementation's buffering read? I need my own buffer anyway
to temporarily store the data. This happens with all the
transformation filters (base64, compression, decompression,
...).
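
A minimal sketch of that situation, assuming an R6RS implementation. The filter is simulated by a hypothetical `next-block` procedure that always yields whole 8-byte blocks; `block-size`, `blocks-left`, `pending` and `pending-pos` are illustrative names, not part of any real cipher library. The point is that the read function has to keep the leftover bytes itself whenever COUNT is smaller than a block.

```scheme
(import (rnrs))

(define block-size 8)     ; hypothetical filter block size
(define blocks-left 3)    ; hypothetical amount of filter output

;; The port's own buffer: bytes produced by the filter but not yet
;; delivered to the caller.
(define pending (make-bytevector 0))
(define pending-pos 0)

;; Hypothetical filter step: one full block, or #f when finished.
(define (next-block)
  (if (zero? blocks-left)
      #f
      (begin
        (set! blocks-left (- blocks-left 1))
        (make-bytevector block-size (+ 48 blocks-left)))))

;; Custom read function: refill the private buffer one block at a
;; time and hand out at most COUNT bytes from it.
(define (read! bv start count)
  (when (= pending-pos (bytevector-length pending))
    (let ((block (next-block)))
      (set! pending (or block (make-bytevector 0)))
      (set! pending-pos 0)))
  (let ((n (min count (- (bytevector-length pending) pending-pos))))
    (bytevector-copy! pending pending-pos bv start n)
    (set! pending-pos (+ pending-pos n))
    n))

(define port
  (make-custom-binary-input-port "filter" read! #f #f #f))

;; A 3-byte read leaves 5 bytes of the first block in `pending`.
(display (utf8->string (get-bytevector-n port 3)))
(newline)
```

The private buffer is needed whether or not the implementation adds its own buffering on top, which is the argument being made here.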

A maybe-not-right usage of custom ports: when filtering data
through zlib and bzlib decompression contexts, buffering by
the Scheme implementation gets in the way if the section of
compressed bytes ends, but the stream goes on with
uncompressed bytes: if those uncompressed bytes end up in
the implementation's writing buffer, they become
unreachable, making it difficult to stack filters one on top
of the other.

In the end, given that unbuffered ports are more general (I
can implement my own buffering if I need one): how many
predictable applications are there in which buffering is
needed?

Original comment by mrc....@gmail.com on 24 Dec 2008 at 11:43

chazu / ypsilon

wrong number of byte requested by custom input port #64