bksides / bfc

A brainfuck compiler written in C for linux machines running on the x86_64 architecture.
0 stars 0 forks source link

optimize IO #6

Open bksides opened 7 years ago

bksides commented 7 years ago

The following is a potential scheme for optimizing the IO system:

As it stands, every character that is to be written to stdout is written as soon as the '.' instruction appears in the user program. Likewise, every character to be read from stdin is read as soon as the ',' instruction appears. This is inefficient, as for I/O bound programs a syscall is made for every character that gets read or written. This optimization involves buffering input and output, and providing a protocol for when these buffers should be flushed to maintain consistent behavior. Additionally, more tests should be written to ensure correctness of this optimization, but such tests are outside the scope of this particular issue.

THE BUFFERS

Three new readable, writable segments shall be introduced to each compiled brainfuck program: an input buffer, a dest buffer, and an output buffer. The lengths of these buffers shall be fixed at compile time but their particular length is not important to this discussion (It will be noted here, however, that the dest buffer must be at most 8 times the size in bytes of the input buffer. Reasoning for this policy is provided in the section titled FLUSHING THE DEST BUFFER). The %rsp register shall initially point to the low end of the dest buffer; the %rbp register shall initially point to the low end of the output buffer.

THE OUTPUT INSTRUCTION

The behavior of the '.' instruction shall be modified such that it no longer immediately calls the write system call. Instead, the '.' instruction shall move the contents of the current cell to the location pointed to by %rbp, and increment %rbp. This has the effect of queuing output and delaying its write to stdout until that either write is necessary for proper functioning, or until the output buffer is full.

MANAGING THE OUTPUT BUFFER

The output buffer shall be flushed in all of the following circumstances:

This list may not be comprehensive, hence the need for further testing of this optimization. It is acceptable for the output buffer to be flushed in more than just these scenarios if the overhead of testing for these precise scenarios is too heavy.

FLUSHING THE OUTPUT BUFFER

When the output buffer is flushed, a write() system call is made to stdout where the beginning of the output buffer is supplied as the source buffer, and (the value of %rbp ) - (the address of the beginning of the output buffer) is supplied as the buffer length. This will write all queued output to stdout.

Next, the value of %rbp is to be reset to the low end of the output buffer to represent the emptying of the queue.

THE INPUT INSTRUCTION

The behavior of the ',' instruction shall be modified such that it no longer immediately calls the read system call. Instead, the ',' instruction shall move the address of the current cell to the location pointed to by %rsp, and increment %rsp by 4. This has the effect of queuing input and delaying its read from stdin until either that read is necessary for proper functioning, or the dest buffer is full.

MANAGING THE DEST BUFFER

The dest buffer shall be flushed in all of the following circumstances:

This list may not be comprehensive, hence the need for further testing of this optimization. It is acceptable for the dest buffer to be flushed in more than just these scenarios if the overhead of testing for these precise scenarios is too heavy.

FLUSHING THE DEST BUFFER

When the dest buffer is flushed, a read() system call is made from stdin. The beginning of the input buffer is supplied as the destination buffer. The length of the read is calculated as follows:

The result of the calculations above will be the length passed to the call to read().

The reason, then, that the dest buffer can be no more than 8 times the size of the input buffer, is that a number of bytes equal to the size of the dest buffer divided by 8 will be read into the input buffer (that is, in the case of a full buffer). If the dest buffer is too big, the input buffer will overflow.

Once read() is called and the input buffer populated, the input buffer and dest buffer shall be iterated over from the low end, and each byte in the input buffer shall be written to the corresponding address in the dest buffer. Lastly, %rsp shall be reset to the low end of the dest buffer.

EDIT: Because it is sensible to implement the flushing behavior as their own functions, it is perhaps best not to use %rsp in this algorithm. Consider substituting with another register.