chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

adding channel.advancePastByte ? #8105

Closed: mppf closed this issue 6 years ago

mppf commented 6 years ago

I've been working with @benharsh on I/O speed issues in the revcomp benchmark from the Computer Language Benchmarks Game. I'm proposing to add a method to IO.chpl called channel.advancePastByte that reads until a particular byte is found and leaves the channel cursor just after that byte. If the byte is not found, it raises an EOF error.

This function enables succinct expression of the fastest known I/O pattern for revcomp. It could be used in other contexts as well; for example, we could use it in the implementation of readln to skip characters until the newline.

See also PR #8103.

/* Reads until ``byte`` is found and then leaves the channel offset
    just after it. If that byte is never found, raises an UnexpectedEOFError. */
proc channel.advancePastByte(byte:c_int) throws
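
For comparison, the proposed semantics can be sketched in Rust on top of the standard `BufRead` trait. This is only an illustration under stated assumptions, not the Chapel implementation: the helper name `advance_past_byte` is hypothetical, and the use of `ErrorKind::UnexpectedEof` is an assumed stand-in for the UnexpectedEOFError mentioned above.

```rust
use std::io::{self, BufRead, ErrorKind};

/// Hypothetical helper (not a standard library function): consume input up to
/// and including the first occurrence of `byte`. If the input ends before the
/// byte is seen, return an `UnexpectedEof` error.
fn advance_past_byte<R: BufRead>(reader: &mut R, byte: u8) -> io::Result<()> {
    loop {
        // Inspect the currently buffered bytes without copying them out.
        let (found, used) = {
            let buf = reader.fill_buf()?;
            if buf.is_empty() {
                // An empty buffer from fill_buf means end of input.
                return Err(io::Error::new(
                    ErrorKind::UnexpectedEof,
                    "byte not found before end of input",
                ));
            }
            match buf.iter().position(|&b| b == byte) {
                Some(i) => (true, i + 1),  // consume through the match
                None => (false, buf.len()), // consume everything, keep looking
            }
        };
        reader.consume(used);
        if found {
            return Ok(());
        }
    }
}
```

Because the search runs directly over the reader's internal buffer, skipped bytes are never copied into a caller-side buffer, which is the property that matters for an I/O-bound benchmark such as revcomp.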

ben-albrecht commented 6 years ago

Are there examples of this functionality in IO libraries of other languages?

mppf commented 6 years ago

@benharsh - to what extent was the buffered reader you were emulating part of the Rust standard library?

C++ has such a thing: http://www.cplusplus.com/reference/istream/istream/ignore/

benharsh commented 6 years ago

Rust has its BufReader as part of its standard IO library:

https://doc.rust-lang.org/std/io/struct.BufReader.html

mppf commented 6 years ago

And https://doc.rust-lang.org/std/io/trait.BufRead.html#method.read_until in particular is the corresponding function.
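
As a rough illustration of how `read_until` relates to the proposal, the sketch below uses it to skip past a delimiter. The wrapper name `skip_with_read_until` is hypothetical. Two differences from the proposed advancePastByte are worth noting: `read_until` copies the skipped bytes into a caller-supplied `Vec<u8>`, and reaching end of input without the delimiter is not an error, so the caller has to check the last byte.

```rust
use std::io::{self, BufRead};

/// Hypothetical wrapper: skip past `delim` using the standard `read_until`.
/// Returns Ok(true) if the delimiter was found, Ok(false) if the input ended
/// first (read_until itself does not report that case as an error).
fn skip_with_read_until<R: BufRead>(reader: &mut R, delim: u8) -> io::Result<bool> {
    let mut skipped = Vec::new();
    // Appends everything up to and including `delim` (or up to EOF).
    reader.read_until(delim, &mut skipped)?;
    Ok(skipped.last() == Some(&delim))
}
```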

mppf commented 6 years ago

@benharsh @ben-albrecht - Question: it seems odd to me that the byte argument should have type c_int. In fact, the implementation just truncates it to a byte. That it's a c_int at all is a C-ism I got from memchr. Do you think it should:

  1. Be an int that we always extract the bottom byte from
  2. Be an int that we safeCast
  3. Be a uint(8)

I think I prefer 3 but I'm curious if these other options appeal to somebody else.
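
The trade-off between the three options can be illustrated with rough Rust analogues. All function names here are hypothetical, and the mapping is only approximate: `as u8` stands in for extracting the bottom byte, and `u8::try_from` stands in for a checked conversion in the spirit of safeCast.

```rust
use std::convert::TryFrom;

/// Option 1 analogue: accept a wide integer and keep only its low byte.
/// Out-of-range values are silently truncated.
fn low_byte(value: i64) -> u8 {
    value as u8
}

/// Option 2 analogue: accept a wide integer and convert with a range check.
/// Out-of-range values are rejected (here as None).
fn checked_byte(value: i64) -> Option<u8> {
    u8::try_from(value).ok()
}

/// Option 3 analogue: take a byte-sized type directly, so an out-of-range
/// argument cannot be passed without an explicit conversion at the call site.
fn takes_byte(byte: u8) -> u8 {
    byte
}
```

With option 1, a value such as 266 is quietly treated as byte 10; option 2 surfaces the mistake at run time; option 3 moves the decision to the caller, where the type checker can see it.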

benharsh commented 6 years ago

I vote for uint(8).