chapel-lang / chapel

a Productive Parallel Programming Language

https://chapel-lang.org

I/O module: readUntil #19769

Closed mppf closed 1 year ago

mppf commented 2 years ago

This issue is a spin-off from issue #19496.

This issue proposes having a readUntil for reading a string/bytes until some kind of separator. Since reading a line is a very common operation, that should get its own function (e.g. readLine as discussed in #19495). However it is useful to have a function that can read until something for other cases.

Here is a sketch of what this might look like:

// maxSize arguments indicate that the function should throw
// if it finds an input longer than that (and leaves the input there)

// keepSeparator means that a separator found in the input will be included in
// the returned string. Note that if the input reaches EOF without a separator,
// the returned string won't contain a separator, even if keepSeparator=true.

// The first set uses a separator that is a string/bytes, so it could be e.g. "end".

// For this one, t can be bytes or string
proc reader.readUntil(type t=string, separator: t, maxSize=-1, keepSeparator=true): t throws

// these two functions:
//   return `false` if EOF is reached and no data is read
//   resize the passed string/bytes (but may reuse the existing buffer)
proc reader.readUntil(ref s: string, separator: string, maxSize=-1, keepSeparator=true): bool throws
proc reader.readUntil(ref b: bytes, separator: bytes, maxSize=-1, keepSeparator=true): bool throws

// The second set reads until a regular expression
proc reader.readUntil(type t=string, separator: regex(t), maxSize=-1, keepSeparator=true): t throws

// these two functions:
//   return `false` if EOF is reached and no data is read
//   resize the passed string/bytes (but may reuse the existing buffer)
proc reader.readUntil(ref s: string, separator: regex(string), maxSize=-1, keepSeparator=true): bool throws
proc reader.readUntil(ref b: bytes, separator: regex(bytes), maxSize=-1, keepSeparator=true): bool throws

These functions have some similarity to #19610.

Should it be called readUntil? Or does that imply that it leaves the separator in the input? I don't think this function should leave the separator in the input. I'm not sure I can think of a better name. readPast doesn't sound great to me (and "past" could be misinterpreted as "read from history").

Also, should we have set-of-strings (for "read until any of these characters" operations) or a read-until-whitespace variant?

bradcray commented 2 years ago

What do you think about having a bool/enum argument that says whether to consume the delimiter or leave it in place? That would make the rationale for these names stronger. Making it not have a default would have the benefit of forcing the user to check their assumptions, though it might be slightly annoying to anyone who thought there was an "obvious" right default.

I can imagine cases where the delimiter would want to be left alone. For example in reading CLBG fasta files, I could imagine using a readUntil(..., ">", ) in which I would not want it to consume the ">" but to leave that as the start of the next sequence. There isn't really anything that marks the end of a sequence apart from the start of the next one or the EOF itself.

Which raises one other question: Am I correct that if EOF is reached before the delimiter is found, this will consume the rest of the file (and indicate EOF)?

mppf commented 2 years ago

> What do you think about having a bool/enum argument that says whether to consume the delimiter or leave it in place? That would make the rationale for these names stronger. Making it not have a default would have the benefit of forcing the user to check their assumptions, though it might be slightly annoying to anyone who thought there was an "obvious" right default.

Seems reasonable. Do we really need all 2x2 combinations? They are:

1. store separator in resulting string but leave it in the input
2. store separator in resulting string and consume it from the input
3. don't store separator in resulting string and leave it in the input
4. don't store separator in resulting string and consume it from the input

Arguably (1) is weird/confusing (because it would lead to reading the separator twice). If we could get away with not supporting (4) then we could have just one bool, where if it's stored in the result, it's consumed from the input; and if not, it's not consumed.

> I can imagine cases where the delimiter would want to be left alone. For example in reading CLBG fasta files, I could imagine using a readUntil(..., ">", ) in which I would not want it to consume the ">" but to leave that as the start of the next sequence. There isn't really anything that marks the end of a sequence apart from the start of the next one or the EOF itself.

Yeah, that makes sense. I imagine though in such cases you wouldn't want the ">" in the resulting string (so it is case 3 above).

> Which raises one other question: Am I correct that if EOF is reached before the delimiter is found, this will consume the rest of the file (and indicate EOF)?

I tried to say this earlier:

// keepSeparator means that a separator found in the input will be included in
// the returned string. Note that if the input reaches EOF without a separator,
// the returned string won't contain a separator, even if keepSeparator=true.

It's intended to behave like readLine proposals in this way. If there is no separator and you pass keepSeparator=true, you'll get a string out that doesn't have the separator, and it would return that string / true and not throw (because some data was read). The next I/O would indicate EOF.

If you have keepSeparator=false, you won't be able to notice this situation, other than that the next read indicates EOF.

bradcray commented 2 years ago

All of options 2-4 seem useful to me. Not supporting 4 seems similar to not supporting a "dropNewline"-style option for readLine(). I.e., if I'm doing readUntil("\n") I may want to advance past the newline yet not have it store up in my result (and equivalently for other separators apart from newline, I think).

mppf commented 2 years ago

From an off-issue discussion, it was approximately split 50/50 between people who thought that we really do need all 2x2 combinations to be available and people who thought that we only needed 2 combinations - always consume but chomp/strip it or not.

mppf commented 1 year ago

The most recent proposal was to have two boolean arguments to specify all of the 4 possible behaviors above. The boolean arguments could be

- consumeSeparator (if true, the separator will not be left in the channel; if false, the channel will be left pointing at the separator, or at EOF if that came first)
- includeSeparator (if true, the separator will be included in the resulting bytes/string; if false, it will not)

There is a proposal to use an enum instead. A straw-person for that is enum separator { leave, consume, return } but that specific proposal can't work because return is a keyword.

We discussed these in a meeting but did not converge.

bradcray commented 1 year ago

Some early-morning musings on this:

Should we only support 2 modes rather than 3 or 4? (I'd focus on options 2 and 4 from Michael's list to make it symmetrical with readLine(), and call the argument stripSep[arator]: bool.)
The main downside of doing so would seem to be: what if someone needed an efficient implementation of the third mode in the future (e.g., "read until you hit the # starting a comment, but then leave that comment there for the next read step to consume")? Then I worry that we'd want to add a new routine, but that felt challenging: readUntil() sounds to me like it's excluding the separator rather than including it. That made me wonder, if we had two different routines, what the other would be called.
- Since, readUntil(",") sounded to me like it doesn't read the comma it could be the exclusive version, and something like readThrough(",") or readThroughNext(",") sounded like it would include the comma (but I don't particularly like those names... :( )
- Alternatively, if people felt readUntil(",") sounded like it obviously would stop at the comma, we could use a different name for the "don't consume" case. I found it hard to come up with a good name, though. Like readUpTo(",") comes to mind, but that sounds like a synonym for readUntil(",") to me.
So then, that took me back to the idea of making it an enum, where my current best attempt is:
```
enum separator {
include,
strip,
exclude,
}
```
supporting patterns like readUntil(",", mode=separator.strip), readUntil(",", separator.include), use separator; readUntil(",", exclude);, or readUntil(",") (where I'm imagining include would be a good default, as the common case; that said, if others agreed with my reservations that readUntil(",") sounds like the comma would not be consumed, maybe separator.exclude should be the default).

bradcray commented 1 year ago

Re-reading the OP in https://github.com/chapel-lang/chapel/issues/21392 makes me realize that another way to handle the naming of the two-routine solution would be to have readLine() handle the "consume separator" case by taking an optional separator; and then to have readUntil() require a separator and support the exclusive behavior. The main downside of this is that it arguably abuses the intuitive notion of what a "line" is; it also makes the readLine() interface a little more complicated.

That also makes me think of readWord() or readChunk() as alternative "consume separator" routine names that focus on what is being consumed like readLine() rather than what is being stopped at.

mppf commented 1 year ago

I'm often advocating for different routine names for different behaviors, but it doesn't bother me in this case that readUntil can consume the separator or not.

> That also makes me think of readWord() or readChunk() as alternative "consume separator" routine names that focus on what is being consumed like readLine() rather than what is being stopped at.

Yeah, that seems like an interesting possibility. Other ideas along those lines: readDelimited or readSeparated.

What about the enum idea? I think my main reservation with it is, if we add an enum here because of some nervousness with code like readUntil(x, true, true) that could be better written readUntil(x, consumeSeparator=true, includeSeparator=true) -- why aren't we making all of our bool formals into enums? Why doesn't readLine use an enum for the stripNewline formal? Can't we also argue that readLine(x, true) is better written readLine(x, stripNewline=true)?

I guess at the end of the day, using an enum here feels to me like it's enforcing a kind of style guidance. At the same time, I still think that the two-argument version has clearer meaning.

Lastly, include is a keyword, so it can't be an enum element here.

Anyway, heading in a different direction. Suppose we had readDelimited. I've been thinking we might wish to make the formal argument name more consistent with readLine.

What about proc readDelimited(ref s: string, separator: string, maxSize=-1, readSeparator=true, stripSeparator=false) ?

So to get the 3 behaviors we think have use-cases:

A: store separator in resulting string and consume it from the input: readSeparator=true stripSeparator=false (default)
B: don't store separator in resulting string and consume it from the input stripSeparator=true readSeparator=true
C: don't store separator in resulting string and leave it in the input readSeparator=false (and the value of stripSeparator is ignored)

That leaves no way to get the 4th behavior that we don't think is useful, which should be OK.

About the name readSeparator: one can argue that this name is not ideal, because the separator is technically read either way; it's just a matter of where the channel position is left. (So, going by how the implementation will work, the name would be something like rewindToSeparatorStart.) Anyway, I think that saying "The separator was read" is close enough in meaning to "The channel's current position is beyond the separator" for this purpose, and the name readSeparator is relatively intuitive.

lydia-duncan commented 1 year ago

It worries me a little to have an argument that gets ignored, but I'd be okay with it if the documentation were very explicit that this would happen.

jeremiah-corrado commented 1 year ago

I'm also somewhat opposed to ignoring the second argument if the first is false.

The don't-consume-but-do-include behavior is definitely weird and I don't think many people will use it, but I think it would be confusing to define a method where it appears to be supported without actually supporting it. Put another way, I think it's better if users can go through the process of writing: while readUntil(s, -1, false, true) do writeln(s);, getting something they maybe didn't expect, and then changing their code and seeing a change in output.

Assuming my current implementation makes sense, supporting readSeparator=false, includeSeparator=true is as simple as appending the separator string to the returned value, so I don't see a strong reason not to support it (at least from the implementation perspective).

Using an enum:

- idea E1: enum separator {leave, advancePast, includeSeparator}
- idea E2: enum separator {leave, consume, return}
- idea E3: enum separator {exclude, strip, include}
- idea E4: enum separator {exclude, strip, keep}

(note: we can't use E2 or E3, because they both include a keyword)

Pros:

- can handle the 3/4 useful cases naturally
- setting the argument ends up being more self-documenting

Cons:

- hard to describe behavior clearly in 1 word; leave, keep, and return might all seem to mean the same thing
- enforces a kind of style guidance
- it would be inconsistent with readLine's stripNewline: bool formal argument. Or, it might lead us to think that readLine's stripNewline argument should also be an enum
- hard to see how to make advancePastByte consistent with this idea, since there are only 2 modes that make sense for that function (#19610)

Using two flags:

- idea F1: consumeSeparator: bool and includeSeparator: bool
- idea F2: consumeSeparator: bool and stripSeparator: bool

Pros:

- separates the 4 cases into 2 parts which some feel have clearer meaning

Cons:

- it's possible to write the case that we don't think is useful and that seems confusing: consumeSeparator=false, includeSeparator=true
- it might be harder to read the code if it's not using named argument passing, and it might seem verbose if it uses named argument passing
- hard to make this consistent with readLine

Neutral:

- would probably create advanceUntil instead of advancePastByte / advancePastNewline to be consistent with this idea (#19610)

Using two different routines:

(the first routine shown in these ideas consumes the delimiter, the second does not)

- idea R1: readLine and readChunk/readWord
- idea R2: readDelimited/readSeparated and readUntil
- idea R3: readDelimited and readUpTo
- idea R4: readPast and readUpTo

Pros:

- can handle the 3/4 useful cases naturally (because the 2nd routine doesn't have to accept an argument)
- routine name will always be present at call sites and communicates what is happening
- in looking at example codes using these, whether or not the separator is consumed is essential to how the I/O is being done and what subsequent operations can expect, so maybe it is not something that should be left to a flag
- the 1st routine can be a generalization of readLine and be consistent with it. In particular, readDelimited would use something like stripDelimiter: bool = false as the formal argument, which is consistent with readLine's stripNewline: bool = false
- generalizes reasonably well to the related advance functions; e.g., advanceDelimited and advanceUpTo (#19610)

Cons:

- None brought up yet for this category as a whole
- (for readPast, it could be misinterpreted as "read the past")

jeremiah-corrado commented 1 year ago

In a recent offline design discussion, we decided to go with the two-method proposal from above and made good progress on fleshing out the details of the proposal.

Working proposal:

We came up with a tentative set of names for readUntil's replacements as well as the analogous "advance" methods:

| behavior | read | advance |
|---|---|---|
| consume newline | `readLine(..., stripNewline=false)` | `advanceLine(...)` |
| consume separator | `readPast(..., separator, stripSeparator=false)` | `advancePast(..., separator)` |
| up to separator | `readUpTo(..., separator)` | `advanceUpTo(..., separator)` |

(readLine already exists)

`readUpTo`/`advanceUpTo` names:

The group consensus was that readUpTo and advanceUpTo are acceptable names for those methods; however we weren't sure that those are the best options. We'd like to do a bit more investigation and brainstorming before landing on them definitively.

I'll investigate what some other languages do (if they have this functionality) and report here to see if that sparks any ideas.

We also discussed:

- readUntil / advanceUntil
- readBefore / advanceBefore
- readTo / advanceTo

`separator` argument name and type

In each of the methods with a separator argument, we'd like it to be able to take at least a string and bytes argument and potentially a regex(string)/regex(bytes). Note: whether we ultimately go with names like advancePast — as opposed to advancePastByte — depends on what we choose for the types of this argument.

I'll do a performance investigation to see if advancePast(separator="a") performs on par with advancePastByte("a".toByte()). If there is no performance drawback, then we intend to go with the method names and argument types described above.

We also discussed whether to call the argument "delimiter" or something else, but landed on "separator" because there is a precedent in other places in the library, and it sounds general enough to encapsulate string, bytes and regex(?) (whereas "delimiter" in particular doesn't sound like it would refer to a regular expression separator).

`separator` argument length

We discussed whether we should wait to support multi-character / multi-codepoint delimiters. There were a few reasons we may want to do so:

- currently, advancePastByte (tentatively being replaced by advancePast(separator)) takes a uint(8) as its argument for performance reasons. We'll want to continue to use this in performance-driven scenarios, so we should try to hold on to the performance benefits of matching with a single byte (instead of a sequence of bytes).
- a single-byte separator is the most common expected use case (e.g., reading a CSV file with `,` as the separator), so limiting support to that case for now seems reasonable.
- users may misinterpret readPast(",\t\s") as "read past any of `,`, `\t`, `\s`" rather than reading past those characters in succession.

As such, we may want to restrict the separator to a length of 1 byte by making it a param and emitting a compile-time error if it is any longer. This restriction could be lifted as a non-breaking change in the future.
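A minimal sketch of the compile-time restriction described above, assuming the separator is passed as a `param` string. The name `advancePastSketch` is hypothetical and is not part of the IO module; the body is a placeholder, and only the length check is the point:

```chapel
// Hypothetical helper (not an IO module routine): with a `param`
// separator, the length check can be resolved at compile time.
proc advancePastSketch(param separator: string) {
  if separator.numBytes != 1 then
    compilerError("separator must be exactly one byte long");

  // Placeholder body: a real implementation would advance the reader here.
  writeln("would advance past ", separator);
}

advancePastSketch(",");       // accepted: one byte
// advancePastSketch(", ");   // would be rejected at compile time
```

Because the check lives entirely in the signature and the first statement, relaxing it later (for example, dispatching to a slower multi-byte path instead of erroring) should not break callers that already pass a single byte.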

I'll do some further investigation to see if it's possible to create performant versions of these methods that allow multi-byte separators. This would likely involve calling a simpler implementation when the separator is a single byte, and a heavier implementation otherwise.

lydia-duncan commented 1 year ago

We decided not to just use the name readLine for the readUntil functionality, due to the potential for confusion if a separator other than \n was provided.

bradcray commented 1 year ago

I like the two-routine approach. As far as names go, I prefer readUntil("\n") over readUpTo("\n") (it rolls off the tongue better and I like that it's just two words; I think until suggests exclusive behavior if one were in doubt and didn't want to read the manual). Rather than readPast("\n"), I'd probably use readThrough("\n") (because it isn't actually reading past the \n at all).

jeremiah-corrado commented 1 year ago

> I think until suggests exclusive behavior

I'd be okay with readUntil as a name, but I think readUpTo is slightly more clear w.r.t where the pointer is left off.

This could just be me, but somehow this:

var spanInnerText = htmlReader.readUpTo("</span>");

feels clearer than:

var spanInnerText = htmlReader.readUntil("</span>");

> Rather than readPast("\n"), I'd probably use readThrough("\n")

We hadn't considered readThrough, but I agree that it sounds better. advanceThrough("\n") also sounds good to me.

jeremiah-corrado commented 1 year ago

Also, here is what I've found so far looking at other languages' buffered I/O methods:

| language | consuming read | non-consuming read | consuming ("\n" specific) |
|---|---|---|---|
| C++ | `getline` | `getline` + `unget` | `getline` (w/o `delim` arg) |
| Rust | `read_until` | `read_until` + `seek` | `read_line` |
| Python | csv reader w/ `delimiter` arg | ? | `readline` |
| Go | custom `ScanWords` function | ? | `ScanLines` |
| Java | ? | ? | [`readLine`](https://docs.oracle.com/javase/10/docs/api/java/io/BufferedReader.html#readLine()) |

The non-consuming-read doesn't seem like a very common behavior.

mppf commented 1 year ago

Hmm... the fact that Rust has a consuming read called read_until is evidence that it's not so obvious that readUntil would be a non-consuming read.

jeremiah-corrado commented 1 year ago

> I'll do some further investigation to see if it's possible to create performant versions of these methods that allow multi-byte separators. This would likely involve calling a simpler implementation when the separator is a single byte, and a heavier implementation otherwise.

I ran a performance comparison on one of the revcomp shootout benchmarks using a procedure like the following:

proc advancePast(separator: string) {
  if separator.numBytes == 1 {
    advancePastByte(separator.toByte());
  } else {
    slowAdvancePastImpl(separator);
  }
}

I.e., I collected average execution times for two revcomp codes on a large problem size:

(1) the current code that calls advancePastByte directly, without the conditional:

reader.advancePastByte(">".toByte());

(2) and another that uses the above procedure:

reader.advancePast(">");

The performance difference was a couple of orders of magnitude smaller than the program's total runtime. So I'd be good to go ahead with a design that allows for multi-byte/multi-codepoint separators and conditionally uses higher-performance implementations for single-byte separators when possible.

We had discussed using param separator arguments s.t. the faster implementation could be selected at compile time; however, this had little to no effect on performance in the above test, so I'm inclined to use non-param separators.

lydia-duncan commented 1 year ago

> Rather than readPast("\n"), I'd probably use readThrough("\n") (because it isn't actually reading past the \n at all).

We did discuss the use of past in the meeting, but when we actually looked at code examples, it was pretty clear what it was doing. Here are the code examples we looked at:

while readPast("-", s, stripSeparator=false) {
    myList.append(s);
}

while readDelimited("-", s, stripSeparator=false) {
    myList.append(s);
}

while readSeparated("-", s, stripSeparator=false) {
    myList.append(s);
}

while readPastSeparator("-", s, stripSeparator=false) {
    myList.append(s);
}

jeremiah-corrado commented 1 year ago

I think readThrough is also pretty clear in the context of that example:

while r.readThrough("-", s, stripSeparator=true) {
    myList.append(s);
}

I also think we need to consider the problem Brad brought up that readPast could be interpreted as readAfter. As in: "If you look past the hill, you'll see a mountain".

I would still interpret the meaning of readPast("\n") as: read and put the pointer after the next newline. However some might interpret it as: read whatever comes after the newline. I don't think readThrough has this problem.

jeremiah-corrado commented 1 year ago

Here is a more detailed summary of the interface I think we should implement based on discussion so far:

// IO module:
proc fileReader.readThrough(separator: ?t, maxSize=-1, stripSeparator=false): t throws
  where t==string || t==bytes { ... }
proc fileReader.readThrough(ref s: string, separator: string, maxSize=-1, stripSeparator=false): bool throws { ... }
proc fileReader.readThrough(ref b: bytes, separator: bytes, maxSize=-1, stripSeparator=false): bool throws { ... }

proc fileReader.readUpTo(separator: ?t, maxSize=-1): t throws
  where t==string || t==bytes { ... }
proc fileReader.readUpTo(ref s: string, separator: string, maxSize=-1): bool throws { ... }
proc fileReader.readUpTo(ref b: bytes, separator: bytes, maxSize=-1): bool throws { ... }

proc fileReader.advanceThrough(separator: string) throws { ... }
proc fileReader.advanceThrough(separator: bytes) throws { ... }

proc fileReader.advanceUpTo(separator: string) throws { ... }
proc fileReader.advanceUpTo(separator: bytes) throws { ... }

// Formatted IO module:
proc fileReader.readThrough(separator: regex(?t), maxSize=-1, stripSeparator=false): t throws
  where t==string || t==bytes { ... }
proc fileReader.readThrough(ref s: string, separator: regex(string), maxSize=-1, stripSeparator=false): bool throws { ... }
proc fileReader.readThrough(ref s: bytes, separator: regex(bytes), maxSize=-1, stripSeparator=false): bool throws { ... }

The four "advance" methods will use qio_channel_advance_past_byte under the hood when the separator is a single byte. Otherwise, they'll leverage the same helper function as readThrough and readUpTo to find the location of the separator in the channel, and then advance to that point.

I think implementing only the regex version of readThrough for now is a good start. If users need something like readUpTo(regex(?)) or advanceUpTo(regex(?)), I think those are similar enough to the existing channel.search(regex(?)):regexMatch that they don't need to be implemented right away. These could be left as a post-2.0 task or wait for a user request. (Alternatively, these methods would all rely on the same underlying _findRegexMatch() function, so it wouldn't be too much more work to implement them all.)

This is just a stake in the ground to see how people feel; I am of course still open to modifying any of the names or interface details.

jeremiah-corrado commented 1 year ago

In an ad-hoc subteam discussion, we've landed on the following design for the consuming/non-consuming read and advance methods on the fileReader type:

New Interface

The proposal from the previous message has been modified slightly. The "UpTo" methods have been renamed to use "To" instead, the regex readThrough overloads have been moved to the Regex module as tertiary methods instead of living in the FormattedIO module, and the ref string/bytes formal arguments now come after the separator:

// IO module:
proc fileReader.readThrough(separator: ?t, maxSize=-1, stripSeparator=false): t throws
  where t==string || t==bytes { ... }
proc fileReader.readThrough(separator: string, ref s: string, maxSize=-1, stripSeparator=false): bool throws { ... }
proc fileReader.readThrough(separator: bytes, ref b: bytes, maxSize=-1, stripSeparator=false): bool throws { ... }

proc fileReader.readTo(separator: ?t, maxSize=-1): t throws
  where t==string || t==bytes { ... }
proc fileReader.readTo(separator: string, ref s: string, maxSize=-1): bool throws { ... }
proc fileReader.readTo(separator: bytes, ref b: bytes, maxSize=-1): bool throws { ... }

proc fileReader.advanceThrough(separator: string) throws { ... }
proc fileReader.advanceThrough(separator: bytes) throws { ... }

proc fileReader.advanceTo(separator: string) throws { ... }
proc fileReader.advanceTo(separator: bytes) throws { ... }

// Regex module:
proc fileReader.readThrough(separator: regex(?t), maxSize=-1, stripSeparator=false): t throws
  where t==string || t==bytes { ... }
proc fileReader.readThrough(separator: regex(string), ref s: string, maxSize=-1, stripSeparator=false): bool throws { ... }
proc fileReader.readThrough(separator: regex(bytes), ref s: bytes, maxSize=-1, stripSeparator=false): bool throws { ... }

Design details:

- readThrough is a generalization of readLine that accepts a multi-byte string or bytes separator or a regex(?) separator.
- readTo is similar to readThrough except that it does not consume the separator in the channel. (As such, it also doesn't have a stripSeparator argument.)
- advanceThrough replaces the existing advancePastByte method (see #19610). For single-byte string or bytes separators, it uses the same fast implementation as advancePastByte. It also supports a heavier-weight implementation for multi-byte separators.
- advanceTo is similar to advanceThrough except it does not consume the separator in the channel. It also uses a faster implementation if a single-byte separator is supplied.
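To make the three supported behaviors concrete, here is a small sketch that runs each of them over the same input. It assumes a Chapel release in which the methods exist with the signatures listed above; the scratch file name is hypothetical, and the file is written first only so the sketch is self-contained:

```chapel
use IO;

// Hypothetical scratch file, created here so the sketch stands alone.
const path = "separatorDemo.txt";
{
  var w = openWriter(path);
  w.write("a-b-c");
  w.close();
}

// Consume the separator and keep it in the result (the default).
{
  var r = openReader(path);
  var s: string;
  while r.readThrough("-", s) do
    writeln("kept: '", s, "'");
  r.close();
}

// Consume the separator but strip it from the result.
{
  var r = openReader(path);
  var s: string;
  while r.readThrough("-", s, stripSeparator=true) do
    writeln("stripped: '", s, "'");
  r.close();
}

// Stop before the separator, leaving it in the channel.
{
  var r = openReader(path);
  const head = r.readTo("-");      // text before the first "-"
  const rest = r.readAll(string);  // whatever is still unread
  writeln("head: '", head, "', rest: '", rest, "'");
  r.close();
}
```

If the methods behave as described in this thread, the final chunk ("c") is returned without a separator in the first loop, since the input hits EOF before another "-" appears, and `rest` in the last block begins with the "-" that readTo left in place.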

Code Examples

Here are some examples of how each new method could be used:

readThrough: Read a comma-separated list of integers into a list(int):

use IO, List;

var l = new list(int),
     s: string,
     r = openReader("commaSeparatedList.txt");

while r.readThrough(",", s, stripSeparator=true) {
  l.append(s:int);
}

readTo and advanceThrough: Read the contents of a <details> tag from an HTML file:

use IO;

var r = openReader("website.html");

r.advanceThrough("<details>");
var detailsInnerText = r.readTo("</details>");

advanceTo: Read a type that is delimited by "|", skipping everything before it:

use IO;

record t {
  var x: int;

  proc readThis(fr) throws {
    fr.matchLiteral("|");
    this.x = fr.read(int);
    fr.matchLiteral("|");
  }
}

var r = openReader("textIDontWantAndThenT.txt");

r.advanceTo("|");
var myT = r.read(t);

readThrough(regex): Read a list of integers separated by commas or newlines into a list(int):

use IO, Regex, List;

var l = new list(int),
     s: string,
     r = openReader("commaAndNewlineSeparatedList.txt");

const commaOrNewline = compile("[,\\n]");

while r.readThrough(commaOrNewline, s, stripSeparator=true) {
  l.append(s:int);
}

More examples can be found in the tests in this PR: https://github.com/chapel-lang/chapel/pull/21703/files
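The FASTA-style use case raised earlier in the thread (records that start with ">" and have no explicit terminator) could be handled along these lines. This is a sketch under the same assumptions as the examples above: the file name and contents are hypothetical, and matchLiteral is assumed to return false when it reaches EOF or cannot match:

```chapel
use IO;

// Hypothetical FASTA-like scratch file; ">" opens each record.
const path = "recordsDemo.txt";
{
  var w = openWriter(path);
  w.write(">seq1\nACGT\n>seq2\nTTGA\n");
  w.close();
}

var r = openReader(path);
var rec: string;

r.advanceThrough(">");           // skip the marker that opens the first record
while r.readTo(">", rec) {       // read up to, but not past, the next marker
  writeln("record: ", rec.strip());
  if !r.matchLiteral(">") then   // consume the marker before the next read
    break;
}
r.close();
```

The explicit matchLiteral step matters: since readTo leaves the separator in the channel, calling it again without consuming the ">" would not make progress.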
