brotchie / Parcoa

Objective-C Parser Combinators

Lots of NSString copying #1

Open lilyball opened 11 years ago

lilyball commented 11 years ago

Just from reading your README and the way your Result objects work, it sounds like you're going to be copying the input string a lot. As in, every single Result object is going to end up holding a new copy of the remainder of the input. This means that every single character parsed could result in a complete copy of the input (minus that one character). And with combinators, you could get several copies of the same remainder.

The reason I'm saying this is that NSString does not share its backing character buffer between multiple string instances. This means that saying [someLargeString substringFromIndex:1] will make a copy of the entire string (minus one character).
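To put a rough number on that, here is a minimal sketch that consumes an input one character at a time by re-slicing it and totals the lengths of the remainders created along the way. The helper name `CharactersCopiedByReslicing` is hypothetical and not part of Parcoa, and the total only reflects real copying under the assumption stated above, that each `substringFromIndex:` call allocates a fresh buffer.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not Parcoa API): consumes one character per step by
// re-slicing the input, and sums the lengths of the remainder strings created.
// Assumes each substringFromIndex: call copies the remainder.
static NSUInteger CharactersCopiedByReslicing(NSString *input) {
    NSUInteger copied = 0;
    NSString *rest = input;
    while (rest.length > 0) {
        rest = [rest substringFromIndex:1]; // new string holding the remainder
        copied += rest.length;
    }
    return copied;
}
```

For an input of n characters the remainders have lengths n-1, n-2, ..., 0, so the total is n(n-1)/2: quadratic in the input size, before counting any extra copies made by nested combinators.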

If you do some memory profiling of parses of very large inputs, I bet this will show up very clearly. Of course, I haven't actually tried this out myself, but just from reading your description of how the parser works I can't imagine it doing something else.

Given this, it might be prudent to give all parsers the entire input string and an NSRange corresponding to the unconsumed part, and have them return an NSRange of the remainder (or just an NSInteger count of consumed characters). The downside, of course, is that this allows for writing parsers that look at previously-consumed characters, which you probably don't want. But it will prevent all the unnecessary string copies.
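The suggestion above could look roughly like the following sketch. The names `ParseDigits` and `RangeByConsuming` are made up for illustration and are not Parcoa's API; returning `NSNotFound` on failure is likewise just one possible convention.

```objc
#import <Foundation/Foundation.h>

// Hypothetical parser: receives the whole input plus the unconsumed range and
// returns the number of characters consumed, or NSNotFound if nothing matched.
// Matches a run of one or more ASCII digits at the start of `remaining`.
static NSUInteger ParseDigits(NSString *input, NSRange remaining) {
    NSUInteger consumed = 0;
    while (consumed < remaining.length) {
        unichar c = [input characterAtIndex:remaining.location + consumed];
        if (c < '0' || c > '9') {
            break;
        }
        consumed++;
    }
    return consumed > 0 ? consumed : NSNotFound;
}

// Hypothetical helper: advances the range past `consumed` characters.
// Only two integers change; no string data is copied.
static NSRange RangeByConsuming(NSRange remaining, NSUInteger consumed) {
    return NSMakeRange(remaining.location + consumed, remaining.length - consumed);
}
```

Starting from `NSMakeRange(0, input.length)`, a caller would run a parser, check for `NSNotFound`, and pass the advanced range to the next parser. Because the parser only ever reads from `remaining.location` onward, it never looks at already-consumed characters, though nothing in the signature enforces that.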

brotchie commented 11 years ago

Thanks for the feedback! I figured NSString object immutability would mean that substring operations, etc., would back onto the original character buffer; it turns out that is not the case. I like your suggestion of parsers always taking the full input string as well as an NSRange, then returning the number of characters consumed. Then it's only pointers and integer values being passed around.

It also looks like I can make this modification without changing the external interface.