Raku / old-issue-tracker

Tickets from RT
https://github.com/Raku/old-issue-tracker/issues
2 stars 1 forks source link

slurp is mangling newlines, it should not (slurp ‘foo’) #6546

Open p6rt opened 7 years ago

p6rt commented 7 years ago

Migrated from rt.perl.org#132154 (status was 'open')

Searchable as RT132154$

p6rt commented 7 years ago

From @AlexDaniel

This was noticed in this PR​: https://github.com/rakudo/rakudo/pull/1171

Basically, slurp is changing \r\n to \n. This may seem like a reasonable thing, except that then it has to also change \r to \n.

However, replacing \r\n seems like a meaningless thing to do anyway. See https://docs.perl6.org/language/traps#Splitting_the_Input_Data_Into_Lines for some context.

Long story short, .lines is already splitting the text using all possible variants of a newline. There's no need to mangle the data in slurp.

For example, let's say you want to split the data strictly by \n. One way to do it is to call .lines on a Handle, then it works fine. But if it happens that you must slurp the whole thing, this is what you have to do​:

‘foo’.IO.slurp(​:bin).decode.split(“\n”)

Notice how we use :bin to prevent it from doing the decoding, only to call .decode later anyway. All that just because .slurp itself is doing unnecessary data mangling.

How to resolve this ticket​: Option 1​: stop doing what we do now. For example, don't do \r\n → \n substitution if it's under “use v6.d” Option 2​: include \r and make sure it acts exactly like .lines.join(“\n”)

p6rt commented 7 years ago

From @zoffixznet

Some discussion from IRC​: https://irclog.perlgeek.de/perl6-dev/2017-09-25#i_15212848

p6rt commented 7 years ago

From @AlexDaniel

I'm shocked to see \r\n→\n translation being defended without any reasoning. Why are we doing it at all?

“\r\n => \n transformation is done consistently” consistently what? And why? Even .IO.lines splitting on \r\n is wrong because it cuts \r that may be part of the data (which is fine in the middle of a line, but not on the end).

On 2017-09-25 05​:14​:26, cpan@​zoffix.com wrote​:

Some discussion from IRC​: https://irclog.perlgeek.de/perl6-dev/2017- 09-25#i_15212848

p6rt commented 7 years ago

From @zoffixznet

On Thu, 28 Sep 2017 13​:56​:49 -0700, alex.jakimenko@​gmail.com wrote​:

I'm shocked to see \r\n→\n translation being defended without any reasoning.

Well, it's just a pretty common thing to do. Perl did it for ages. And IIRC we already had tickets on this because Proc​::Async didn't do it.

Why are we doing it at all?

So you could write code that works the same on Window and Linux.

“\r\n => \n transformation is done consistently” consistently what?

Proc​::Async, IO​::Handle, Str.encode, and custom decoders do this consistently. The code the OP mentions just does it in IO​::Path because it's a fastpath and the `​:translate-nl` option is (probably?) not fully exposed to users yet.

Basically, slurp is changing \r\n to \n. This may seem like a reasonable thing, except that then it has to also change \r to \n.

AFAIK we don't support any platform that uses `\r` as line separator character. I don't know why Str.lines cuts on lone `\r`

Long story short, .lines is already splitting the text using all possible variants of a newline. There's no need to mangle the data in slurp.

This isn't about making input `.lines` friendly tho.

p6rt commented 7 years ago

The RT System itself - Status changed from 'new' to 'open'