dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.13k stars 4.7k forks source link

String.Join Different results on different operating systems Windows/Linux #71990 #98116

Closed EvgenyPrikhodko closed 8 months ago

EvgenyPrikhodko commented 8 months ago

Please register a wish so that r can be ignored in Linux

the problem is described here: https://github.com/dotnet/roslyn/issues/71990

Linux ` StringBuilder str = new StringBuilder();

        str.AppendLine("test1");
        str.AppendLine("test2");
        str.AppendLine("test3");

        var qryColumnsString = string.Join(", ", str.ToString().Remove(str.Length - 2).Split("\n"));

` Windows

` StringBuilder str = new StringBuilder();

        str.AppendLine("test1");
        str.AppendLine("test2");
        str.AppendLine("test3");

        var qryColumnsString = string.Join(", ", str.ToString().Remove(str.Length - 2).Split("\r\n"));

`

Clockwork-Muse commented 8 months ago

String.Join() isn't the culprit, it's StringBuilder.AppendLine() that has the platform-specific behavior. All your downstream problems stem from that.

The "traditional" way to solve this is to use one of the existing components to iterate over lines for you, abstracting the differences automatically, possibly just StringReader. Another way to do it with your existing code is to modify your split:

StringBuilder str = new StringBuilder();

str.AppendLine("test1");
str.AppendLine("test2");
str.AppendLine("test3");

var qryColumnsString = string.Join(", ", str.ToString().Split('\r', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries));
Console.WriteLine(qryColumnsString);

.... however, your code is implying that you're doing dynamic SQL in some form, and I'm concerned.... In particular, there shouldn't be a reason to have multiple lines.

EvgenyPrikhodko commented 8 months ago

@Clockwork-Muse Thanks for the feedback, there are a lot of different projects that are switching to cross-platform, maybe there is a way to solve such problems easier?

EvgenyPrikhodko commented 8 months ago

@Clockwork-Muse

Linux image

EvgenyPrikhodko commented 8 months ago

but we want it like this image

vcsjones commented 8 months ago

Why are you using StringBuilder to append lines only to immediately remove them?

Why not just do something like string.Join(", ", new string[] {"test1", "test2", "test3"})?

If you want to use a specific line ending, then use a specific line ending instead of AppendLine.

EvgenyPrikhodko commented 8 months ago

@vcsjones Thanks for the answer, we are now abandoning Stringbuilder, but we want to understand why Stringbuilder does not understand which OS?

vcsjones commented 8 months ago

, but we want to understand why Stringbuilder does not understand which OS?

StringBuilder does understand your operating system, which seems to be what you do not want.

AppendLine is this:

https://github.com/dotnet/runtime/blob/5501afdf51a51781c0a161b05d6e1417ce4607e3/src/libraries/System.Private.CoreLib/src/System/Text/StringBuilder.cs#L863-L867

Environment.NewLine is going to be \n on Linux / macOS, and \r\n on Windows.

When you were using string.Split, it was not accounting for the fact that Environment.NewLine is different between operating systems. You were just splitting by \n, which left the \r in for Windows.

EvgenyPrikhodko commented 8 months ago

@vcsjones Thanks for the answer, that's why I ask that r be ignored in Linux*

vcsjones commented 8 months ago

@vcsjones Thanks for the answer, that's why I ask that r be ignored in Linux*

Non-Windows operating systems by convention does not use \r\n for line endings. Just \n. Environment.NewLine follows that convention for Linux and macOS.

EvgenyPrikhodko commented 8 months ago

@vcsjones Thanks for the answer, then why doesn't the compiler give a warning?

MichalPetryka commented 8 months ago

String.Join() isn't the culprit, it's StringBuilder.AppendLine() that has the platform-specific behavior. All your downstream problems stem from that.

The "traditional" way to solve this is to use one of the existing components to iterate over lines for you, abstracting the differences automatically, possibly just StringReader. Another way to do it with your existing code is to modify your split:

StringBuilder str = new StringBuilder();

str.AppendLine("test1");
str.AppendLine("test2");
str.AppendLine("test3");

var qryColumnsString = string.Join(", ", str.ToString().Split('\r', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries));
Console.WriteLine(qryColumnsString);

.... however, your code is implying that you're doing dynamic SQL in some form, and I'm concerned.... In particular, there shouldn't be a reason to have multiple lines.

Shouldn't it be '\n' there?

CyrusNajmabadi commented 8 months ago

Thanks for the answer, then why doesn't the compiler give a warning?

What warning woudl be given here. There is nothing wrong with the code as written. Users may absolutely want to split the string builder on something like \r\n even if on linux.

EvgenyPrikhodko commented 8 months ago

@CyrusNajmabadi For example, migrating a large project, since r is only for Windows, it is at least correct to give a warning to the programmer.

CyrusNajmabadi commented 8 months ago

since r is only for Windows

This is not true. It's totally normal for a tool running on another OS to process files containing \r\n in them.

it is at least correct to give a warning to the programmer.

Definitely not :)

It isn't relevant what platform a user is on when compiling. Their program may run on a different system and it also may consume data from another system. That is totally normal and expected.

EvgenyPrikhodko commented 8 months ago

@CyrusNajmabadi Thank you for the answer, do I understand correctly that you think that if the code in different operating systems gives different results, is it normal for you?

CyrusNajmabadi commented 8 months ago

do I understand correctly that you think that if the code in different operating systems gives different results, is it normal for you?

Yes. If you do not want that, write the newlines out yourself. Otherwise, things like AppendLine will make a determination for you. Similarly, when you are making calls to things like ToString, you may have to consider if locale shoudl be considered or not. By default, the apis are setup to take into account the platform they are running on (which is unrelated to where it is built). If you don't want those automatic behaviors, you must supply the explicit values yourself.

EvgenyPrikhodko commented 8 months ago

@CyrusNajmabadi thank you for your answer, where is the cross-platform here then?

CyrusNajmabadi commented 8 months ago

where is the cross-platform here then?

i'm not sure what you're asking :) . Could you clarify the question?

EvgenyPrikhodko commented 8 months ago

where is the cross-platform here then?

i'm not sure what you're asking :) . Could you clarify the question?

Take a large project, imagine that there are many different StringBuilders, where is \r, your actions? How much time will you spend analyzing the correctness of the answers?

CyrusNajmabadi commented 8 months ago

where is \r, your actions?

I don't know what this means. Could you clarify.

How much time will you spend analyzing the correctness of the answers?

I work in projects where this is the norm. We just test and validate our code on different platforms. It's not burdensome.

EvgenyPrikhodko commented 8 months ago

where is \r, your actions?

I don't know what this means. Could you clarify.

How much time will you spend analyzing the correctness of the answers?

I work in projects where this is the norm. We just test and validate our code on different platforms. It's not burdensome.

@CyrusNajmabadi Let's put it another way, you were given a project, your task is to migrate to NET5+, you do not have detailed information about what methods should be used and how to return, Your actions?

CyrusNajmabadi commented 8 months ago

@EvgenyPrikhodko afaict, nothing has changed here wrt to how these work from prior .net versions to now. So i would expect that i would simply recompile the above code and find that it worked the same way.

EvgenyPrikhodko commented 8 months ago

@EvgenyPrikhodko afaict, nothing has changed here wrt to how these work from prior .net versions to now. So i would expect that i would simply recompile the above code and find that it worked the same way.

It's bad, since there is a business layer, you may just not know the subtleties.

CyrusNajmabadi commented 8 months ago

@EvgenyPrikhodko I don't really understand what you are saying. The logic of the above has been how .net has worked for a very long time. Migrating to .net 5.0 shouldn't change anything.

It's bad, since there is a business layer, you may just not know the subtleties.

I don't know what that has to do with how .AppendLine works. Ultimately, if you are going to make changes to software, you will need to understand it, understand what it is calling into, understand the implications of the change, and test it to validate it. All of those (except understanding what this API does) is external to this issue.

Note, our docs ar every clear on the behavior here: https://learn.microsoft.com/en-us/dotnet/api/system.text.stringbuilder.appendline?view=net-8.0

Appends the default line terminator, or a copy of a specified string and the default line terminator, to the end of this instance. The default line terminator is the current value of the Environment.NewLine property.

This behavior is intentional and documented. If it's something you don't like, you could always write an analyzer to detect this in your code base and inform you of the potential problems. It's possible someone has already written such an analyzer.

EvgenyPrikhodko commented 8 months ago

@EvgenyPrikhodko I don't really understand what you are saying. The logic of the above has been how .net has worked for a very long time. Migrating to .net 5.0 shouldn't change anything.

It's bad, since there is a business layer, you may just not know the subtleties.

I don't know what that has to do with how .AppendLine works. Ultimately, if you are going to make changes to software, you will need to understand it, understand what it is calling into, understand the implications of the change, and test it to validate it. All of those (except understanding what this API does) is external to this issue.

Note, our docs ar every clear on the behavior here: https://learn.microsoft.com/en-us/dotnet/api/system.text.stringbuilder.appendline?view=net-8.0

Appends the default line terminator, or a copy of a specified string and the default line terminator, to the end of this instance. The default line terminator is the current value of the Environment.NewLine property.

This behavior is intentional and documented. If it's something you don't like, you could always write an analyzer to detect this in your code base and inform you of the potential problems. It's possible someone has already written such an analyzer.

If we're talking about cross-platform, we're saying that the code works the same way on windows and linux

CyrusNajmabadi commented 8 months ago

we're saying that the code works the same way on windows and linux

No one is saying that. The code explicitly says that it does not behave the same across platforms and that it is explicitly impacted by the Environment it runs in.

If you want your code to run exactly the same in all locations, and not be affected by things like Locales/Environments, you will have to avoid certain apis. You could either write an analyzer to block these apis. Or you could look to see if someone else has done such a thing.

It is not a goal that, by default, you get exactly identical behavior (independent of environment, locale, timezone, etc.) on every platform.

EvgenyPrikhodko commented 8 months ago

Please pay attention to the problem, it is very expensive to look for problems.

CyrusNajmabadi commented 8 months ago

@EvgenyPrikhodko

it is very expensive to look for problems.

Ok. :)

EvgenyPrikhodko commented 8 months ago

@EvgenyPrikhodko

it is very expensive to look for problems.

Ok. :)

I'm sorry, but I think you don't see the very essence of the problem

CyrusNajmabadi commented 8 months ago

I'm sorry, but I think you don't see the very essence of the problem

I'm saying that the problem you are running into is something you will have to take care of on your own (potentially using analyzers that others have written). That may be expensive. C'est la vie.

Currently, this issue is describing the expected, and documented behavior of these APIs. Millions of devs, billions of lines of code, and tons of companies have apps written using these apis expecting them to work in this fashion.

If that behavior is not ok for you, it's on you to determine what should be done for your own applications.

EvgenyPrikhodko commented 8 months ago

I'm saying that the problem you are running into is something you will have to take care of on your own (potentially using analyzers that others have written). That may be expensive. C'est la vie.

Currently, this issue is describing the expected, and documented behavior of these APIs. Millions of devs, billions of lines of code, and tons of companies have apps written using these apis expecting them to work in this fashion.

If that behavior is not ok for you, it's on you to determine what should be done for your own applications.

Thank you for your answer, unfortunately I cannot agree with you.

CyrusNajmabadi commented 8 months ago

Thank you for your answer, unfortunately I cannot agree with you.

Which part do you not agree with?

EvgenyPrikhodko commented 8 months ago

Which part do you not agree with?

This does not work in large or complex projects.

CyrusNajmabadi commented 8 months ago

This does not work in large or complex projects.

The existence of many successful large and complex projects using these very apis seems to provide a counterpoint to that :)

Indeed, given the many thousands of companies and millions of devs, writing large/complex projects on .net, it seems like this default is acceptable. Again, it may not be acceptable to you. But there are mechanisms already stated for how you can deal with this.

Furthermore, changing this in any way would certainly cause massive breaking changes everywhere.

martincostello commented 8 months ago

You could always write yourself an extension method to do exactly what you're asking for and just exclusively use that:

public static StringBuilder AppendLineWindows(this StringBuilder value)
{
    return value.Append("\r\n");
}

Future readers of the code who are familiar with the existing documented behaviours of the AppendLine() method will then understand that the code is very intentionally using Windows-specific line endings.

EvgenyPrikhodko commented 8 months ago

This does not work in large or complex projects.

The existence of many successful large and complex projects using these very apis seems to provide a counterpoint to that :)

Indeed, given the many thousands of companies and millions of devs, writing large/complex projects on .net, it seems like this default is acceptable. Again, it may not be acceptable to you. But there are mechanisms already stated for how you can deal with this.

Furthermore, changing this in any way would certainly cause massive breaking changes everywhere.

Thanks for the feedback, there are many different approaches and solutions, for example, we just abandoned StrimBuilder, and we have to look for code and replace it.

CyrusNajmabadi commented 8 months ago

for example, we just abandoned StrimBuilder, and we have to look for code and replace it.

Sounds good. Glad you were able to come up with a solution that works for your needs.

EvgenyPrikhodko commented 8 months ago

You could always write yourself an extension method to do exactly what you're asking for and just exclusively use that:

public static StringBuilder AppendLineWindows(this StringBuilder value)
{
    return value.Append("\r\n");
}

Future readers of the code who are familiar with the existing documented behaviours of the AppendLine() method will then understand that the code is very intentionally using Windows-specific line endings.

Thanks for the feedback, you can't guess everything, but cross-platform should be embedded in Net, who's against it?

CyrusNajmabadi commented 8 months ago

but cross-platform should be embedded in Net,

.Net is cross platform. It's just that some apis operate in a fashion where they customize their behavior based on things like locale, environment, etc.

If you want identical behavior, you can just not use those apis. And you can either write an analyzer to prevent usage of those apis, or see if someone has already done the same.

Again, solutions for what you want is already possible today. You just have to choose to use them.

EvgenyPrikhodko commented 8 months ago

@CyrusNajmabadi We're getting back to the point again, what's your question? how do you plan to emigrate the code?

tannergooding commented 8 months ago

Cross platform does not mean deterministic across platforms, it simply means that code can run on multiple platforms.

Different hardware, different archtectures, different platforms are different and that is often "by design", it is a fundamental of programming.

While we do strive to ensure cross platform determinism in many cases, there are many other places where this is literally impossible, too limiting, or where it is completely undesirable because it is more functionally breaking to deviate from the platform norms than it is to simply say "this is a case where platforms can differ in behavior".

Newlines, File IO, and a few other areas are common places where languages (including those outside of .NET) choose to explicitly differ across platforms.


That all being said...

This is an area where it might be desirable to expose an analyzer telling users that using .Split('\n') or .Split("\r\n") is undesirable and that using .Split(Environment.NewLine) is the better option. I would recommend opening a proposal specifically asking for such support.

Alternatively, we have prior art in several BCL APIs to allow the user to override what "newline" is used (such as System.IO.TextWriter.NewLine), it may be desirable to expose such support on StringBuilder as well, which would help simplify cases where users do want to deviate from the platform default. In such a case, it may even be desirable to expose a SplitLines API to handle this extremely common operation.

martincostello commented 8 months ago

Certain APIs are documented to behave in OS-specific ways, and StringBuilder.AppendLine() is one of those such APIs.

By the "all APIs should behave the same on all platforms" standpoint you suggest, the OperatingSystem.IsWindows() method would return the same answer on Linux, macOS and Windows, which would certainly be incorrect behaviour.

ghost commented 8 months ago

Tagging subscribers to this area: @dotnet/area-system-runtime See info in area-owners.md if you want to be subscribed.

Issue Details
Please register a wish so that `r` can be ignored in Linux the problem is described here: https://github.com/dotnet/roslyn/issues/71990 Linux ` StringBuilder str = new StringBuilder(); str.AppendLine("test1"); str.AppendLine("test2"); str.AppendLine("test3"); var qryColumnsString = string.Join(", ", str.ToString().Remove(str.Length - 2).Split("\n")); ` Windows ` StringBuilder str = new StringBuilder(); str.AppendLine("test1"); str.AppendLine("test2"); str.AppendLine("test3"); var qryColumnsString = string.Join(", ", str.ToString().Remove(str.Length - 2).Split("\r\n")); `
Author: EvgenyPrikhodko
Assignees: -
Labels: `question`, `area-System.Runtime`
Milestone: -
EvgenyPrikhodko commented 8 months ago

Cross platform does not mean deterministic across platforms, it simply means that code can run on multiple platforms.

Different hardware, different archtectures, different platforms are different and that is often "by design", it is a fundamental of programming.

While we do strive to ensure cross platform determinism in many cases, there are many other places where this is literally impossible, too limiting, or where it is completely undesirable because it is more functionally breaking to deviate from the platform norms than it is to simply say "this is a case where platforms can differ in behavior".

Newlines, File IO, and a few other areas are common places where languages choose to explicitly differ across platforms.

That all being said...

This is an area where it might be desirable to expose an analyzer telling users that using .Split('\n') or.Split("\r\n")is undesirable and that using.Split(Environment.NewLine)` is the better option. I would recommend opening a proposal specifically asking for such support.

Alternatively, we have prior art in several BCL APIs to allow the user to override what "newline" is used (such as System.IO.TextWriter.NewLine), it may be desirable to expose such support on StringBuilder as well, which would help simplify cases where users do want to deviate from the platform default. In such a case, it may even be desirable to expose a SplitLines API to handle this extremely common operation.

Yes, you are right from the point of view of an OS programmer, yes. But what should we do when we transfer the code?

EvgenyPrikhodko commented 8 months ago

What's the problem with giving a warning?

CyrusNajmabadi commented 8 months ago

how do you plan to emigrate the code?

Read the docs. Write the code according to the provided documentation for all these APIs.

What's the problem with giving a warning?

There is nothing wrong with how that code is running. It is operating in an expected and documented fashion, and existing code is very much depending on those semantics. Warnings on code break people.

If you want warnings, you can have them yourself by either writing your own analyzer for your own domain-specific concerns, or trying to find an existing analyzer written by someone other group whose interests align with yours.

tannergooding commented 8 months ago

What's the problem with giving a warning?

That is why I suggested opening a proposal for an analyzer in the second half of my statement:

This is an area where it might be desirable to expose an analyzer telling users that using .Split('\n') or .Split("\r\n") is undesirable and that using .Split(Environment.NewLine) is the better option. I would recommend opening a proposal specifically asking for such support.

This is a decently common pitfall and so an analyzer, could potentially help point users in the right direction. There is a downside, as Cyrus indicated, in that it would create noise for users who are doing it intentionally and where they'd then need to disable the analyzer.

I then gave, as an additional option opening a proposal to make it easier for users who want to customize the newline behavior intentionally.

Yes, you are right from the point of view of an OS programmer, yes.

Not as an OS programmer, but as any kind of programmer. This is a fundamental truth for almost every programming language out there.

EvgenyPrikhodko commented 8 months ago

What's the problem with giving a warning?

That is why I suggested opening a proposal for an analyzer in the second half of my statement:

This is an area where it might be desirable to expose an analyzer telling users that using .Split('\n') or .Split("\r\n") is undesirable and that using .Split(Environment.NewLine) is the better option. I would recommend opening a proposal specifically asking for such support.

This is a decently common pitfall and so an analyzer, could potentially help point users in the right direction. There is a downside, as Cyrus indicated, in that it would create noise for users who are doing it intentionally and where they'd then need to disable the analyzer.

I then gave, as an additional option opening a proposal to make it easier for users who want to customize the newline behavior intentionally.

Yes, you are right from the point of view of an OS programmer, yes.

Not as an OS programmer, but as any kind of programmer. This is a fundamental truth for almost every programming language out there.

Thanks for the feedback, I don't see a problem to give a warning so that the programmer decides for himself

tannergooding commented 8 months ago

Giving a warning is what an analyzer would do, hence why I recommended opening an API proposal asking for that feature if that interests you.

We have a particular template that is recommended to be followed for such proposals: https://github.com/dotnet/runtime/issues/new?assignees=&labels=api-suggestion&projects=&template=02_api_proposal.yml&title=%5BAPI+Proposal%5D%3A+

EvgenyPrikhodko commented 8 months ago

Giving a warning is what an analyzer would do, hence why I recommended opening an API proposal asking for that feature if that interests you.

Thank you for your opinion, do I understand correctly that you want to open a new issue?