NETMF / netmf-interpreter

.NET Micro Framework Interpreter
http://netmf.github.io/netmf-interpreter/

Normalize encoding of source files #489

Open josesimoes opened 8 years ago

josesimoes commented 8 years ago

It seems that the files in the repo don't have a consistent encoding. This causes problems, for example, when submitting PRs: GitHub finds differences where there are no real differences, just characters that differ (or have no counterpart) from one code page to another. I suggest this gets normalized and that a note about it be added to the code standards.
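For illustration only (not something proposed in this thread): one way to capture such a rule in the code standards would be an EditorConfig entry like the sketch below. The file list and the choice of UTF-8 are assumptions; the thread has not settled on an encoding.

```ini
# Hypothetical .editorconfig at the repo root (the repo does not ship one).
# Editors that honor EditorConfig save matching files with this encoding.
root = true

[*.{c,cpp,h,cs}]
charset = utf-8   # assumed choice, for illustration only
```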

smaillet-ms commented 8 years ago

This will definitely be a part of the next release, which is taking a "clean slate" approach (see the newly created orphaned branch). I'm a bit uneasy with the idea of changing encodings, etc., in the current branches though. The last time we tried something like that, for normalizing line endings in the Llilum project, all hell broke loose.
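For context, line-ending normalization in Git is typically driven by a `.gitattributes` file like the sketch below; turning it on in an existing branch renormalizes every tracked text file, which is exactly the kind of repo-wide churn alluded to above. The patterns are illustrative, not taken from this repo.

```
# Hypothetical .gitattributes sketch (not from this repo).
# Normalize text files to LF in the repository; native endings on checkout.
* text=auto

# Keep binary formats untouched.
*.png binary
*.lib binary
```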

josesimoes commented 8 years ago

@smaillet-ms fine with me. Just saying. You were the one pointing out this situation yesterday. I was the one to blame because my editor, for some reason, chose to save the file with UTF-8 encoding.

When going through the code everything looks OK until you submit the PR and see the diff in GitHub. Then one needs to go and change the encoding of that file. It's not efficient and can be rather tedious...

smaillet-ms commented 8 years ago

Yes, not really blaming you; you were the victim here. Given that I want to treat the next version as a clean slate, where we bring code over piece by piece and make it conform to newer designs and style guides, the encoding can be standardized at that point fairly readily. Manually editing all the files just for this seems a bit tedious and not very helpful in the end.

As to what encoding we should use, it's going to depend on the file type, as not all language tools understand the various encodings (e.g. I can't remember if C++ officially acknowledges the encoding of source files or leaves it as an implementation detail, where some compilers might not recognize a UTF BOM and would fail to compile the code...).
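One encoding-agnostic workaround, independent of what any given compiler assumes about the source character set, is to keep source files 7-bit ASCII and spell non-ASCII characters with escapes. A minimal C++ sketch of the idea (the literals are illustrative):

```cpp
// Sketch only: pure-ASCII source sidesteps the source-encoding question
// entirely, at the cost of some readability in string literals.
#include <cstdio>

// Instead of embedding a raw 0xA9 byte (the CP1252 copyright sign) or raw
// UTF-8 bytes, spell the character with escapes every compiler accepts:
const char copyrightUtf8[] = "\xC2\xA9 2016";    // UTF-8 byte sequence
const wchar_t copyrightWide[] = L"\u00A9 2016";  // universal character name

int main() {
    std::printf("%s\n", copyrightUtf8);  // renders correctly on a UTF-8 console
    return 0;
}
```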

miloush commented 8 years ago

Well, in the end the standard does not really matter, does it? (In the sense that if any of the tools does not follow it, having it mentioned in the standard does not help fix the problem.)

I don't know either; a quick search brings up, e.g., this opinion that it's an implementation detail, although most "modern" C++ compilers can handle it. I am more worried about the embedded 3rd-party tools.

Also, it should probably be mentioned that supporting UTF and recognizing a BOM are two different things. UTF encoding without a BOM still constitutes valid data in all 8-bit code pages, but not vice versa.
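Concretely: © saved as CP1252 is the single byte A9, which is not a legal UTF-8 sequence, while © saved as UTF-8 is the two bytes C2 A9, which an 8-bit decoder accepts without error, just wrongly. A small sketch of that asymmetry (illustrative, not from the thread):

```cpp
// Sketch: why UTF-8 data survives an 8-bit decoder but not vice versa.
#include <cstdio>

int main() {
    // Copyright sign in CP1252: one byte. As UTF-8 this is an invalid
    // sequence: 0xA9 is a continuation byte with no lead byte before it.
    const unsigned char cp1252[] = { 0xA9 };

    // Copyright sign in UTF-8: two bytes. A CP1252 reader decodes this
    // without error, just wrongly, as A-circumflex + copyright sign.
    const unsigned char utf8[] = { 0xC2, 0xA9 };

    std::printf("CP1252: %02X   UTF-8: %02X %02X\n",
                cp1252[0], utf8[0], utf8[1]);
    return 0;
}
```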

I think the problem only gets worse as people have different default code pages. It is impossible to detect which one was intended, and I can't imagine people would keep saving (and opening) files in the selected encoding, especially in Visual Studio.

So I would think the order of preference should be:

The current state - having © in the files relies on implementation details of the tools anyway.

miloush commented 8 years ago

Btw., a quick check of the cpp, c, h and asm files in the repo:

Files with BOM: 0

A9 ©: 6
B5 µ: 4
F6 ö: 2
92 ʼ: 1
EC μ: 1 (not 1252)
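For reference, a scan like the one above can be reproduced with a short program. This C++17 sketch (an assumption; the tool actually used is not stated) counts UTF-8 BOMs and tallies non-ASCII byte values across the source tree; note it counts occurrences, which may differ from the figures above if those are per-file:

```cpp
// Sketch: count UTF-8 BOMs and tally non-ASCII bytes in source files.
// C++17; assumes it is run from the repo root. Not the original tool.
#include <cstdio>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

namespace fs = std::filesystem;

int main() {
    const std::set<std::string> exts = { ".cpp", ".c", ".h", ".asm" };
    std::map<unsigned char, long> highBytes; // byte value -> occurrences
    long bomFiles = 0;

    for (const auto& entry : fs::recursive_directory_iterator(".")) {
        if (!entry.is_regular_file()) continue;
        if (!exts.count(entry.path().extension().string())) continue;

        std::ifstream in(entry.path(), std::ios::binary);
        std::vector<unsigned char> data(
            (std::istreambuf_iterator<char>(in)),
            std::istreambuf_iterator<char>());

        // A UTF-8 BOM is the byte sequence EF BB BF at the start of the file.
        if (data.size() >= 3 &&
            data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            ++bomFiles;
        for (unsigned char b : data)
            if (b >= 0x80) ++highBytes[b];
    }

    std::printf("Files with BOM: %ld\n", bomFiles);
    for (const auto& [b, n] : highBytes)
        std::printf("%02X: %ld\n", b, n);
    return 0;
}
```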

In addition to that,