Explain UTF-8 BOM rule in readme

mk-pmb commented 7 years ago

Most style decisions are explained in the readme, but I couldn't find the reasoning for why a BOM is considered bad. Where I've looked:

- the patch that enabled it: no commit message body.
- the current code of the rules file, https://github.com/airbnb/javascript/blob/8cf2c70a4164ba2dad9a79e7ac9021d32a406487/packages/eslint-config-airbnb-base/rules/style.js#L472-L474
- the eslint rule page mentioned in the patch's code comment: it doesn't claim a BOM is bad.
- searched the readme for "BOM", "unicode" and "byte order"
- searched the issue tracker for "BOM", "unicode" and "byte order"

Could someone explain it, or add search keywords to make the explanation easier to find?

Update: Also, is there a recommendation on how to declare the file encoding instead? I searched the readme for "charset", "encod" and "character set" but found no matches.

ljharb commented 7 years ago

The rule page says "UTF-8 does not require a BOM because byte ordering does not matter when characters are a single byte. Since UTF-8 is the dominant encoding of the web, we make "never" the default option." The reasoning for choosing "never" is that files should only ever be in UTF-8.
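For reference, a UTF-8 BOM is just the three bytes EF BB BF (U+FEFF encoded as UTF-8) at the very start of a file. The Node sketch below, with hypothetical helper names `hasUtf8Bom` and `stripUtf8Bom`, shows roughly what the "never" setting objects to and what removing the mark amounts to; it is an illustration, not ESLint's implementation.

```javascript
// The three bytes U+FEFF becomes when encoded as UTF-8: the "UTF-8 BOM".
const UTF8_BOM = Buffer.from([0xef, 0xbb, 0xbf]);

// True if the raw file contents start with a UTF-8 BOM.
function hasUtf8Bom(buf) {
  return buf.length >= UTF8_BOM.length &&
    buf.subarray(0, UTF8_BOM.length).equals(UTF8_BOM);
}

// Returns the contents without a leading BOM (a view into the same memory, no copy).
function stripUtf8Bom(buf) {
  return hasUtf8Bom(buf) ? buf.subarray(UTF8_BOM.length) : buf;
}
```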

Are you running into issues with this rule?

mk-pmb commented 7 years ago

> Since UTF-8 is the dominant encoding of the web, we make "never" the default option.

Yes, that's what I found there as well. It's OK as a default for ESLint. I thought the patch repeated it because there had been a stronger reason for "never". In my projects I like BOMs because Firefox trusts the BOM more than the Content-Type header, so the BOM protects my files' encoding when they are served from webspace that announces a different charset (e.g. a legacy website). It's also helpful on webspace that doesn't send any charset info (e.g. python -m SimpleHTTPServer), or no Content-Type at all (file:// access).
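Declaring the charset in the Content-Type header is the server-side counterpart to the BOM. A minimal Node sketch, assuming you control the server; the extension map and `contentTypeFor` are hypothetical names, and this does nothing for file:// access, where no header is sent at all.

```javascript
const path = require("path");

// Hypothetical extension-to-MIME map; extend it for the file types you serve.
const TEXT_TYPES = {
  ".js": "text/javascript",
  ".html": "text/html",
  ".css": "text/css",
  ".txt": "text/plain",
};

// Builds a Content-Type value that names the charset explicitly, so the
// browser does not have to guess (e.g. fall back to windows-1252).
function contentTypeFor(filename) {
  const type = TEXT_TYPES[path.extname(filename).toLowerCase()];
  return type ? `${type}; charset=utf-8` : "application/octet-stream";
}
```

In a Node `http` request handler this would be used as `res.setHeader("Content-Type", contentTypeFor(filePath))`.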

To reproduce, run Python 2.7.6 and Firefox 57 on Ubuntu trusty with system locale en_US.UTF-8. Python's SimpleHTTPServer sends just "text/plain" without a charset, probably because it doesn't consider itself authoritative enough to guess more details. Both when the file is served this way and when it is loaded via file://, Firefox guesses "windows-1252" for a file without a BOM and consequently garbles umlauts and emoji.
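The garbling comes from decoding UTF-8 bytes with a single-byte fallback encoding. The simplified model below (hypothetical `decodeWithFallback`, not Firefox's actual detection logic) honours a BOM and otherwise falls back; it uses Node's `latin1` (ISO-8859-1) as a stand-in for windows-1252, which agrees with it on the bytes used here but not on every byte.

```javascript
// Simplified model of the behaviour described above: if the bytes start with
// a UTF-8 BOM, decode as UTF-8 (dropping the BOM); otherwise decode them as a
// single-byte legacy encoding. "latin1" stands in for windows-1252; the two
// differ only in the 0x80-0x9F range, which the example below avoids.
function decodeWithFallback(buf) {
  if (buf.length >= 3 && buf[0] === 0xef && buf[1] === 0xbb && buf[2] === 0xbf) {
    return buf.subarray(3).toString("utf8");
  }
  return buf.toString("latin1");
}

console.log(decodeWithFallback(Buffer.from("Käse", "utf8")));       // KÃ¤se
console.log(decodeWithFallback(Buffer.from("\uFEFFKäse", "utf8"))); // Käse
```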

ljharb commented 7 years ago

In general, this config repeats all the defaults explicitly.

I'd say that your solution should be to use a build process to auto-insert the BOM for you, rather than encoding that directly in the file.
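A build step like the one suggested could look roughly like this, assuming Node and a list of already-built output files; `withBom` and `addBomToFiles` are hypothetical names. Source files stay BOM-free, so the lint rule keeps passing, and only the shipped files carry the BOM.

```javascript
const fs = require("fs");

const BOM = "\uFEFF";

// Returns the text with exactly one leading BOM; already-marked text is left alone.
function withBom(text) {
  return text.startsWith(BOM) ? text : BOM + text;
}

// Hypothetical build step: rewrite each output file in place with a BOM.
// fs.readFileSync(..., "utf8") keeps an existing U+FEFF, so running it twice is safe.
function addBomToFiles(paths) {
  for (const p of paths) {
    const text = fs.readFileSync(p, "utf8");
    fs.writeFileSync(p, withBom(text), "utf8");
  }
}
```

Running it as the last step, after anything else that rewrites the output (e.g. a minifier), avoids a later tool dropping the BOM again.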

Separately, lots of web features are broken on file://, so it shouldn't be used for any reason anyway.

mk-pmb commented 7 years ago

> your solution should be to use a build process

I could imagine an argument for separation of concerns: The transport and compatibility issues should be solved by some other mechanism because the code files should only be concerned with behavior. Are there other reasons to suggest a build process in general, independent of project details?

ljharb commented 7 years ago

Modern web dev requires a build process anyway (for minification, babel, etc.); it has for years, and it will for the foreseeable future.

mk-pmb commented 7 years ago

I think those are reasons worth mentioning in the style guide. How about this? "Your projects should use a build process so you can easily plug in a linter, transpiler, minification etc. Dealing with encoding issues in the source files (e.g. UTF-8 BOM to indicate Unicode) is thus a code smell for a lack of tooling."

Update: Changed the "ing"s to "er"s to match the search keywords.

ljharb commented 7 years ago

I guess that's fine; this isn't something that almost anybody runs into because almost everyone uses tools that assume UTF-8. Want to send a PR?

mk-pmb commented 7 years ago

ok, PR coming up later.

galvarez421 commented 5 years ago

For whatever it's worth, this rule does make it harder to work with Visual Studio's default file-saving behavior. See:

ljharb commented 5 years ago

Your SO link contains a link to a vscode extension that fixes the vscode bug.

galvarez421 commented 5 years ago

In my case, it has been easier to disable the rule so that other developers working on the projects that use the config don't have to install an extension or otherwise configure things specifically to satisfy the rule (granted, there may be reasons to keep the rule enabled, but I haven't run into them). I mostly mentioned the Visual Studio case in response to your question to the OP ("Are you running into issues with this rule?"), and in case it's worth considering given the popularity of Visual Studio.

I agree with @mk-pmb's suggestion that the documentation should explain why exactly a BOM is disallowed. Given that the only options for the rule are "always" and "never", it's clear why the default is "never", given your explanation and the one on the rule page. However, I don't think it's clear why the Airbnb config enables the rule as an error. The rule page says that "UTF-8 does not require a BOM", but it's not clear why that should translate into the BOM being disallowed.
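For projects that decide differently, the rule can be overridden on top of the shared config. A minimal sketch of a hypothetical project `.eslintrc.js`, assuming the base config is extended as `airbnb-base`:

```javascript
// .eslintrc.js (hypothetical project config)
module.exports = {
  extends: "airbnb-base",
  rules: {
    // The shared config sets this to ["error", "never"]. Pick one:
    "unicode-bom": "off", // don't check for a BOM at all
    // "unicode-bom": ["error", "always"], // require a BOM in every file
  },
};
```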

mk-pmb commented 5 years ago

See my PR #1643 for potential reasons.

airbnb/javascript issue #1640