editorconfig / editorconfig

EditorConfig universal issue tracker and wiki
http://editorconfig.org
3.16k stars 103 forks source link

Option: Trim trailing whitespace #60

Closed sindresorhus closed 12 years ago

sindresorhus commented 12 years ago

trim_trailing_whitespace set to true would trim all trailing whitespace, false would just do nothing.

It could be as simple as this regex [ \t]+$

timkelty commented 12 years ago

This would be very useful to me. I always have this set to true, but when I have to edit/hack third party code in an app, I don't want to have to commit all the whitespace corrections, I only want changed what I intentionally change. Perfect spot for editorconfig!

xuhdev commented 12 years ago

This has been on the list of v0.10.0

xuhdev commented 12 years ago

So just be patient :)

timkelty commented 12 years ago

Ok, thanks.

sun commented 12 years ago

I'd strongly suggest to prefix this option with whitespace_, since there are many more possible and related options; see .gitconfig's diff.whitespace for inspiration:

  • blank-at-eol treats trailing whitespaces at the end of the line as an error.
  • space-before-tab treats a space character that appears immediately before a tab character in the initial indent part of the line as an error.
  • indent-with-non-tab treats a line that is indented with 8 or more space characters as an error.
  • tab-in-indent treats a tab character in the initial indent part of the line as an error.
  • blank-at-eof treats blank lines added at the end of file as an error.
  • cr-at-eol treats a carriage-return at the end of line as part of the line terminator, i.e. with it, trailing-space does not trigger if the character before such a carriage-return is not a whitespace.
  • tabwidth=<n> tells how many character positions a tab occupies; this is relevant for indent-with-non-tab and when git fixes tab-in-indent errors. The default tab width is 8. Allowed values are 1 to 63.
treyhunner commented 12 years ago

Thank you for pointing out the naming scheme gitconfig uses for these options. Comparing equivalent properties in editing and code linting systems (and noting where there are no equivalents) is helpful.

EditorConfig attempts to be more proactive (formatting files before errors occur) whereas gitconfig is more reactive (noting existing errors, but not fixing them) and this difference can be seen by the naming conventions used by each. For this reason I think we should continue to use naming conventions that reflect this assumed difference in behavior.

Difference between the proactive and reactive naming conventions:

I don't really understand what the cr-at-eol option does but I don't believe it has an EditorConfig equivalent.

sun commented 12 years ago

Sorry for not clarifying my point; I wasn't my intention to compare these option names literally.

The only intention was to point out that

  1. there's a plethora of possible whitespace-related options that could be supported by EditorConfig at some point
  2. the wiki contains a long list of non-whitespace-related proposals already
  3. so better prefix all whitespace-related options with whitespace_ in order to retain some level of sanity in option naming.

In essence, preparing the .editorconfig properties for the long-term future, in which options for a diversity of different topics will exist.

For example:

file_byte_order_mark = false
file_charset = utf-8
whitespace_line_indent_style = space
whitespace_line_allow_tabs = false
whitespace_line_tab_width = 2
whitespace_line_trim_trailing = true
whitespace_file_require_newline = true

It would be even better to use dots/periods for separating the sections from sub-sections and from option names; e.g.:

whitespace.line.trim_trailing = true
whitespace.file.require_newline = true
# so essentially:
<section>[.<subject>].<option>

but I'm not sure whether the INI-style format of .editorconfig supports that.

I perfectly realize that this goes beyond the scope of this issue, but after encountering this proposal and looking at the wiki, I wanted to raise the red flag of a constantly increasing set of options that don't follow any kind of grouping or sectioning in their names.

treyhunner commented 12 years ago

I like the idea of logically grouping options, but I don't like the idea of lengthening option names in order to do this.

I think we might be able to solve the problem of option names getting out of control via a style recommendation such as one of these comment/spacing styles:

[*.py]

; encoding
charset = utf-8
end_of_line = LF

; whitespace
indent_style = space
indent_size = 4
tab_width = 8
insert_final_newline = true
trim_trailing_whitespace = true

Or maybe:

[*.py]

; Encoding Options

charset = utf-8
end_of_line = LF

; Whitespace Options

indent_style = space
indent_size = 4
tab_width = 8
insert_final_newline = true
trim_trailing_whitespace = true

The reason I don't want to prefix options with whitespace_ is that I want these option names to be as short as possible while still being verbose enough to be easily understood. The last two options in the examples above are exceptions in that they are fairly long, but we only made them this long because it seemed necessary to correctly convey their meanings.

Do you think this above stylistic suggestion(s) would be an adequate solution to the problem of option grouping?

sun commented 12 years ago

Do you think this above stylistic suggestion(s) would be an adequate solution to the problem of option grouping?

Not really. ;) Comments are only annotating data structures. The moment you remove them, only the data structure remains.

I really like and appreciate the idea and goals of EditorConfig, and believe that the project has the potential for a prospering future. Essentially, you're attempting to abstract and define coding standards in a semantic and standardized way (but without standardizing the same for all), so they're ready for consumption by whichever editor of anyone's choice, and for whichever project someone wants to code for. As one of the main contributors and maintainers of Drupal's extensive coding standards, I'd really like to see this idea to take off. :)

At the same time, if the project will be really successful, then there will be many more properties in .editorconfig files. (I'd estimate a factor of 6 to 10, compared to the current wiki.) And here's the crux: A lot of people will have to write those .editorconfig files and make sense of all the properties. A lot of editor vendors or plugin authors will have to parse and process all of those properties into native editor config. And as an API/specification consumer, the currently proposed property names look and feel very random/chaotic. :-/ Obviously, the current 3 properties don't deliver the big picture.

In short, I think the project will only have a major impact, if it manages to pull off a solid structure and specification for a generic editor configuration definition. Which might sound complex, but really isn't that hard to set up... just by inspecting the configuration options of typical editors and the coding standards of some major projects will deliver a diversity of options and groups/sections already:

whitespace.line.indent.style = tab|space
whitespace.line.indent.columns = 4
whitespace.line.indent.blank_lines = true
whitespace.line.tab.columns = 8
whitespace.line.remove_trailing = true
whitespace.file.convert_tabs_to_spaces = false
whitespace.file.ensure_final_newline = true
file.encoding = utf-8
file.bom = false
file.eol = lf
file.allow_null_chars = false
comment.allow_c = true
comment.allow_shebang = false
comment.allow_eol = true
lang.quote = single
lang.space.operators = true
lang.space.brackets = none
lang.brace.style = [no definite of all possible styles, should be split into multiple properties]
...

I hope I at least make some sense. ;) What do you think? :)

sun commented 12 years ago

err, a paragraph got lost somehow...:

As visible, these property names do not need to be annotated via comments — they are self-descriptive. And nicely grouped by topic/domain. That allows people to locate a particular property more easily, and also, to possibly ignore entire sections when they are not relevant or cannot be supported by a particular editor.

treyhunner commented 12 years ago

I think we may view things differently in a few key areas. I'm going to try to clarify my concerns and I hope you'll do the same in response.

EditorConfig property count growing drastically

I think EditorConfig should aim to be two things:

  1. A set of core properties that are understood nearly universally for all or most file types
  2. A file that can be extended with custom domain-specific or project-specific properties that will be agreed upon by the community (and possibly endorsed by the EditorConfig project as well)

I believe the properties that meet the criterion of the first goal will be very narrow. While reading through coding standards guides I realized that beyond some very basic text editor features, many of the other preferences that could be specified are specific to the file type being edited. PHP, Python, Scheme, Haskell, C, and Bash all have very different possible properties. So far the method I have proposed for resolving this problem is to allow .editorconfig files to function like many other configuration files in that they allow unofficial properties and can be extended and used by tools that understand those unofficial properties.

I think the comment. and lang. properties you show above should all fall under the second category (domain-specific). Some of these property groups might require their own plugins to extend the EditorConfig behavior to add PHP support (or other language support). Some of them may not even be enforceable at the editor level, but may require something more like git pre-commit hooks. Those properties may not be editor properties, but there's no reason they can't be stored in an EditorConfig file and used by other tools that can parse them.

So essentially I don't think EditorConfig will grow more than a few more "official" properties beyond what we currently have set for v0.10.0. It's difficult to imagine other properties that are not very dependent upon the file type.

<section>[.<subject>].<option> notation

I don't like this notation for a couple reasons:

  1. It's very long. whitespace.line.indent.style is 28 characters whereas indent_style is only 12. These files don't need to be written frequently, but it means a lot for someone who doesn't have autocompletion enabled for these property names.
  2. It's difficult to read. My brain takes much longer to parse the important part of whitespace.line.indent.style. I don't see this as a benefit for human readability because the hierarchy enforced makes it less natural for readers that understand English (though possibly more natural for machines).

I want .editorconfig files to be human readable/editable first and machine readable for convenience. Being that these are style guides they should be as clear as possible.

Notations for domain-specific properties

I have far less problem with encouraging this behavior with domain-specific properties:

lang.quote_type = single
lang.whitespace_around_operators = true
lang.whitespace_around_brackets = true

Prefixing could be useful for non-standard properties. I still think it's wordy, but it could help separate them visually from the rest of the properties. We could even deeply nest those properties if it's desirable, but I see this as something that should be left up to the sub-communities that will use these properties.

Unfortunately with the domain-specific properties, until we have tools that can read and understand any of them (regardless of what naming convention is used) most people probably won't care what notation is used.

By the way, you guys have done a really good job on the Drupal coding standards guide by the way. It was probably the most well-written style guide I found when looking for inspiration for new properties.

sindresorhus commented 12 years ago

:arrow_up: :+1: on everything.

xuhdev commented 12 years ago

Really cool here. I think that @treyhunner has clearly explained his opinion. Yes, just as @treyhunner said, I think the long prefix is wordy. But I have just one comment, if we really need to categorize the properties, maybe it's better for us to use something like yaml format inside ini format? But things can become a bit complex.