ciaranm / detectindent

Vim script for automatically detecting indent settings
173 stars 48 forks source link

Ignore multiline comment blocks in HTML and JS #12

Open ghostwords opened 12 years ago

ghostwords commented 12 years ago

Anything between should be ignored when determining indentation.

/* */ blocks should also be ignored in JavaScript files.

For example, a file starting with the following contents

<!doctype html>
<!--
 * line starts with a space, several lines like it might follow
-->
<html>
<head>
<script type="text/javascript">
function func() {
  if (true) { // line starts with a tab, and all lines following start with tabs

is incorrectly parsed by Detect Indent:

Detected spaces and tabs; has_leading_tabs: 1, has_leading_spaces: 1, shortest_leading_spaces_run: 1, shortest_leading_spaces_idx: 3, longest_leading_spaces_run: 1 Initial buffer settings changed: tabstop changed from 4 to 2, shiftwidth changed from 4 to 1

I expect tabs since the entire file past the comment block on top uses tabs.

Also, is it necessary to check for filetype (HasCStyleComments) when checking for comment lines? Perhaps filetype checks should be removed. I could have a long multiline JavaScript comment with /* and */ in an html file that could throw off Detect Indent, for example.

ciaranm commented 12 years ago

This one's really fiddly. We've got to be careful not to screw up common cases in order to get particular situations to work -- it's not realistically possible to get every single file right.

Can you produce a candidate patch that we can try against a bunch of files? I'd like to avoid getting anything wrong that is currently detected correctly.

ghostwords commented 12 years ago

Are you referring to my latter comment regarding filetype checks? How about the main issue, ignoring multiline HTML comments?

ciaranm commented 12 years ago

No, even HTML comments are a nuisance. Don't people stick those things after <script> and the like?

ghostwords commented 12 years ago

They could be anywhere on the page, yes. But it's better to ignore them wrongly than to choose incorrect indentation, no?

ciaranm commented 12 years ago

I'm not convinced that ignoring them won't break some things that are currently correctly detected. Script blocks with a comment are likely to contain sensible indentation, currently.

ghostwords commented 12 years ago

Can you give some examples of what ignoring HTML comments might break?

ciaranm commented 12 years ago

I'm thinking files that are mostly just a commented out script block.

The thing with the comment skipping... It shouldn't really be there at all. But for Cish languages, particularly when using Doxygen or Javadoc, you get a lot of things like this at the top of a file:

/**
 * Silly single line indent to make the * line up
 */

I'd really rather reduce the number of special tricks, not increase it... Is what you're doing with HTML really that common?

ghostwords commented 12 years ago

Yes, absolutely.

Every file in several projects I work on contains a comment block at the top. Sometimes the comment block is a comment, and sometimes it is a /* */ comment in a JavaScript file (which actually trips up Detect Indent as well now that I check).

Are files that are mostly commented-out script blocks common in your experience? They are not in mine. Couldn't you have a case for them as well, if this situation needed fixing?

These bugs are deal breakers for me, unfortunately, and I don't know if I have the time to fix them myself in the near future.

ciaranm commented 12 years ago

It's not whether there's a comment block there. It's whether there's a comment block there with bogus indenting. The only time I've seen that happen is with aligned stars for C-like comments.

ghostwords commented 12 years ago

Right, all the comment blocks I deal with here are either

/*!
 * Description paragraph.
 *
 * Copyright line.
 */

in JS or the same but with in HTML.

There is nothing bogus about them.

ciaranm commented 12 years ago

I get it when there are *s there, since that's for alignment. But why the odd indenting for HTML comments?

ghostwords commented 12 years ago

For consistency with the JavaScript header comments, so that the comment format is almost entirely the same. If you look at it critically, the JavaScript commenting situation is a clearer case in need of fixing, but I argue both JS and HTML comments are valid issues with Detect Indent.

ciaranm commented 12 years ago

That seems to be a fairly convoluted set of circumstances that would have to come together to trigger this. Can you find out whether odd indenting for HTML comments is wide-spread, as opposed to just an organisational oddity? I don't do much work with HTML, but I've not seen it in what I have done.

ghostwords commented 12 years ago

C-style comment headers blocks in JavaScript files:

Let's start with most watched/forked JS repository on GitHub, Bootstrap:

https://github.com/twitter/bootstrap/blob/master/js/bootstrap-carousel.js for instance.

ghostwords commented 12 years ago

Unfortunately, "<!--" and "/*" are hard to search for across GitHub source code.

ciaranm commented 12 years ago

Ok, that's a sign we should give JavaScript files the multiline C commenting hack. What about HTML?

ghostwords commented 12 years ago

There is no HTML "language" on GitHub, and code search is terrible, but here are a couple of examples:

https://github.com/cardmagic/lucash/blob/0452d410430d12140c14948f7f583624f819cdad/reference/scsh-0.6.6/doc/src/manual/s48manual.html

https://github.com/theRocket/typo/blob/8664552d86ea39d28fec76bc34d66a16a0d02b60/public/javascripts/fckeditor/editor/dialog/fck_flash/fck_flash_preview.html

majewsky commented 12 years ago

A suggestion: Ignore all whitespace which is inside a comment block. My vimscript is pretty weak, but you could probably cover most cases with something like

synIDattr(synID($LINENUMBER, 0, 1), 'name') =~ /Comment$/

The obvious problem is that this might be very slow. Needs benchmarking, and perhaps an option for the vimrc.

ciaranm commented 12 years ago

I'm definitely not going that route. It's simply too slow, fiddly and unreliable.

It's also wrong, since comments are not in general a problem. The comment heuristics are there to deal with a limited, common set of special cases. If they're to be extended, it's only to cover similar common issues. The goal is not to handle every single oddity.

tnguyen14 commented 7 years ago

I'm running into this issue a lot, where a 2-space indented JS file gets converted to 1-space because of comment blocks. Is there no way to fix that?