JacobNinja / exercism-analysis

2 stars 2 forks source link

Defining inconsistent indentation #2

Closed kytrinyx closed 10 years ago

kytrinyx commented 10 years ago

I've been looking at the exercises that triggered the rule "inconsistent indentation". Out of 1500 samples analyzed, over 700 triggered the rule.

It seems like there are a few things that could trigger it that probably shouldn't. (I don't actually know what triggers it, I'm just making some guesses here.

private - There are three more-or-less common (acceptable) indentation strategies for the private keyword:

#1 - class/private/end aligned
class Thing
  def stuff
    # ...
  end

private

  def mine
    # ...
  end
end
#2 - class/end + def/end/private/def/end 
class Thing
  def stuff
    # ...
  end

  private

  def mine
    # ...
  end
end
#3 - class/end + def/end/private + def/end 
class Thing
  def stuff
    # ...
  end

  private

    def mine
      # ...
    end
end

trailing whitespace on blank lines - While I hate trailing whitespace, it's not obviously inconsistent.

chaining methods - This should probably be allowed, but I have no idea if it's possible to detect it.

    def count_differences(seqs)
      short, long = *seqs
      short.chars.zip(long.chars)
        .map{ |char1, char2| char1 == char2 ? 0 : 1 }
        .reduce(&:+)
    end

... and by "fine" I'm totally not talking about the ternary nested inside the map :^)

macros - well, not really technically macros, but things like attr_reader declarations and stuff like module_function, extend Thingy, include Whatever. E.g.

module Hamming

  module_function

  def compute(first_strand, second_strand)
    length = [first_strand, second_strand].map(&:size).min
    pairs = first_strand.each_char.zip(second_strand.each_char).take(length)

    pairs.count { |a, b| a != b }
  end
end

mystery - some look perfectly good to me, but still trigger the inconsistent indentation rule. E.g.

class Hamming 
  def self.compute(s1, s2)
    iterations = s1.length < s2.length ? s1.length : s2.length

    hd = 0
    for i in 0...iterations
      hd += 1 if s1[i] != s2[i]
    end

    hd
  end
end

When you've got a moment, let's talk about the criteria for this rule.

JacobNinja commented 10 years ago

I think most of the false positives you are seeing are due to lines containing just whitespace (another edge case not tested). Fixed in ccbf99d32de490c2986a642c4ab8a4845a6510fc

This analyzer works a bit different than the rest. Instead of parsing the code and using the AST for analysis, it splits the code on line breaks and counts the amount of leading whitespace for each line. The rule for deciding whether or not the whitespace is inconsistent can be found here: https://github.com/JacobNinja/exercism-analysis/blob/master/lib/exercism-analysis/analyzers/ruby/indentation.rb#L31

Basically the analyzer will trip if the difference in whitespace between a line and the next (non whitespace or empty) line is equal to 1 or greater than 2. I think that covers the majority of cases except this one:

foo bar,
      baz

There might be something I can do with the AST to identify whether the inconsistent spacing is desired or not. I'll look into it as soon as I have time.

Does that seem to fit your expectations? If not, we can definitely change the behavior of this.

kytrinyx commented 10 years ago

I think this is close to the expected behavior, but there might be a lot of edge cases.

Basically I think that def and end should always match up, as should if, else, elsif, and end, and similar places where we have open/close.

So, this should be considered inconsistent:

if something
  # whatever
  else
  # whatever
  end

If this becomes too difficult (because we can't use the AST), then I think we should probably just define some rules for things that are clearly weird and respond to it, but leave anything that might be subtle alone.

JacobNinja commented 10 years ago

if example: 576ba6a3ac51aebaf17754e73c28755442af8be4

The logic for inconsistent spacing is actually pretty simple if we target specific nodes. There are a lot of these, so I'll try to churn through them all in the next few days. Here's the list of keywords from Codeminer that I'll start with: do class def if else unless begin rescue case when elsif until while for ensure

If you can think of any more examples (or cases that don't have to do with a keyword, like method call arguments), post them here and I'll get them implemented.

Cheers!

kytrinyx commented 10 years ago

Oh, yeah, let's target specific nodes and ignore it for the rest. That makes it feel a lot safer.

JacobNinja commented 10 years ago

All of the keywords above have been implemented besides do for call blocks.

kytrinyx commented 10 years ago

w00t!

kytrinyx commented 10 years ago

Oh, I forgot to ask: does this remove the rules that you were using at the beginning that didn't target specific nodes?

JacobNinja commented 10 years ago

Yeah this is a complete rewrite of the analyzer, so the old logic has been removed. I'm gonna leave this issue open so that we can post new suggestions/updates here.

kytrinyx commented 10 years ago

Sweet. I'll test this against the production data and see what it triggers.

kytrinyx commented 10 years ago

I've sanity checked the inconsistent spacing rule, and it looks really good. It only got triggered about 30 times in a test of 1500 code examples, which seems right.

There is one instance where I'm not sure what triggered it. Would you take a look when you have a moment? (warning, it scrolls)

"class StrandNormalizer\n  def normalize(original, copy)\n    comparison = original.length <=> copy.length\n\n    strands = nil\n    case comparison\n    when -1\n      strands = StrandCollection.new(original, copy.slice(0, original.length))\n    when 1\n      strands = StrandCollection.new(original.slice(0, copy.length), copy)\n    else\n      strands = StrandCollection.new(original, copy)\n    end\n  end\nend\n\nclass Strand\n  attr_reader :nucleotides\n\n  def initialize(strand_text)\n    @nucleotides = []\n    @nucleotides = strand_text.chars\n  end\nend\n\nclass StrandCollection\n  attr_reader :original, :copy\n\n  def initialize(original, copy)\n    @original = Strand.new(original)\n    @copy = Strand.new(copy)\n  end\n\n  def distance\n    return 0 if original.nucleotides.length == 0 && copy.nucleotides.length == 0\n\n    calculate_distance\n  end\n\n  private\n\n  def calculate_distance\n    distance = 0\n    original.nucleotides.each_with_index do |n, i|\n      distance += 1 unless n == copy.nucleotides[i]\n    end\n\n    distance\n  end\nend\n\nclass Hamming\n  def self.compute(original, copy)\n    @normalizer = StrandNormalizer.new\n    strands = @normalizer.normalize(original, copy)\n\n    strands.distance\n  end\nend"
JacobNinja commented 10 years ago

The case node on line 6 is triggering this. The rule expects a 2 character addition for each when and else node within case. Is your expectation that when indentation can be equivalent to case indentation? If so, are there other expressions that should have the same behavior?

kytrinyx commented 10 years ago

Oooh. Yeah, of course. Hm. I think that both treatments seem valid in ruby. I can't think of any other expressions where I'd expect either style to be valid, though.

JacobNinja commented 10 years ago

Cool, fixed in 27ba0948a9e37fa0e7164ba6e7f7ba9b322b6ace

Valid case indentation:

case foo
when bar then true
when baz then true
else false
end
case foo
  when bar then true
  when baz then true
  else false
end

Invalid case:

case foo
when bar then true
  else false # inconsistent with first when
end
case foo
when bar then true
  when baz then false # inconsistent with first when
end
case foo
 end # inconsistent end
kytrinyx commented 10 years ago

This looks great! Thank you :heart: