benkeen / grunt-search

Grunt plugin that searches a list of files and logs all findings in various formats.
MIT License
15 stars 8 forks source link

JSHint Style Ignores #9

Open jtfairbank opened 8 years ago

jtfairbank commented 8 years ago

Hi benkeen, great work here. I just spent the last hour adding this to a bunch of our repos, can't wait to find all the bad code with it.

The Problem

My biggest issue is that I have to do some pretty icky regexes to determine if a search match is ok or not. A great example is that in our js code, 90% of the time it is a mistake to include a console.log statement. That usually indicates someone forgot to remove some debug code, or someone got lazy and isn't properly reporting error conditions.

However there are valid use cases for console.log, such as in the error reporting system itself (one reporter is a console logger).

Current Workaround:

 options: {
    searchString: /^.*console\.((?!grunt\-search:console ignore:line).)*$/ig
  , failOnMatch: true
  , logFormat: "console"
}

Suggested Improvement

Many developers in the js community are already familiar with jshint. They have great support for ignoring all or some errors for single or multiple lines: JSHint Docs - search for ignore.

I propose that grunt-search adds a similar ignore comment system. Based on the regex above, it could look something like this:

// grunt-search[:console] ignore:line

Where the grunt-search term provides an anchor to search for, the optional :console specifies a specific grunt task to ignore (excluding it ignores the line for all tasks), and the ignore directive indicates the scope of what to ignore.

Multi-line ignores can be added fairly easily as well:

// grunt-search[:console] ignore:start // grunt-search[:console] ignore:end

I feel like this strikes the right balance between tool scope and flexibility. It doesn't require a complicated generic ignore system, but the users can still mark specific sections of code to not check, and specify specific (or all) tasks to ignore.

Implementation Notes

The place to do this would be at the start of the main loop which iterates through each line.

Specify a regex for the ignore: grunt-search[:(\S)] ignore:line (and start / end). At the start of the loop, run the regex for a match. If there is one, check the regex's group for a specific grunt command, and if the command doesn't match the one proceed with the search.

If a line ignore is found, just continue with the loop.

If a multi-line ignore is found, continue with the loop until the multi-line end ignore is found.

If a multi-line end ignore is found without a preceding multi-line start ignore, throw a warning (but probably don't fail the task).

If there are "nested" multi-line ignores, throw an error. Or implement a counter which increments on multi-line starts and decrements on multi-line ends, and only resumes searching when it reaches 0. Or just wait until the the first multi-line end is reached, which closes any / all preceding starts. As long as this case is documented clearly, we'll all be fine.

Might also want to throw warnings if there are malformed comments, such as:

Excluding specific files is problematic because nothing in the file will be checked (and most of the lines should be), and we'd have to update the file listings quite often- basically anytime a file becomes an exception to the search term.

Closing Thoughts

I'm super busy through January but can try to get a PR out in February. If you want to implement this, I'd also be happy to buy you a beer via Paypal / Venmo / Patreon / Grattipay.

Cheers, jtfairbank

jtfairbank commented 8 years ago

Oh yeah, super bonus points if you can also include an option to automatically ignore text in comments. I haven't thought about this as much, but I feel like the same techniques can be applied.

Use case: clearly the comment here should not provide a match, even if I want to check the grunt file for other console.logs.

/* Plugin: Search
 * ------------------------------------------------------
 * [Docs](https://www.npmjs.com/package/grunt-search)
 *
 * Search for text in files and log it or fail the task. 
 */
      search: {
          // search for console.log statements
          console: {
              ...
          }
      }
benkeen commented 8 years ago

Hey @jtfairbank, great ideas! Yeah I could see these being very useful - especially identifying string matches within comments. Very practical. Wonder if there's a pre-existing jshint ignore-comment parser out there?

But I think we're in the same boat: I'm going go to be extremely busy in the coming weeks and won't get time to look at this. I'd definitely welcome a PR though if and when you get time. As long as both formats are still accepted: the simple regexp pattern like now, or a jshint ignore comment [with a flag to indicate one of the other?] I'd be very much on board with this.

Great stuff.

jtfairbank commented 8 years ago

@benkeen Sorry for the wall of text. This is actually quite a fun problem to think about, and I geek out about this stuff late at night. I'll try to get to it, lets just keep each other informed on this issue if either one of us has ideas or questions / starts work / wants feedback / etc.

JSHint Code / Reuse

I looked into the jshint source- they actually implement a full lexer and parser to understand the javascript code (at a language analysis level, not compiler / runtime level) so I don't think their code would be very useful. It's all open source though, the relevant files are:

Code

Tests

Simple Regex vs. Ignore Comments

A bit confused by what you meant by:

As long as both formats are still accepted: the simple regexp pattern like now, or a jshint ignore comment [with a flag to indicate one of the other?] I'd be very much on board with this.

I think both can work at the same time. The simple regex (searchString option) decides what matches in the first place (or not). The ignore comments that add an additional layer of discretion which allow specific lines or sections of code to be marked as "ok" even if they match the regex pattern.

Additional Thoughts on the Implementation

Currently the following options are available:

  1. Add custom per-line ignore command via the searchString option (as seen in my first comment).
  2. Implement a function for the logCondition option that can include slightly more complex code and choose to ignore matches or not.

    For example, here you could limit the ignore's to a specific filetype, range of line numbers (skip the license comment on lines 1 - 10 of every file), or based on the match itself.

    Since this can be set per command, it can also be used to decide if / when to ignore certain lines (ie ignore some for "grunt-search:console", others for "grunt-search:window", and all for "grunt-search:something-weird".

Improved options include:

Additional Thoughts on Detecting Comments

Something to consider is that your plugin can work for many programming / non-programming languages and file types, whereas jshint is javascript specific.

Honestly I typed a bunch of stuff out here, but in reality it is not possible to correctly parse comments for generic languages. There are too many edge cases, variations between comment systems, etc. My recommendation is to just do a regex on each line for the ignore command string and implement the functionality based on that. I've left my notes below since they're already typed up and may be interesting.

To do this, grunt-search should support custom comment string via an option, possibly scoped to a file type:

options: {
    applyDefaultCommentTypes: true, // overridden by commentStyles
    commentStyles: [
        {
            multilineStart: "/*",
            multilineEnd: "*/",
            singleLine: "//",
            fileTypes: [".js", ".c", ...]
        },
        {
            multilineStart: "<!--",
            multilineEnd: "-->",
            fileTypes: [".html". ".handlebars"]
        },

        // NOTE: How to best handle languages that allow multiple comment styles for each category?
        //       ".handlebars" is included above for html style comments, and below for both of their own style comments.
        //       Maybe have each entry be specific to a single language, specified in the key? "'.handlebars': { ... comment styles ...}"
        {
            multilineStart: ["<!--", "{{!", "{!--"],
            multilineEnd: ["-->", "}}", "--}}"],
            fileTypes: [".handlebars"]
        },
    ]
}

It may also want to include a set of sane defaults tied to common programming languages (/* ... */, // ..., <!-- ... -->, etc) which are included by default and chosen based on the file type. The options would override this, and if a file type isn't found in the options or defaults a warning is thrown, or we ignore trying to parse for comments.

Another issue is that we would be doing 'naive' lexing / parsing by using regexes or string searches on each line instead of reading them in symbolically.

Perhaps the best way to go would be to skip checking for comments all together and just say "if the directive is included anywhere on the line, including in the active code as a string, etc, then we will apply it". Personally I feel like this is the best approach: let's keep it simple for now.

jtfairbank commented 8 years ago

Dammit... goodbye productivity tonight. ;)