RFC: quick and dirty parallel lint

clebergnu commented 6 years ago

The lint check is by far the lenghtiest static code check on my workflow. Let's make it run in parallel and save some precious time.

This is a quick and dirty version, just to prove the point that we can and should do it. I reckon this could and should be done at a different level, and also available for other checks (such as style, indent, license, etc).

Signed-off-by: Cleber Rosa crosa@redhat.com

The numbers I get when running lint for Avocado are (before):

$ /usr/bin/time ./selftests/checkall 

Running 'inspekt lint --exclude=.git --enable W0102,W0611'
Pylint disabled: W,R,C,E1002,E1101,E1103,E1120,F0401,I0011
Pylint enabled : W0102,W0611
Syntax check PASS
lint PASSED

All checks PASSED
60.21user 1.17system 1:01.58elapsed 99%CPU (0avgtext+0avgdata 531472maxresident)k
0inputs+2056outputs (0major+190536minor)pagefaults 0swaps

And after:

$ /usr/bin/time ./selftests/checkall 

Running 'inspekt lint --exclude=.git --enable W0102,W0611'
Pylint disabled: W,R,C,E1002,E1101,E1103,E1120,F0401,I0011
Pylint enabled : W0102,W0611
Syntax check PASS
lint PASSED

All checks PASSED
152.42user 2.74system 0:24.31elapsed 638%CPU (0avgtext+0avgdata 239108maxresident)k
0inputs+2040outputs (0major+435095minor)pagefaults 0swaps

clebergnu commented 6 years ago

Check if inspektor uses pylint parallel processing

ldoktor commented 6 years ago

Actually one more request, can you please make it configurable? Some people might want to disable this feature (I'd support something like --parallel N where default is 0 (None) and 1+ means number of pools.

ldoktor commented 6 years ago

I thought about this again and why are we not simply filtering the paths and running the linter with all files as list/iterator? It supports parallel execution and it'd avoid the multiple initialization. I did a very dirty check and it works well and if we want per-file-granularity-results we can simply forward custom reporter:

class OurTextReporter(TextReporter):
    def __init__(self, *args, **kwargs):
        super(OurTextReporter, self).__init__(*args, **kwargs)
        self.__failed_modules = []

    def handle_message(self, msg):
        if msg.module not in self._modules:
            if msg.module:
                self.__failed_modules.append(msg.module)
        return super(OurTextReporter, self).handle_message(msg)

ldoktor commented 6 years ago

.. to be precise I used this:

        def should_be_checked(path):
            checker = PathChecker(path=path, args=parsed_args, label='Lint',
                                  logger=self.log)
            if checker.check_attributes('text', 'python', 'not_empty'):
                return True
            else:
                return False
        paths = []
        not_file_or_dirs = []
        for file_or_dir in parsed_args.path:
            if os.path.isdir(file_or_dir):
                for root, dirs, files in os.walk(file_or_dir):
                    for filename in files:
                        path = os.path.join(root, filename)
                        if should_be_checked(path):
                            paths.append(path)
            elif os.path.isfile(file_or_dir):
                if should_be_checked(file_or_dir):
                    paths.append(file_or_dir)
            else:
                not_file_or_dirs.append(file_or_dir)

to walk/filter the paths and then simply run:

runner = QuietLintRun(self.get_opts() + ["-j", "8"] + paths, exit=False)

to run with all paths and in parallel. Anyway let me know if you prefer the original or this version.

avocado-framework / inspektor

RFC: quick and dirty parallel lint #32