Bear proposal: PyT - Githubissues

jayvdb commented 7 years ago

https://github.com/python-security/pyt Python Taint

ankitxjoshi commented 6 years ago

This sounds like an interesting tool, and I would like to create a bear for it. Please assign it to me, @jayvdb.

ankitxjoshi commented 6 years ago

I understand how to create linter bears, but I am unable to figure out which format would be best for displaying the output generated by PyT. The output is as shown below:

What information should be extracted from this output, given that the named groups provided by the linter class are limited?

jayvdb commented 6 years ago

The @linter decorator is probably not the best way to approach this type of output.

Each vulnerability should be a Result. At best you can extract the filename and lines of the vulnerability and store them in the Result.

ankitxjoshi commented 6 years ago

pyt primarily gives three important pieces of information:

The line where user input is taken (line no: 6 for the above example).
The lines where the input variable is modified (line no: 8 and line no: 12).
Finally, the line where the critical operation is performed using that variable (line no: 11).

So, the way I am thinking of approaching this task is:

Run pyt on the file with the help of the @linter decorator.
Override process_output and yield every vulnerability as a Result, extracting each one with a regex.

Each Result would be yielded as follows:

for vulnerability in vulnerabilities:
    yield Result.from_values(origin='PyTBear',
                             message='The following lines may create a vulnerability',
                             file=filename,  # The name of the file (seems redundant here)
                             line=first_lineno,  # Line where user input is taken (line no: 6)
                             end_line=last_lineno)  # Line where the critical operation is performed (line no: 11)

This will output the complete section from line 6 to line 11 where the vulnerability exists. However, it won't convey that the first line (6) is the cause of the vulnerability and the last line (11) is the point where exploitation can take place.

So, should I go ahead with this approach? Also, I didn't understand why @linter would be a bad way to approach this, since it doesn't seem to be a case for a native bear.
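One possible way to keep that detail, sketched under assumptions: put the source and sink line numbers into the Result message itself. The `Vulnerability` tuple and the `result_kwargs` helper are hypothetical names, not part of coala or PyT, and the file name `app.py` is made up. The returned dict only uses the keyword arguments already passed to `Result.from_values` in the snippet above.

```python
from typing import NamedTuple, Tuple


class Vulnerability(NamedTuple):
    """Hypothetical container for one finding parsed from a PyT report."""
    filename: str
    source_line: int                   # line where user input is taken
    reassigned_lines: Tuple[int, ...]  # lines where the variable is modified
    sink_line: int                     # line where the critical operation runs


def result_kwargs(vuln):
    """Build keyword arguments for Result.from_values (hypothetical helper)."""
    message = ('User input taken at line {} reaches a critical operation '
               'at line {}'.format(vuln.source_line, vuln.sink_line))
    if vuln.reassigned_lines:
        lines = ', '.join(str(n) for n in vuln.reassigned_lines)
        message += ' (reassigned at line(s) {})'.format(lines)
    return {
        'origin': 'PyTBear',
        'message': message,
        'file': vuln.filename,
        # The sink may come before or after the source, so order the range.
        'line': min(vuln.source_line, vuln.sink_line),
        'end_line': max(vuln.source_line, vuln.sink_line),
    }


# Line numbers taken from the example discussed in this thread.
example = Vulnerability('app.py', 6, (8, 12), 11)
kwargs = result_kwargs(example)
```

Inside process_output this could then be used as `yield Result.from_values(**result_kwargs(vuln))`, so the highlighted range stays 6-11 while the message says which end is the source and which is the sink.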

jayvdb commented 6 years ago

That sounds good so far. @linter is OK, just not with its limited built-in output regexes. Providing your own process_output is the right approach.

One nasty possible problem is that a vulnerability may actually involve multiple filenames. I don't know if pyt detects such problems, but I am concerned because it lists the filename multiple times in the same report. I suggest you look at a few of the pyt examples to see whether any of them have a vulnerability that crosses multiple source files. That would mean a bit more design is needed, and the implementation will be a bit more difficult.
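A rough sketch of how that check could look once the locations of a single vulnerability have been parsed. It assumes each location is already available as a `(filename, line)` pair; the helper names and the file names are made up for illustration and are not part of pyt or coala.

```python
from collections import defaultdict


def lines_by_file(locations):
    """Group the (filename, line) pairs of one vulnerability by filename."""
    grouped = defaultdict(list)
    for filename, line in locations:
        grouped[filename].append(line)
    return {name: sorted(lines) for name, lines in grouped.items()}


def crosses_files(locations):
    """Return True if one vulnerability touches more than one source file."""
    return len({filename for filename, _ in locations}) > 1


# Hypothetical findings: one confined to a single file, one spanning two.
single = [('app.py', 6), ('app.py', 8), ('app.py', 11)]
multi = [('views.py', 6), ('helpers.py', 3), ('views.py', 11)]
```

If `crosses_files` is never true for the pyt examples, a single line/end_line range per Result is enough; otherwise each Result needs some way to carry one range per file.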

ankitxjoshi commented 6 years ago

@jayvdb Yes, there is an option for specifying the project root when the scan covers multiple files, which is why every output is accompanied by the filename. Could you please guide me on how to approach this? Is there an existing bear that works the same way, or does it require changes to coala's core library? I don't think a global bear would work here, since it doesn't scan the complete directory.

This is how it works globally: [image: WhatsApp screenshot, 2018-02-20]

The -pr option has to be specified for the complete project directory to be considered in the analysis.

I believe this cannot be implemented with the current design of Result.

coala / coala-bears

Bear proposal: PyT #1637