aserg-ufmg / RefDiff

A tool to mine refactorings in the commit history of git repositories.
MIT License

New parser for python #10

Open Jirigesi opened 4 years ago

Jirigesi commented 4 years ago

Hello, thanks for providing such a great tool. I would like to use a similar tool on Python code, but I searched and could not find one. Could you give me some guidance on writing a parser for Python code?

Best

Symbolk commented 3 years ago

The README says that soon a detailed tutorial will be provided, looking forward to it!

ldesi commented 3 years ago

The README says that soon a detailed tutorial will be provided, looking forward to it!

Any news on that? Thanks.

rodrigo-brito commented 3 years ago

Hi @ldesi and @Symbolk. I created a parser for Go. Maybe it can serve as the basis for a generic parser. The Go parser converts a file to JSON, and this output is used to create the RefDiff CST.

I think the same approach could be used for Python:

[
    {
        "type": "File",
        "start": 0,
        "end": 203,
        "line": 1,
        "has_body": true,
        "name": "types.go",
        "namespace": "",
        "parent": null,
        "tokens": [
            "0-7",
            "8-16",
            "16-17",
            "18-22",
            "23-31",
            "32-35",
            "35-36",
            "36-40",
            "41-49",
            "50-54",
            "55-61",
            "61-62",
            "62-63",
            "63-64",
            "65-69",
            "70-71",
            "73-81",
            "85-86",
            "86-87",
            "87-90",
            "90-91",
            "92-103",
            "104-105",
            "105-106",
            "106-112",
            "112-113",
            "114-115",
            "126-132",
            "132-133",
            "133-134",
            "134-135",
            "136-138",
            "148-157",
            "158-159",
            "162-163",
            "163-164",
            "164-165",
            "166-169",
            "169-170",
            "171-172",
            "172-173",
            "173-174",
            "174-175",
            "176-180",
            "181-182",
            "182-190",
            "190-191",
            "192-196",
            "196-197",
            "197-198",
            "199-200",
            "202-203",
            "203-204"
        ],
        "receiver": null
    },
    {
        "type": "Type",
        "start": 23,
        "end": 35,
        "line": 3,
        "has_body": false,
        "name": "IntAlias",
        "namespace": "",
        "parent": "types.go",
        "receiver": null
    },
    {
        "type": "Type",
        "start": 41,
        "end": 63,
        "line": 4,
        "has_body": false,
        "name": "ChanType",
        "namespace": "",
        "parent": "types.go",
        "receiver": null
    },
    {
        "type": "Type",
        "start": 73,
        "end": 90,
        "line": 7,
        "has_body": false,
        "name": "IntSlice",
        "namespace": "",
        "parent": "types.go",
        "receiver": null
    },
    {
        "type": "Type",
        "start": 92,
        "end": 112,
        "line": 8,
        "has_body": false,
        "name": "StringSlice",
        "namespace": "",
        "parent": "types.go",
        "receiver": null
    },
    {
        "type": "Struct",
        "start": 126,
        "end": 134,
        "line": 9,
        "has_body": true,
        "name": "A",
        "namespace": "",
        "parent": "types.go",
        "receiver": null
    },
    {
        "type": "Interface",
        "start": 148,
        "end": 172,
        "line": 10,
        "has_body": false,
        "name": "iA",
        "namespace": "",
        "parent": "types.go",
        "receiver": null
    },
    {
        "type": "Function",
        "start": 162,
        "end": 169,
        "line": 11,
        "has_body": false,
        "name": "A",
        "namespace": "iA.",
        "parent": "iA",
        "receiver": null
    },
    {
        "type": "Function",
        "start": 176,
        "end": 203,
        "line": 15,
        "has_body": true,
        "name": "Test",
        "namespace": "IntSlice.",
        "parent": "IntSlice",
        "receiver": "IntSlice"
    }
]
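To illustrate how a flat node list like the one above encodes a hierarchy, here is a minimal Python sketch that groups nodes under their `parent`. It only demonstrates the shape of the data; how RefDiff itself consumes the JSON is not shown in this thread. The helper name `children_by_parent` and the trimmed `NODES_JSON` sample are made up for this example.

```python
import json
from collections import defaultdict

# A trimmed copy of a few nodes from the list above (positions and tokens omitted).
NODES_JSON = """
[
  {"type": "File", "name": "types.go", "namespace": "", "parent": null},
  {"type": "Type", "name": "IntSlice", "namespace": "", "parent": "types.go"},
  {"type": "Interface", "name": "iA", "namespace": "", "parent": "types.go"},
  {"type": "Function", "name": "A", "namespace": "iA.", "parent": "iA"},
  {"type": "Function", "name": "Test", "namespace": "IntSlice.", "parent": "IntSlice"}
]
"""


def children_by_parent(nodes):
    """Map each parent name to the qualified names of its direct children.

    Hypothetical helper: the qualified name is assumed to be namespace + name,
    which matches the sample data but is not confirmed by the thread.
    """
    tree = defaultdict(list)
    for node in nodes:
        if node["parent"] is not None:  # the File node has no parent
            tree[node["parent"]].append(node["namespace"] + node["name"])
    return dict(tree)


if __name__ == "__main__":
    for parent, children in children_by_parent(json.loads(NODES_JSON)).items():
        print(parent, "->", children)
```

Qualifying child names with the namespace keeps the interface method `iA.A` distinct from the struct `A` in the full list.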

Mosallamy commented 3 years ago

Hi @rodrigo-brito, we are working on a graduation project, and one part of it requires a refactoring detection tool such as RefDiff. The problem is that we need it for Python. So I wanted to ask: how hard is it to create a RefDiff plugin for Python like the one you created for Go?

rodrigo-brito commented 3 years ago

Hi @Mosallamy, I spent one month creating the plugin. This week, I will try to write a short tutorial to help other developers create plugins. The main effort is writing an AST parser that extracts the main components of a Python file. For example, for the given file (example.py, located in my_package):

def foo(x):
    print("x = ", x)

def bar():
    foo(10)

You should return a structure like this:

[
  {
    "type": "File",
    "start": 0,
    "end": 50,
    "line": 1,
    "has_body": true,
    "name": "example.py",
    "namespace": "my_package",
    "parent": null,
    "tokens": [
      "0-4",
      "5-8",
      ...
    ]
  },
  {
    "type": "Function",
    "start": 23,
    "end": 35,
    "line": 1,
    "has_body": true,
    "name": "foo",
    "namespace": "my_package",
    "parent": "example.py",
    "parameters": ["x"],
    "calls": []
  },
  {
    "type": "Function",
    "start": 36,
    "end": 50,
    "line": 4,
    "has_body": true,
    "name": "bar",
    "namespace": "my_package",
    "parent": "example.py",
    "parameters": [],
    "calls": ["my_package.foo"]
  }
]

The start and end values are only an example; they are not the correct positions. But in summary:

If we have this information, we can create the plugin. Do you have experience with the Python AST?
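As a rough illustration of the structure described above, the following sketch uses Python's built-in `ast` module to produce a File node and one Function node per top-level function. It is not the plugin's actual implementation: the function names `line_starts` and `extract_nodes` are invented here, nested functions and classes are ignored, call extraction is omitted, and the offsets assume ASCII source (`col_offset` counts UTF-8 bytes). It needs Python 3.8+ for `end_lineno`.

```python
import ast
import json

SOURCE = 'def foo(x):\n    print("x = ", x)\n\ndef bar():\n    foo(10)\n'


def line_starts(source):
    """Return the character offset at which each line begins."""
    starts = [0]
    for index, char in enumerate(source):
        if char == "\n":
            starts.append(index + 1)
    return starts


def extract_nodes(source, filename, namespace):
    """Build a flat node list: one File node plus its top-level functions."""
    tree = ast.parse(source)
    starts = line_starts(source)
    nodes = [{
        "type": "File", "start": 0, "end": len(source), "line": 1,
        "has_body": True, "name": filename, "namespace": namespace,
        "parent": None,
    }]
    for item in tree.body:
        if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
            nodes.append({
                "type": "Function",
                # Convert (line, column) pairs to absolute offsets (end exclusive).
                "start": starts[item.lineno - 1] + item.col_offset,
                "end": starts[item.end_lineno - 1] + item.end_col_offset,
                "line": item.lineno,
                "has_body": True,
                "name": item.name,
                "namespace": namespace,
                "parent": filename,
                # Positional parameters only; *args, **kwargs and
                # keyword-only parameters are left out of this sketch.
                "parameters": [arg.arg for arg in item.args.args],
            })
    return nodes


if __name__ == "__main__":
    print(json.dumps(extract_nodes(SOURCE, "example.py", "my_package"), indent=2))
```

Slicing the source with a node's `start` and `end` returns the text of that function, which is a quick way to check the computed positions.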

Mosallamy commented 3 years ago

@rodrigo-brito Thanks for the fast reply! We have experimented a little with the built-in Python AST library.

https://docs.python.org/3/library/ast.html

From the AST library we can extract the following information:

As for the tokens, we've found the following library, https://asttokens.readthedocs.io/en/latest/user-guide.html, which returns the positions of tokens.
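For comparison, token positions in the "start-end" form used by the JSON examples in this thread can also be computed with the standard library alone. The sketch below uses `tokenize` rather than asttokens; the function name `token_ranges` is made up, and the choice to skip comments and layout tokens is an assumption about what RefDiff expects, not something confirmed here. End offsets are exclusive, as in the Go example.

```python
import io
import tokenize

# Layout-only tokens and comments are skipped (an assumption of this sketch).
SKIPPED = {
    tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT,
    tokenize.ENDMARKER, tokenize.COMMENT,
}


def token_ranges(source):
    """Return 'start-end' character ranges for each significant token."""
    # Offsets at which each line begins, used to turn (row, col) into offsets.
    starts = [0]
    for index, char in enumerate(source):
        if char == "\n":
            starts.append(index + 1)

    ranges = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue
        start = starts[tok.start[0] - 1] + tok.start[1]
        end = starts[tok.end[0] - 1] + tok.end[1]
        ranges.append(f"{start}-{end}")
    return ranges


if __name__ == "__main__":
    print(token_ranges("def foo(x):\n    return x\n"))
```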

rodrigo-brito commented 3 years ago

@Mosallamy, I can help you with the code. Can you open a new repository for it? We can use Jython to create the parser and integrate it directly into the Java module.

Mosallamy commented 3 years ago

Hey @rodrigo-brito, I just created a repo with a script that parses a Python file and extracts the following information from any function:

Run the Ast.py file to get the output.

rodrigo-brito commented 3 years ago

Hi @Mosallamy, can you share the repository link?

Mosallamy commented 3 years ago

https://github.com/Mosallamy/refdiff-python

Mosallamy commented 3 years ago

Hello @rodrigo-brito, so far we have extracted all of the information from the AST except for the function calls. We have also read the RefDiff paper thoroughly and understood the steps required to create a plugin, but we had trouble understanding the exact implementation of the code 😅
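One possible way to collect the function calls with the built-in `ast` module is sketched below. This is a simplified illustration, not the code in the repository: the names `call_name` and `calls_by_function` are hypothetical, and it only resolves calls to top-level functions defined in the same file (no imports, methods, or aliases). The `namespace.name` output format follows the `"calls": ["my_package.foo"]` example earlier in the thread.

```python
import ast

SOURCE = 'def foo(x):\n    print("x = ", x)\n\ndef bar():\n    foo(10)\n'


def call_name(func):
    """Return a dotted name for a call target, or None if it is not a plain name."""
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        base = call_name(func.value)
        return f"{base}.{func.attr}" if base else None
    return None


def calls_by_function(source, namespace):
    """Map each top-level function to the same-file functions it calls."""
    tree = ast.parse(source)
    functions = [
        node for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    local_names = {func.name for func in functions}

    result = {}
    for func in functions:
        found = []
        # ast.walk visits every node nested inside the function definition.
        for node in ast.walk(func):
            if isinstance(node, ast.Call):
                name = call_name(node.func)
                if name in local_names:
                    qualified = f"{namespace}.{name}"
                    if qualified not in found:  # keep each callee once
                        found.append(qualified)
        result[func.name] = found
    return result


if __name__ == "__main__":
    print(calls_by_function(SOURCE, "my_package"))
```

Calls to built-ins such as `print` are dropped because they are not defined in the file, which is why `foo` ends up with an empty list.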

rodrigo-brito commented 3 years ago

Hi @Mosallamy, I will try to create the base of the plugin today. I will open a pull request in your repository soon.