ComputationalReflection / PythonSourceCodeAnalysis

Python Source Code Analysis
MIT License

PythonSourceCodeAnalysis

Python Source Code Analysis is a program designed to extract syntactic information from Python programs.

This information is stored in a relational database and can be analyzed with data mining algorithms.

Purpose

This program converts graph-like information obtained with the ast module from the Python Standard Library (PSL) into n-dimensional vectors stored in a relational database. This process creates a dataset with 16 homogeneous tables. The conversion allows classic data mining algorithms to work with the dataset. These algorithms can obtain information such as the most and least used syntactic elements, outlier syntactic patterns, and association rules.
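The conversion described above can be illustrated with a minimal sketch. It is not the project's actual schema: the single `nodes` table, its four columns, and the helper names `node_rows` and `store` are assumptions made for illustration. The idea is only to show how a tree produced by ast can be flattened into fixed-width rows that a relational database, and then a data mining algorithm, can consume.

```python
import ast
import sqlite3

SOURCE = '''
def area(width, height):
    return width * height

total = area(2, 3) + 1
'''


def node_rows(source):
    """Yield one flat (node_type, parent_type, depth, n_children) row per AST node."""
    tree = ast.parse(source)
    stack = [(tree, None, 0)]
    while stack:
        node, parent_type, depth = stack.pop()
        children = list(ast.iter_child_nodes(node))
        node_type = type(node).__name__
        yield (node_type, parent_type, depth, len(children))
        for child in children:
            stack.append((child, node_type, depth + 1))


def store(rows, conn):
    """Insert the rows into a hypothetical single-table schema."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nodes ("
        "node_type TEXT, parent_type TEXT, depth INTEGER, n_children INTEGER)"
    )
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
store(node_rows(SOURCE), conn)

# Most and least used syntactic elements become a plain SQL aggregation.
counts = conn.execute(
    "SELECT node_type, COUNT(*) FROM nodes "
    "GROUP BY node_type ORDER BY COUNT(*) DESC, node_type"
).fetchall()
for node_type, n in counts:
    print(node_type, n)
```

Because every row has the same shape, queries such as frequency counts need no knowledge of the original tree structure. The real program spreads this information over 16 tables rather than one.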

In addition to the syntactic information, the program allows the Python files passed as arguments to be flagged as Expert or Beginner. With this expertise level linked to the data mining results, we can classify new programs as Expert or Beginner programs according to the presence or absence of the syntactic patterns identified as Expert patterns or Beginner patterns.

This type of information is highly valuable for improving Python programming. We can use it to improve how Python is taught or to improve the tools offered by IDEs.
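The classification idea can be sketched as follows. The two pattern sets below are hypothetical and were chosen only for illustration; in practice they would come from the data mining results, not be written by hand. The function names `node_types` and `classify`, and the simple majority rule, are also assumptions.

```python
import ast

# Hypothetical pattern sets (illustration only, not mined from the real dataset).
EXPERT_PATTERNS = {"ListComp", "GeneratorExp", "Lambda", "With"}
BEGINNER_PATTERNS = {"While", "Global"}


def node_types(source):
    """Return the set of AST node type names present in the source."""
    return {type(node).__name__ for node in ast.walk(ast.parse(source))}


def classify(source):
    """Label a program by which hypothetical pattern set it matches more often."""
    present = node_types(source)
    expert_hits = len(present & EXPERT_PATTERNS)
    beginner_hits = len(present & BEGINNER_PATTERNS)
    if expert_hits > beginner_hits:
        return "EXPERT"
    if beginner_hits > expert_hits:
        return "BEGINNER"
    return "UNDECIDED"


print(classify("squares = [x * x for x in range(10)]"))
print(classify("i = 0\nwhile i < 3:\n    i += 1\n"))
```

A realistic classifier would weight patterns by how strongly they are associated with each group rather than counting presence alone.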

Dataset generation

The dataset used for the outlier analysis contains more than 13 million database entities. These 13 million entities come from:

Notebooks

The outlier analysis of the previously mentioned dataset is collected in the notebooks directory. Each notebook collects the information for the syntactic construct it is named after. For each syntactic construct there are two additional files, one for beginners and one for experts. Each notebook contains a complete analysis of every attribute of the table. The information is displayed with graphs.
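As a rough illustration of what an outlier analysis of a single attribute can look like, the sketch below applies the common interquartile range (IQR) rule. The notebooks may use a different method; the IQR rule, the function name `iqr_outliers`, and the sample values (imagined as the number of parameters per function) are all assumptions.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical attribute values, e.g. number of parameters per function.
params_per_function = [0, 1, 1, 2, 2, 2, 3, 3, 4, 15]
outliers = iqr_outliers(params_per_function)
print(outliers)
```

Running the same function separately on the beginner and expert subsets would mirror the split into two additional files per syntactic construct.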

Example

As an example, we will suppose that there is a directory named "python_projects". Inside this directory there must be a structure of subdirectories with Python files.

The program can receive up to 3 arguments, some of which are optional:

image

In this first call, we are processing the ./python_projects directory, flagged as EXPERT programs, using the default program detection system.

image

In this call, we are processing the subdirectory ./python_projects/program_1, flagged as an EXPERT program, ignoring the default program detection system.