RustPython / Parser


Hundreds of files that are parsed (or rejected) differently from Python 3.11.2 #66

Open qarmin opened 1 year ago

qarmin commented 1 year ago

d23611db65dca2a71eb58fdcdce9d637f8fef8c2

When parsing multiple files with this repo, I found that sometimes Python cannot parse a file but this library can, and vice versa.

Command to check whether Python can parse a file:

python -m py_compile PY_FILE_TEST_427160.py

Code to check whether the parser can parse a file:

let rust_valid = parse(content.as_str(), Mode::Module, "").is_ok();

Only about 3% of the files in the pack can be parsed by Python but not by the RustPython parser.

Pack 644 files - OUTPUT_FILES.zip

Example files and errors

File:
return imprt

Error:
SyntaxError: 'return' outside function

File:
\
 import _pl

Error:
Sorry: IndentationError: unexpected indent (79285088PY_FILE_TEST_5137609254.py, line 2)

File:
# encoding: ut

Error:
SyntaxError: unknown encoding: ut

File:
__p|ersion__ = '2.9.0'

Error:
SyntaxError: cannot assign to expression here. Maybe you meant '==' instead of '='?

File:
from __future__ import un

Error:
SyntaxError: future feature un is not defined

DimitrisJim commented 1 year ago

Compilation involves further processing. Something might be valid syntactically but in later stages the compiler might reject it due to additional rules that come into play when bytecode is to be generated.
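To make the distinction concrete, here is a minimal sketch that runs the same source through CPython's parser only (ast.parse) and through the full compile step (the built-in compile). The helper names parses and compiles are hypothetical and exist only for this illustration; the sample is the first example file from the report.

```python
import ast


def parses(source: str) -> bool:
    """Return True if CPython's parser accepts the source (no bytecode is generated)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def compiles(source: str) -> bool:
    """Return True if the source also survives the full compile step."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# 'return' at module level is grammatically fine, but the compiler rejects it.
sample = "return imprt\n"
print("parser accepts:", parses(sample))
print("compiler accepts:", compiles(sample))
```

A file like this one is reported as invalid by py_compile even though a parser-only check accepts it, so it is not by itself evidence of a parser incompatibility.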

This comparison might make more sense using the rest of the implementation in RustPython, where you can test compile vs. compile instead of compile vs. parse.

If you only care about parsing, you could re-do the test using CPython's parser.

I'll close for now, feel free to open again if I missed something or if you re-run the test with CPython's parser and you do actually find discrepancies. Thanks!

youknowone commented 1 year ago

@DimitrisJim Because we actually have parser incompatibility, the test set looks very useful.

@qarmin Rather than py_compile, using ast will make more sense, because it excludes the compile step.

youknowone commented 1 year ago

@qarmin Do you know the license of the test set? It would be helpful if we could include it in our test suite.

qarmin commented 1 year ago

I took these files (or minimized parts of them) from the most popular PyPI libraries, so the licenses are mixed.

How do I parse a file in Python?

ChatGPT provided this solution, but I don't know if it is correct:

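import ast

def parse_python_file(file_path):
    # Read the whole file as text
    with open(file_path, 'r') as file:
        source_code = file.read()

    # ast.parse only runs the parser; it does not generate bytecode
    try:
        ast.parse(source_code)
    except SyntaxError as e:
        print(f"Syntax error in {file_path}: {e}")
        raise  # re-raise the original SyntaxError so the caller sees the failure

parse_python_file('plik.py')

qarmin commented 1 year ago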

When using ast.parse (from the example above) instead of py_compile, I got a different set of invalid files: OutputASTInvalid.zip
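One way to see why the two sets differ is to record, for each file, the stage at which CPython rejects it. The sketch below is only an illustration: classify and classify_directory are hypothetical helper names, and it assumes the pack has been unpacked into a flat directory of UTF-8 .py files.

```python
import ast
from pathlib import Path


def classify(path: Path) -> str:
    """Return the first stage at which CPython rejects the file, or 'ok'."""
    try:
        source = path.read_text(encoding="utf-8")
    except (UnicodeDecodeError, OSError):
        return "unreadable"
    try:
        # Parser only: this is the step comparable to a parser-only check.
        tree = ast.parse(source, filename=str(path))
    except (SyntaxError, ValueError):
        # ValueError covers sources containing null bytes on older Python versions.
        return "parse_error"
    try:
        # Full compilation of the already parsed tree.
        compile(tree, str(path), "exec")
    except (SyntaxError, ValueError):
        return "compile_error"
    return "ok"


def classify_directory(directory: str) -> dict:
    """Group the names of all .py files in a directory by classification."""
    buckets = {"ok": [], "parse_error": [], "compile_error": [], "unreadable": []}
    for path in sorted(Path(directory).glob("*.py")):
        buckets[classify(path)].append(path.name)
    return buckets
```

Files that land in compile_error are rejected by py_compile but accepted by ast.parse, so only the parse_error bucket is directly comparable with the result of a parser-only check.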

youknowone commented 1 year ago

Thanks a lot! Sorry for the late response. I am also trying to make a tool that makes this easier.

youknowone commented 1 year ago

You can make a compatibility test like this: https://github.com/RustPython/Parser/pull/75/files#diff-44ab11717bbf68e3aa212e22772a18520ff7c749632213defcaecacdd1f6ecaa