erg-lang / erg

A statically typed language compatible with Python
http://erg-lang.org
Apache License 2.0

Strings are trimmed at tokenize #159

Closed GreasySlug closed 2 years ago

GreasySlug commented 2 years ago

Describe the bug

While trying to fix #157, I found this bug.

This happens when a string has double quotes, line breaks, or whitespace at its beginning or end.

The leading and trailing whitespace (and similar characters) are trimmed from the string.

Reproducible code

>>> a = "   hi   " 

>>> a
hi

# The following two are reproducible once #157 is fixed
>>> b = "\"  hi  \""

>>> b
hi
>>> c = "\n  hi \n"

>>> c
hi # Personally, I feel this is a separate bug from the one reported here

Expected behavior

>>> a = "   hi   "

>>> a
'   hi   ' # The `'`s are added here only to make the whitespace visible.

Additional context

The string is compiled properly when the whitespace and escapes are surrounded by other characters

>>> c = "a \" \n b" 

>>> c
a " 
 b

After a little debugging, I found that the conversion seems to happen in the following code.

https://github.com/erg-lang/erg/blob/19428a417f0ec0ec6f26b3aaefe08fa7d9c7671a/compiler/erg_parser/token.rs#L351-L359

mtshiba commented 2 years ago

I found the real cause of this behavior!

This is not the compiler's fault; it happens because repl_server.py was stripping the output.

https://github.com/erg-lang/erg/blob/d2ad7caaab260f8d6926aca7a3eb2bc86aedea7b/src/scripts/repl_server.py#L35
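
For illustration, the effect can be reproduced in plain Python. This is only a minimal sketch, assuming the server captures whatever the evaluated code prints and calls `str.strip()` on it before sending it back; `run_and_capture` and its `strip_output` flag are hypothetical names and are not taken from repl_server.py.

```python
import contextlib
import io


def run_and_capture(source: str, strip_output: bool) -> str:
    """Run `source` and return what it printed (hypothetical helper)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    output = buffer.getvalue()
    if strip_output:
        # Assumed buggy path: strip() removes ALL leading and trailing
        # whitespace, including spaces and newlines that belong to the value.
        return output.strip()
    # Alternative: drop only the single newline that print() appends.
    return output[:-1] if output.endswith("\n") else output


if __name__ == "__main__":
    source = 'print("   hi   ")'
    print(repr(run_and_capture(source, strip_output=True)))
    print(repr(run_and_capture(source, strip_output=False)))
```

With stripping enabled, the whitespace inside the string value is lost even though the string itself was evaluated correctly, which matches the symptom reported above. Removing only the final newline keeps the value intact.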

GreasySlug commented 2 years ago

I'm sorry about that. I'm glad it's been fixed.