TresAmigosSD / SMV

Spark Modularized View
Apache License 2.0

Unicode in module does not work #1506

Closed: ninjapapa closed this issue 5 years ago

ninjapapa commented 5 years ago

Test code:

# encoding: utf-8
...
class EmploymentByState(smv.SmvModule, smv.SmvOutput):
    """Python ETL Example: employ by state"""

    def requiresDS(self):
        return [Employment]

    def run(self, i):
        df = i[Employment]
        u = "好"
        return df.groupBy(F.col("ST")).agg(F.sum(F.col("EMP")).alias("EMP"))

The module only defines a variable holding a non-ASCII (Unicode) string literal; the variable is never used anywhere.

The error message is:

Traceback (most recent call last):
  File "/Users/bozhang/DSA/SMV-2.3/src/main/python/smv/smvgenericmodule.py", line 568, in sourceCodeHash
    sourceHash = _sourceHash(cls)
  File "/Users/bozhang/DSA/SMV-2.3/src/main/python/smv/smvgenericmodule.py", line 53, in _sourceHash
    return _smvhash(src_no_comm)
  File "/Users/bozhang/DSA/SMV-2.3/src/main/python/smv/smvgenericmodule.py", line 38, in _smvhash
    return binascii.crc32(text.encode())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 217: ordinal not in range(128)
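
Reading the traceback: a `UnicodeDecodeError` raised from a call to `.encode()` is what Python 2 produces when `text` is a byte `str`, because it first decodes the bytes implicitly with the ASCII codec. Byte 0xe5 is the first byte of the UTF-8 encoding of "好" (0xe5 0xa5 0xbd). The sketch below reproduces that decode failure explicitly and shows one possible way to hash the source without it. `smvhash_sketch` and `implicit_ascii_decode_error` are hypothetical names for illustration, not the change actually made in SMV, and choosing UTF-8 for text input is an assumption.

```python
# -*- coding: utf-8 -*-
import binascii

# Source text as raw UTF-8 bytes, as Python 2 would hold it in a `str`.
SRC_BYTES = u'u = "好"'.encode("utf-8")


def implicit_ascii_decode_error(data):
    """Return the error message from decoding bytes as ASCII, or None.

    This mirrors the implicit step Python 2 performs when .encode()
    is called on a byte string.
    """
    try:
        data.decode("ascii")
    except UnicodeDecodeError as e:
        return str(e)
    return None


def smvhash_sketch(text):
    """CRC32 of source text, accepting either bytes or unicode text.

    Hypothetical replacement for the failing call; assumes UTF-8.
    """
    if not isinstance(text, bytes):
        # Only unicode text needs encoding; bytes are hashed as they are.
        text = text.encode("utf-8")
    # Mask so the result is unsigned on both Python 2 and Python 3.
    return binascii.crc32(text) & 0xFFFFFFFF
```

With this shape, the byte form and the unicode form of the same source hash to the same value, so the result does not depend on which type the source inspection returns.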