MrPowers / farsante

Fake Pandas / PySpark DataFrame creator

Error Encountered When Running 'test_create_fake_parquet' Test #21

Open paulooctavio opened 5 months ago

paulooctavio commented 5 months ago

I’m encountering an issue with the test_create_fake_parquet test. I haven't made any code changes. The test fails with the following error message:

py4j.protocol.Py4JJavaError: An error occurred while calling o26.parquet.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (172.27.58.218 executor driver): org.apache.spark.sql.AnalysisException: Illegal Parquet type: INT64 (TIMESTAMP(NANOS,false)). ...

Here’s what I’ve tried so far:

  1. I followed the contributing guidelines, but the "4. Install project dependencies" section doesn't seem to fully reflect the transition from poetry to maturin. Following the instructions as written fails with the error "[tool.poetry] section not found in /path/to/project//farsante/pyproject.toml". I then ran maturin develop inside a virtual environment and was able to run the tests with pytest tests/.
  2. I searched for the error message to find similar issues and tried switching from Java 10.1 to Java 1.8, but I couldn't find a solution that worked.

Environment Info:

Scala: 2.12.18
Java: 1.8.0_402
Python: 3.10.9
PySpark: 3.5.1
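
Because the error concerns how timestamps are stored in Parquet, the versions of the Python packages that may be involved in writing the file could also be relevant. The following is a minimal sketch for collecting them alongside the Python and Java versions. The choice of pandas and pyarrow as packages to report is an assumption, and the function names are hypothetical.

```python
import platform
import shutil
import subprocess
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a package, or a marker if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def java_version() -> str:
    """Return the first line printed by `java -version`, if java is on PATH."""
    java = shutil.which("java")
    if java is None:
        return "java not found on PATH"
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    # `java -version` normally writes to stderr, so check that stream first.
    output = (result.stderr or result.stdout).strip()
    return output.splitlines()[0] if output else "unknown"


def collect_environment() -> dict:
    """Gather interpreter, Java, and package versions into one dictionary."""
    info = {"Python": platform.python_version(), "Java": java_version()}
    # Assumption: these are the packages most likely to affect Parquet output.
    for name in ("pyspark", "pandas", "pyarrow"):
        info[name] = package_version(name)
    return info


for key, value in collect_environment().items():
    print(f"{key}: {value}")
```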

Is there anything I'm missing here? Since I haven't made any code changes, I suspect something is wrong with my environment.

Thanks in advance for your help!