ThreeKcal / pyspark


Error: spark.createDataFrame error #2

Open hamsunwoo opened 1 month ago

hamsunwoo commented 1 month ago
24/10/07 12:38:57 WARN Utils: Your hostname, seon-uui-MacBookPro.local resolves to a loopback address: 127.0.0.1; using 192.168.0.131 instead (on interface en0)
24/10/07 12:38:57 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/10/07 12:38:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
  File "/Users/seon-u/code/pyspark/src/pyspark/main.py", line 48, in <module>
    join()
  File "/Users/seon-u/code/pyspark/src/pyspark/main.py", line 28, in join
    df = spark.createDataFrame(pdf)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/seon-u/code/pyspark/.venv/lib/python3.11/site-packages/pyspark/sql/session.py", line 1440, in createDataFrame
    return super(SparkSession, self).createDataFrame(  # type: ignore[call-overload]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/seon-u/code/pyspark/.venv/lib/python3.11/site-packages/pyspark/sql/pandas/conversion.py", line 363, in createDataFrame
    return self._create_dataframe(converted_data, schema, samplingRatio, verifySchema)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/seon-u/code/pyspark/.venv/lib/python3.11/site-packages/pyspark/sql/session.py", line 1485, in _create_dataframe
    rdd, struct = self._createFromLocal(map(prepare, data), schema)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/seon-u/code/pyspark/.venv/lib/python3.11/site-packages/pyspark/sql/session.py", line 1093, in _createFromLocal
    struct = self._inferSchemaFromList(data, names=schema)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/seon-u/code/pyspark/.venv/lib/python3.11/site-packages/pyspark/sql/session.py", line 969, in _inferSchemaFromList
    raise PySparkValueError(
pyspark.errors.exceptions.base.PySparkValueError: [CANNOT_DETERMINE_TYPE] Some of types cannot be determined after inferring.
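
For context, `[CANNOT_DETERMINE_TYPE]` is typically raised when a column contains only null values, so Spark has nothing to infer a type from. A minimal sketch of the likely situation (pandas only; the column names here are assumptions, not taken from the repo's actual data):

```python
import pandas as pd

# Hypothetical repro: a column that is entirely None ends up with
# pandas dtype "object", which Spark cannot map to a concrete type.
pdf = pd.DataFrame({
    "num": [1, 2],
    "comments": [None, None],  # all-null column -> inference fails in Spark
})

print(pdf.dtypes["comments"].name)  # object
```

Passing such a frame to `spark.createDataFrame(pdf)` without a schema would trigger the error above.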
hamsunwoo commented 1 month ago

Resolved by specifying the schema explicitly:

```python
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Define the schema explicitly instead of relying on inference
schema = StructType([
    StructField("num", IntegerType(), True),
    StructField("comments", StringType(), True),
    StructField("request_time", StringType(), True),
    StructField("request_user", StringType(), True),
])

spark_df = spark.createDataFrame(pdf, schema=schema)
```
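
An alternative workaround (an assumption on my part, not something tried in this thread) is to give the problematic column a concrete dtype on the pandas side before calling `createDataFrame`, so inference has something to work with:

```python
import pandas as pd

# Hypothetical frame with an all-null column, as in the traceback above
pdf = pd.DataFrame({"num": [1, 2], "comments": [None, None]})

# Cast the object-typed null column to pandas' nullable string dtype
pdf["comments"] = pdf["comments"].astype("string")

print(pdf.dtypes["comments"].name)  # string
```

Explicit schemas remain the more robust fix, since they also pin down nullability and avoid surprises when the data changes.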