boostcampaitech3 / level2-mrc-level2-nlp-04

level2-mrc-level2-nlp-04 created by GitHub Classroom

Saving a dataframe loaded from an arrow file to CSV fails #9

Closed junseok0408 closed 2 years ago

junseok0408 commented 2 years ago

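I loaded the dataset from an arrow file and converted it to a dataframe.

I then tried to save it with to_csv, but got the error below.

Googling didn't turn anything up, so I can't tell whether this is a version problem with my own libraries or a common problem that affects everyone.

Could everyone run the first cell of MRC_EDA, try saving train_df with to_csv, and let me know whether you get the error? Thanks!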

ImportError                               Traceback (most recent call last)
Input In [217], in <cell line: 1>()
----> 1 train_df.to_csv("./opt/ml/input/code/train_df.csv")

File /opt/conda/lib/python3.8/site-packages/pandas/core/generic.py:3551, in NDFrame.to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, date_format, doublequote, escapechar, decimal, errors, storage_options)
   3540 df = self if isinstance(self, ABCDataFrame) else self.to_frame()
   3542 formatter = DataFrameFormatter(
   3543     frame=df,
   3544     header=header,
   (...)
   3548     decimal=decimal,
   3549 )
-> 3551 return DataFrameRenderer(formatter).to_csv(
   3552     path_or_buf,
   3553     line_terminator=line_terminator,
   3554     sep=sep,
   3555     encoding=encoding,
   3556     errors=errors,
   3557     compression=compression,
   3558     quoting=quoting,
   3559     columns=columns,
   3560     index_label=index_label,
   3561     mode=mode,
   3562     chunksize=chunksize,
   3563     quotechar=quotechar,
   3564     date_format=date_format,
   3565     doublequote=doublequote,
   3566     escapechar=escapechar,
   3567     storage_options=storage_options,
   3568 )

File /opt/conda/lib/python3.8/site-packages/pandas/io/formats/format.py:1153, in to_csv(self, path_or_buf, encoding, sep, columns, index_label, mode, compression, quoting, quotechar, line_terminator, chunksize, date_format, doublequote, escapechar, errors, storage_options)
   1151     fmt_klass = FloatArrayFormatter
   1152 elif is_integer_dtype(values.dtype):
-> 1153     fmt_klass = IntArrayFormatter
   1154 else:
   1155     fmt_klass = GenericArrayFormatter

File /opt/conda/lib/python3.8/site-packages/pandas/io/formats/csvs.py:15, in <module>
     12 import numpy as np
     14 from pandas._libs import writers as libwriters
---> 15 from pandas._typing import FilePathOrBuffer
     17 from pandas.core.dtypes.generic import (
     18     ABCDatetimeIndex,
     19     ABCIndexClass,
     20     ABCMultiIndex,
     21     ABCPeriodIndex,
     22 )
     23 from pandas.core.dtypes.missing import notna

ImportError: cannot import name 'FilePathOrBuffer' from 'pandas._typing' (/opt/conda/lib/python3.8/site-packages/pandas/_typing.py)
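
The thread does not record a root cause. What the traceback does show is that the csvs.py on disk tries to import FilePathOrBuffer while the installed pandas/_typing.py does not define it, which would be consistent with (but does not prove) a partially upgraded or mixed pandas install. A minimal diagnostic sketch under that assumption follows; the helper name describe_install is hypothetical and not part of pandas.

```python
import importlib
from importlib import metadata


def describe_install(package, module, attr):
    """Report the installed version of `package`, where `module` is loaded
    from, and whether `module` exposes `attr`. (Hypothetical helper.)"""
    report = {"package": package}
    try:
        # Version recorded in the package metadata on disk.
        report["installed_version"] = metadata.version(package)
    except metadata.PackageNotFoundError:
        report["installed_version"] = None
    try:
        mod = importlib.import_module(module)
    except ImportError as exc:
        report["import_error"] = str(exc)
        return report
    # File the running interpreter actually imported the module from.
    report["module_file"] = getattr(mod, "__file__", None)
    report["has_attr"] = hasattr(mod, attr)
    return report


if __name__ == "__main__":
    # The name checked here is the one from the ImportError in this issue.
    print(describe_install("pandas", "pandas._typing", "FilePathOrBuffer"))
```

If the metadata version differs from pandas.__version__ in the running notebook, or has_attr is False while csvs.py still imports the name, restarting the kernel or reinstalling pandas cleanly may be worth trying before further debugging.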

hyoeun98 commented 2 years ago

After running the first cell, train_df.to_csv("test.csv") saves the CSV normally.

junseok0408 commented 2 years ago

Thanks for checking! I'll keep debugging through today, and if it still doesn't work I'll ask you for the CSV file. @hyoeun98
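
As a stopgap while the environment is being fixed, the rows can be written with the standard library csv module instead of pandas' CSV writer. This is only a sketch and was not proposed in the thread; the function names are hypothetical, and it assumes the failure is limited to the to_csv code path so that train_df itself can still be iterated.

```python
import csv
import io


def write_rows_csv(fileobj, header, rows):
    """Write a header and an iterable of row tuples to an open text file.
    Returns the number of data rows written. (Hypothetical helper.)"""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(list(header))
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


def rows_to_csv_text(header, rows):
    """Same as write_rows_csv, but returns the CSV as a string."""
    buffer = io.StringIO()
    write_rows_csv(buffer, header, rows)
    return buffer.getvalue()


# Possible usage with the dataframe from this issue (not run here):
# with open("train_df.csv", "w", newline="", encoding="utf-8") as f:
#     write_rows_csv(f, train_df.columns,
#                    train_df.itertuples(index=False, name=None))
```

Unlike to_csv, this writes no index column, and non-scalar cell values are written using their str() form.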