JohnSnowLabs / nlu

1 line for thousands of State of The Art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.
Apache License 2.0
839 stars · 129 forks

Breaking dependencies #198

Closed CallMarl closed 9 months ago

CallMarl commented 10 months ago

Hello, I'm trying to run your library in WSL, but an error occurs with dependencies. The full trace is below:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/callmarl/workzone/nlp/env/lib/python3.9/site-packages/nlu/pipe/pipeline.py", line 468, in predict
    return __predict__(self, data, output_level, positions, keep_stranger_features, metadata, multithread,
  File "/home/callmarl/workzone/nlp/env/lib/python3.9/site-packages/nlu/pipe/utils/predict_helper.py", line 166, in __predict__
    pipe.fit()
  File "/home/callmarl/workzone/nlp/env/lib/python3.9/site-packages/nlu/pipe/pipeline.py", line 202, in fit
    self.vanilla_transformer_pipe = self.spark_estimator_pipe.fit(self.get_sample_spark_dataframe())
  File "/home/callmarl/workzone/nlp/env/lib/python3.9/site-packages/nlu/pipe/pipeline.py", line 101, in get_sample_spark_dataframe
    return sparknlp.start().createDataFrame(data=text_df)
  File "/home/callmarl/workzone/nlp/env/lib/python3.9/site-packages/pyspark/sql/session.py", line 673, in createDataFrame
    return super(SparkSession, self).createDataFrame(
  File "/home/callmarl/workzone/nlp/env/lib/python3.9/site-packages/pyspark/sql/pandas/conversion.py", line 299, in createDataFrame
    data = self._convert_from_pandas(data, schema, timezone)
  File "/home/callmarl/workzone/nlp/env/lib/python3.9/site-packages/pyspark/sql/pandas/conversion.py", line 331, in _convert_from_pandas
    for column, series in pdf.iteritems():
  File "/home/callmarl/workzone/nlp/env/lib/python3.9/site-packages/pandas/core/generic.py", line 6202, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'iteritems'
callmarl@LAPTOP-QS9M6N2F ~/workzone/nlp % python --version
Python 3.9.2
callmarl@LAPTOP-QS9M6N2F ~/workzone/nlp % pip freeze
asttokens==2.4.0
backcall==0.2.0
certifi==2023.7.22
charset-normalizer==3.2.0
click==8.1.7
colorama==0.4.6
databricks-api==0.9.0
databricks-cli==0.17.7
dataclasses==0.6
decorator==5.1.1
exceptiongroup==1.1.3
executing==1.2.0
idna==3.4
ipython==8.15.0
jedi==0.19.0
johnsnowlabs==5.0.7
matplotlib-inline==0.1.6
nlu==5.0.0
numpy==1.25.2
oauthlib==3.2.2
pandas==2.1.0
parso==0.8.3
pexpect==4.8.0
pickleshare==0.7.5
pkg_resources==0.0.0
prompt-toolkit==3.0.39
ptyprocess==0.7.0
pure-eval==0.2.2
py4j==0.10.9
pyarrow==13.0.0
pydantic==1.10.11
Pygments==2.16.1
PyJWT==2.8.0
pyspark==3.1.2
python-dateutil==2.8.2
pytz==2023.3.post1
requests==2.31.0
six==1.16.0
spark-nlp==5.0.2
spark-nlp-display==4.1
stack-data==0.6.2
svgwrite==1.4
tabulate==0.9.0
traitlets==5.9.0
typing_extensions==4.7.1
tzdata==2023.3
urllib3==1.26.16
wcwidth==0.2.6
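The traceback shows pyspark's `_convert_from_pandas` iterating with `pdf.iteritems()`. That method was deprecated in pandas 1.5 and removed in pandas 2.0 in favor of `DataFrame.items()`, which is why the environment above (pandas 2.1.0 with pyspark 3.1.2) fails. A minimal sketch of the incompatibility, using hypothetical example data:

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame NLU builds internally.
df = pd.DataFrame({"text": ["hello world"]})

# pandas < 2.0 exposed DataFrame.iteritems(); pandas 2.0 removed it.
# DataFrame.items() is the replacement and yields the same
# (column_name, Series) pairs, so code can branch on availability.
if hasattr(df, "iteritems"):   # pandas 1.x
    pairs = list(df.iteritems())
else:                          # pandas 2.x
    pairs = list(df.items())

columns = [name for name, _ in pairs]
print(columns)  # → ['text']
```

Older pyspark versions call `iteritems()` unconditionally, so they raise the `AttributeError` above on any pandas 2.x install.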
C-K-Loan commented 10 months ago

Hi @CallMarl, if you downgrade pandas below 2.0 you will get around that issue, e.g. `pip install pandas==1.5.3`. We are working on a fix for that.
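Besides downgrading, a common stopgap (an assumption on my part, not the official fix in this thread) is to restore the removed alias before starting the pipeline, so the old pyspark code path keeps working on pandas 2.x:

```python
import pandas as pd

# Stopgap shim (not the official NLU fix): re-expose the alias that
# pandas 2.0 removed, pointing it at the equivalent items() method,
# so libraries that still call pdf.iteritems() don't crash.
if not hasattr(pd.DataFrame, "iteritems"):
    pd.DataFrame.iteritems = pd.DataFrame.items

# The legacy call now works again on pandas 2.x:
df = pd.DataFrame({"a": [1, 2]})
cols = [name for name, _ in df.iteritems()]
print(cols)  # → ['a']
```

This only papers over the missing method; pinning pandas as suggested above, or upgrading to a fixed NLU/pyspark release, is the cleaner route.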

C-K-Loan commented 9 months ago

Fixed in NLU 5.0.2: https://github.com/JohnSnowLabs/nlu/pull/206