ThilinaRajapakse / simpletransformers

Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI
https://simpletransformers.ai/
Apache License 2.0

AttributeError: 'SentencePieceProcessor' object has no attribute 'encode' #1212

Closed: LeonidTsyplenkov closed this issue 2 years ago

LeonidTsyplenkov commented 3 years ago

Hi! I have code that worked for months and has suddenly stopped working. I get this error message:

  File "/root/.local/lib/python3.7/site-packages/simpletransformers/classification/classification_model.py", line 1934, in predict
    eval_examples, evaluate=True, multi_label=multi_label, no_cache=True
  File "/root/.local/lib/python3.7/site-packages/simpletransformers/classification/classification_model.py", line 1751, in load_and_cache_examples
    no_cache=no_cache,
  File "/root/.local/lib/python3.7/site-packages/simpletransformers/classification/classification_utils.py", line 257, in __init__
    data, tokenizer, args, mode, multi_label, output_mode, no_cache
  File "/root/.local/lib/python3.7/site-packages/simpletransformers/classification/classification_utils.py", line 227, in build_classification_dataset
    disable=args.silent,
  File "/root/.local/lib/python3.7/site-packages/tqdm/std.py", line 1173, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
AttributeError: 'SentencePieceProcessor' object has no attribute 'encode'

Did something change in the latest version?

ThilinaRajapakse commented 3 years ago

Nothing should have changed in the classification models. Can you share a minimal script to reproduce this?

LeonidTsyplenkov commented 3 years ago

I use ClassificationModel from simpletransformers.classification. The error occurs when I call model.predict([query])[-1], where query is a prepared string.

ThilinaRajapakse commented 3 years ago

What's the model type?

LeonidTsyplenkov commented 3 years ago

I use ALBERT v2.

ThilinaRajapakse commented 3 years ago

It's still working for me (v0.61.13). Does it work for you with previous versions?

nikolamilosevic86 commented 3 years ago

I got the same error on Python 3.6, with the following packages installed:

absl-py==0.12.0
alabaster==0.7.10
anaconda-client==1.6.14
anaconda-project==0.8.2
asn1crypto==0.24.0
astor==0.8.1
astroid==1.6.3
astropy==3.0.2
astunparse==1.6.3
attrs==19.3.0
autovizwidget==0.15.0
awscli==1.18.46
Babel==2.5.3
backcall==0.1.0
backports.shutil-get-terminal-size==1.0.0
beautifulsoup4==4.6.0
bitarray==0.8.1
bkcharts==0.2
blaze==0.11.3
bleach==2.1.3
blis==0.4.1
bokeh==1.0.4
boto==2.48.0
boto3==1.12.11
botocore==1.15.46
Bottleneck==1.2.1
cachetools==4.1.0
catalogue==1.0.0
cchardet==2.1.6
certifi==2020.4.5.1
cffi==1.11.5
chardet==3.0.4
click==6.7
cloudpickle==0.5.3
clyent==1.2.2
colorama==0.3.9
conllu==1.3.1
contextlib2==0.5.5
cryptography==2.3.1
cycler==0.10.0
cymem==2.0.3
Cython==0.28.2
cytoolz==0.9.0.1
dask==0.17.5
dataclasses==0.7
datasets==1.6.0
datashape==0.5.4
decorator==4.3.0
defusedxml==0.6.0
distributed==1.21.8
docutils==0.14
editdistance==0.5.3
eli5==0.10.1
en-core-sci-lg==0.2.4
entrypoints==0.2.3
enum34==1.1.9
environment-kernels==1.1.1
et-xmlfile==1.0.1
fastcache==1.0.2
fastprogress==0.2.3
filelock==3.0.12
flaky==3.6.1
Flask==1.1.2
Flask-Cors==3.0.8
flatbuffers==1.12
fsspec==2021.4.0
ftfy==5.7
future==0.18.2
gast==0.3.3
gevent==1.5.0
glob2==0.6
gmpy2==2.0.8
google-auth==1.14.0
google-auth-oauthlib==0.4.1
google-pasta==0.2.0
googleapis-common-protos==1.51.0
graphviz==0.13.2
greenlet==0.4.15
grpcio==1.32.0
h5py==2.10.0
hdijupyterutils==0.15.0
heapdict==1.0.0
horovod==0.19.0
html5lib==1.0.1
huggingface-hub==0.0.12
idna==2.9
imageio==2.3.0
imagesize==1.0.0
importlib-metadata==1.6.0
ipykernel==4.8.2
ipyparallel==6.2.2
ipython==6.4.0
ipython-genutils==0.2.0
ipywidgets==7.4.0
isort==4.3.4
itsdangerous==1.1.0
jdcal==1.4
jedi==0.12.0
jieba==0.42.1
Jinja2==2.10
jmespath==0.9.4
joblib==0.14.1
jsonnet==0.15.0
jsonpickle==1.4
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.2.3
jupyter-console==5.2.0
jupyter-core==4.4.0
jupyterlab==0.32.1
jupyterlab-launcher==0.10.5
Keras==2.4.3
Keras-Applications==1.0.8
keras-bert==0.86.0
keras-embed-sim==0.8.0
keras-layer-normalization==0.14.0
keras-multi-head==0.27.0
keras-pos-embd==0.11.0
keras-position-wise-feed-forward==0.6.0
Keras-Preprocessing==1.1.2
keras-self-attention==0.46.0
keras-transformer==0.38.0
kiwisolver==1.0.1
ktrain==0.26.2
langdetect==1.0.8
lazy-object-proxy==1.3.1
llvmlite==0.23.1
locket==0.2.0
lxml==4.2.1
Markdown==3.2.1
MarkupSafe==1.0
matplotlib==3.1.3
mccabe==0.6.1
mistune==0.8.3
mkl-fft==1.0.15
mkl-random==1.1.0
mkl-service==2.3.0
mock==4.0.1
more-itertools==8.2.0
mpmath==1.0.0
msgpack==0.6.0
msgpack-python==0.5.6
multipledispatch==0.5.0
multiprocess==0.70.11.1
murmurhash==1.0.2
nb-conda==2.2.1
nb-conda-kernels==2.2.2
nbconvert==5.4.1
nbformat==4.4.0
networkx==2.5.1
nltk==3.5
nmslib==2.0.6
nose==1.3.7
notebook==5.5.0
numba==0.38.0
numexpr==2.7.1
numpy==1.19.5
numpydoc==0.9.2
oauthlib==3.1.0
odo==0.5.1
olefile==0.45.1
opencv-python==4.1.1.26
openpyxl==2.5.3
opt-einsum==3.3.0
packaging==20.1
pandas==1.0.3
pandocfilters==1.4.2
parsimonious==0.8.1
parso==0.2.0
partd==0.3.8
path.py==11.0.1
pathlib2==2.3.2
patsy==0.5.0
pep8==1.7.1
pexpect==4.5.0
pickleshare==0.7.4
Pillow==5.4.1
pkginfo==1.4.2
plac==0.9.6
plotly==4.5.2
pluggy==0.13.1
ply==3.11
preshed==3.0.2
promise==2.3
prompt-toolkit==1.0.15
protobuf==3.11.3
protobuf3-to-dict==0.1.5
psutil==5.4.5
psycopg2==2.7.5
ptyprocess==0.5.2
py==1.8.1
py4j==0.10.7
pyarrow==3.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pybind11==2.5.0
pycodestyle==2.4.0
pycosat==0.6.3
pycparser==2.18
pycrypto==2.6.1
pycurl==7.43.0.2
pyflakes==1.6.0
pygal==2.4.0
Pygments==2.2.0
pykerberos==1.2.1
pylint==1.8.4
pyodbc==4.0.23
pyOpenSSL==18.0.0
pyparsing==2.2.0
pysbd==0.2.3
PySocks==1.6.8
pyspark==2.3.2
pytest==5.4.1
pytest-arraydiff==0.2
pytest-astropy==0.3.0
pytest-doctestplus==0.1.3
pytest-openfiles==0.3.0
pytest-remotedata==0.2.1
python-dateutil==2.7.3
pytorch-pretrained-bert==0.6.2
pytorch-transformers==1.1.0
pytz==2018.4
PyWavelets==0.5.2
PyYAML==3.12
pyzmq==17.0.0
QtAwesome==0.4.4
qtconsole==4.3.1
QtPy==1.4.1
regex==2020.4.4
requests==2.23.0
requests-kerberos==0.12.0
requests-oauthlib==1.3.0
responses==0.10.12
retrying==1.3.3
rope==0.10.7
rsa==3.4.2
ruamel-yaml==0.15.35
s3fs==0.1.5
s3transfer==0.3.3
sacremoses==0.0.41
sagemaker==1.50.17
sagemaker-pyspark==1.2.8
scikit-image==0.13.1
scikit-learn==0.23.2
scipy==1.4.1
scispacy==0.2.4
seaborn==0.8.1
Send2Trash==1.5.0
sentence-transformers==1.0.4
seqeval==0.0.19
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.15.0
smdebug-rulesconfig==0.1.2
snowballstemmer==1.2.1
sortedcollections==0.6.1
sortedcontainers==1.5.10
spacy==2.2.4
sparkmagic==0.12.5
Sphinx==1.7.4
sphinxcontrib-websupport==1.0.1
spyder==3.2.8
SQLAlchemy==1.2.11
sqlparse==0.3.1
srsly==1.0.2
statsmodels==0.9.0
sympy==1.1.1
syntok==1.2.2
tables==3.4.3
tabulate==0.8.7
TBB==0.1
tblib==1.3.2
tensorboard==2.5.0
tensorboard-data-server==0.6.0
tensorboard-plugin-wit==1.8.0
tensorboardX==2.0
tensorflow-datasets==2.1.0
tensorflow-estimator==2.4.0
tensorflow-gpu==2.4.0
tensorflow-metadata==0.21.2
tensorflow-serving-api==2.1.0
termcolor==1.1.0
terminado==0.8.1
testpath==0.3.1
thinc==7.4.0
threadpoolctl==2.1.0
tokenizers==0.9.4
toolz==0.9.0
tornado==5.0.2
tqdm==4.61.2
traitlets==4.3.2
transformers==4.0.0
typing==3.6.4
typing-extensions==3.7.4.3
unicodecsv==0.14.1
Unidecode==1.1.1
urllib3==1.23
wasabi==0.6.0
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==1.0.1
Whoosh==2.7.4
widgetsnbextension==3.4.2
word2number==1.1
wrapt==1.12.1
xlrd==1.1.0
XlsxWriter==1.0.4
xlwt==1.3.0
xxhash==2.0.2
zict==0.1.3
zipp==3.1.0
(tensorflow2_p36) ubuntu@ip-172-31-26-212:~$ pip3 freeze
absl-py==0.12.0
aiofiles==0.7.0
alabaster==0.7.10
allennlp==2.3.1
altair==4.1.0
anaconda-client==1.6.14
anaconda-navigator==1.8.4
anaconda-project==0.8.2
asn1crypto==1.3.0
astor==0.8.1
astroid==1.6.3
astropy==3.0.2
astunparse==1.6.3
attrs==19.3.0
autovizwidget==0.15.0
Babel==2.5.3
backcall==0.1.0
backports.shutil-get-terminal-size==1.0.0
base58==2.1.0
beautifulsoup4==4.6.0
bitarray==0.8.1
bkcharts==0.2
blaze==0.11.3
bleach==2.1.3
blinker==1.4
blis==0.4.1
bokeh==1.0.4
boto==2.48.0
boto3==1.17.54
botocore==1.20.54
Bottleneck==1.2.1
cached-property==1.5.2
cachetools==4.1.0
catalogue==1.0.0
cchardet==2.1.6
certifi==2019.11.28
cffi==1.14.0
chardet==3.0.4
click==7.1.2
cloudpickle==0.5.3
clyent==1.2.2
colorama==0.3.9
conda==4.5.12
conda-build==3.10.5
conda-verify==2.0.0
configparser==5.0.2
conllu==1.3.1
contextlib2==0.5.5
cryptography==2.3.1
cycler==0.10.0
cymem==2.0.3
Cython==0.28.2
cytoolz==0.9.0.1
dask==0.17.5
dataclasses==0.7
datasets==1.10.2
datashape==0.5.4
decorator==4.3.0
defusedxml==0.6.0
dill==0.3.4
distributed==1.21.8
docker-pycreds==0.4.0
docutils==0.14
editdistance==0.5.3
eli5==0.10.1
en-core-sci-lg==0.2.4
entrypoints==0.2.3
environment-kernels==1.1.1
et-xmlfile==1.0.1
fastapi==0.65.1
fastcache==1.0.2
fastprogress==0.2.3
filelock==3.0.12
flaky==3.6.1
Flask==1.1.2
Flask-Cors==3.0.8
flatbuffers==1.12
fsspec==2021.7.0
ftfy==5.7
future==0.18.2
gast==0.4.0
gevent==1.5.0
gitdb==4.0.7
GitPython==3.1.15
glob2==0.6
gmpy2==2.0.8
google-auth==1.14.0
google-auth-oauthlib==0.4.1
google-pasta==0.2.0
googleapis-common-protos==1.51.0
graphviz==0.16
greenlet==0.4.15
grpcio==1.34.1
h11==0.12.0
h5py==3.1.0
hdijupyterutils==0.15.0
heapdict==1.0.0
html5lib==1.0.1
huggingface-hub==0.0.12
idna==2.7
imageio==2.3.0
imagesize==1.0.0
importlib-metadata==1.6.0
install==1.3.4
ipykernel==5.5.5
ipyparallel==6.2.2
ipython==6.4.0
ipython-genutils==0.2.0
ipywidgets==7.4.0
isort==4.3.4
itsdangerous==1.1.0
jdcal==1.4
jedi==0.12.0
jieba==0.42.1
Jinja2==2.11.3
jmespath==0.9.4
joblib==0.14.1
jsonnet==0.15.0
jsonpickle==1.4
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.2.3
jupyter-console==5.2.0
jupyter-core==4.4.0
jupyterlab==0.32.1
jupyterlab-launcher==0.10.5
Keras==2.4.3
Keras-Applications==1.0.8
keras-bert==0.86.0
keras-embed-sim==0.8.0
keras-layer-normalization==0.14.0
keras-multi-head==0.27.0
keras-nightly==2.5.0.dev2021032900
keras-pos-embd==0.11.0
keras-position-wise-feed-forward==0.6.0
Keras-Preprocessing==1.1.2
keras-self-attention==0.46.0
keras-transformer==0.38.0
kiwisolver==1.0.1
ktrain==0.26.2
langdetect==1.0.8
lazy-object-proxy==1.3.1
llvmlite==0.23.1
lmdb==1.2.1
locket==0.2.0
lxml==4.2.1
Markdown==3.2.1
MarkupSafe==1.0
matplotlib==3.3.4
mccabe==0.6.1
mistune==0.8.3
mkl-fft==1.0.15
mkl-random==1.1.0
mkl-service==2.3.0
mock==4.0.1
more-itertools==8.2.0
mpmath==1.0.0
msgpack==0.6.0
msgpack-python==0.5.6
multipledispatch==0.5.0
multiprocess==0.70.12.2
murmurhash==1.0.2
navigator-updater==0.2.0
nb-conda==2.2.1
nb-conda-kernels==2.2.2
nbconvert==5.4.1
nbformat==4.4.0
networkx==2.5.1
nltk==3.5
nose==1.3.7
notebook==5.5.0
numba==0.38.0
numexpr==2.7.1
numpy==1.19.5
numpydoc==0.9.2
oauthlib==3.1.0
odo==0.5.1
olefile==0.45.1
openpyxl==2.5.3
opt-einsum==3.3.0
overrides==3.1.0
packaging==20.9
pandas==1.0.3
pandocfilters==1.4.2
parsimonious==0.8.1
parso==0.2.0
partd==0.3.8
path.py==11.0.1
pathlib2==2.3.2
pathtools==0.1.2
patsy==0.5.0
pep8==1.7.1
pexpect==4.5.0
pickleshare==0.7.4
Pillow==8.2.0
pkginfo==1.4.2
plac==0.9.6
plotly==4.5.2
pluggy==0.13.1
ply==3.11
preshed==3.0.2
promise==2.3
prompt-toolkit==1.0.15
protobuf==3.15.8
protobuf3-to-dict==0.1.5
psutil==5.4.5
psycopg2==2.7.5
ptyprocess==0.5.2
py==1.8.1
py4j==0.10.7
pyarrow==5.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycodestyle==2.4.0
pycosat==0.6.3
pycparser==2.19
pycrypto==2.6.1
pycurl==7.43.0.2
pydantic==1.8.2
pydeck==0.6.2
pyflakes==1.6.0
pygal==2.4.0
Pygments==2.2.0
pykerberos==1.2.1
pylint==1.8.4
pyodbc==4.0.23
pyOpenSSL==19.0.0
pyparsing==2.2.0
PySocks==1.7.1
pyspark==2.3.2
pytest==5.4.1
pytest-arraydiff==0.2
pytest-astropy==0.3.0
pytest-doctestplus==0.1.3
pytest-openfiles==0.3.0
pytest-remotedata==0.2.1
python-dateutil==2.7.3
python-multipart==0.0.5
pytorch-pretrained-bert==0.6.2
pytorch-transformers==1.1.0
pytz==2018.4
PyWavelets==0.5.2
PyYAML==3.12
pyzmq==17.0.0
QtAwesome==0.4.4
qtconsole==4.3.1
QtPy==1.4.1
regex==2020.4.4
requests==2.25.1
requests-kerberos==0.12.0
requests-oauthlib==1.3.0
responses==0.10.12
retrying==1.3.3
rope==0.10.7
rsa==4.0
ruamel-yaml==0.15.87
s3fs==0.1.5
s3transfer==0.4.0
sacremoses==0.0.41
sagemaker==1.50.17
sagemaker-pyspark==1.2.8
scikit-image==0.13.1
scikit-learn==0.23.2
scipy==1.4.1
seaborn==0.8.1
Send2Trash==1.5.0
sentence-transformers==1.0.4
sentencepiece==0.1.96
sentry-sdk==1.0.0
seqeval==0.0.19
shortuuid==1.0.1
simplegeneric==0.8.1
simpletransformers==0.61.13
singledispatch==3.4.0.3
six==1.15.0
smdebug-rulesconfig==0.1.2
smmap==4.0.0
snowballstemmer==1.2.1
sortedcollections==0.6.1
sortedcontainers==1.5.10
spacy==2.2.4
sparkmagic==0.12.5
Sphinx==1.7.4
sphinxcontrib-websupport==1.0.1
spyder==3.2.8
SQLAlchemy==1.2.11
sqlparse==0.3.1
srsly==1.0.2
starlette==0.14.2
statsmodels==0.9.0
streamlit==0.85.1
style==1.1.0
subprocess32==3.5.4
sympy==1.1.1
syntok==1.2.2
tables==3.4.3
tabulate==0.8.7
TBB==0.1
tblib==1.3.2
tensorboard==2.5.0
tensorboard-data-server==0.6.0
tensorboard-plugin-wit==1.8.0
tensorboardX==2.0
tensorflow==2.3.0
tensorflow-datasets==2.1.0
tensorflow-estimator==2.5.0
tensorflow-gpu==2.5.0
tensorflow-metadata==0.21.2
termcolor==1.1.0
terminado==0.8.1
testpath==0.3.1
thinc==7.4.0
threadpoolctl==2.1.0
tokenizers==0.10.2
toml==0.10.2
toolz==0.9.0
torch==1.7.1
torchaudio==0.8.1
torchvision==0.9.1
tornado==5.0.2
tqdm==4.61.2
traitlets==4.3.2
transformers==4.8.2
typing==3.6.4
typing-extensions==3.7.4.3
tzlocal==2.1
unicodecsv==0.14.1
Unidecode==1.1.1
update==0.0.1
urllib3==1.26.6
uvicorn==0.13.4
validators==0.18.2
wandb==0.11.0
wasabi==0.6.0
watchdog==2.1.3
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==1.0.1
Whoosh==2.7.4
widgetsnbextension==3.4.2
word2number==1.1
wrapt==1.12.1
xlrd==1.1.0
XlsxWriter==1.0.4
xlwt==1.3.0
xxhash==2.0.2
zict==0.1.3
zipp==3.1.0
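
With listings this long, the few packages that matter for a tokenizer error are easy to miss. The sketch below pulls selected pins out of `pip freeze` text. The helper name `pinned_versions` and the choice of packages are assumptions made for illustration, not part of simpletransformers; the sample lines are copied from the second listing above.

```python
# Hypothetical helper: extract a handful of package pins from `pip freeze` output.
RELEVANT = ("sentencepiece", "simpletransformers", "tokenizers", "transformers")


def pinned_versions(freeze_text, names=RELEVANT):
    """Return {package: version} for the requested names found in pip-freeze text."""
    wanted = {name.lower() for name in names}
    found = {}
    for line in freeze_text.splitlines():
        # Lines look like "package==1.2.3"; anything else (prompts, blanks) is skipped.
        name, sep, version = line.strip().partition("==")
        if sep and name.lower() in wanted:
            found[name.lower()] = version
    return found


# Sample lines taken from the second listing in this comment.
sample = """\
sentence-transformers==1.0.4
sentencepiece==0.1.96
simpletransformers==0.61.13
tokenizers==0.10.2
transformers==4.8.2
"""

print(pinned_versions(sample))
```

Matching on the exact package name keeps `sentence-transformers` from being mistaken for `sentencepiece`.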

And the following code:

from simpletransformers.t5 import T5Model

# Training configuration passed to T5Model.
model_args = {
    "max_seq_length": 196,
    "train_batch_size": 16,
    "eval_batch_size": 64,
    "num_train_epochs": 1,
    "evaluate_during_training": True,
    "evaluate_during_training_steps": 15000,
    "evaluate_during_training_verbose": True,

    "use_multiprocessing": True,
    "fp16": True,
    "use_cuda": True,

    "save_steps": -1,
    "save_eval_checkpoints": False,
    "save_model_every_epoch": False,

    "reprocess_input_data": True,
    "overwrite_output_dir": True,
}

# `train` and `eval` are the training and evaluation datasets, defined elsewhere (not shown in this post).
model = T5Model("t5", "t5-base", args=model_args, use_cuda=True)
model.train_model(train, eval_data=eval)

mornsun commented 3 years ago

This version works for me: sentencepiece>=0.1.90
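
A minimal sketch for checking whether an environment meets the version suggested above. It assumes Python 3.8+ for `importlib.metadata` (the environments in this thread use 3.6 and 3.7, where the `importlib_metadata` backport would be needed instead). The function names are hypothetical, and the 0.1.90 threshold is the one reported in this thread, not a documented requirement.

```python
from importlib import metadata  # standard library on Python 3.8+

MIN_SENTENCEPIECE = (0, 1, 90)  # threshold reported in this thread


def parse_version(text):
    """Turn '0.1.96' into (0, 1, 96); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(version_text, minimum=MIN_SENTENCEPIECE):
    """True if the dotted version string is at least `minimum`."""
    return parse_version(version_text) >= minimum


def check_installed(package="sentencepiece"):
    """Describe how the installed version of `package` compares to the threshold."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed"
    wanted = ".".join(str(part) for part in MIN_SENTENCEPIECE)
    verdict = "meets" if meets_minimum(installed) else "is older than"
    return f"{package} {installed} {verdict} the {wanted} minimum"


print(check_installed())
```

If the installed version turns out to be older, `pip install --upgrade "sentencepiece>=0.1.90"` is the usual way to apply the constraint.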

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.