facebookresearch / fastText

Library for fast text representation and classification.
https://fasttext.cc/
MIT License

Fasttext yields same wrong prediction probabilities on VM with NVIDIA RAPIDS environment #1299

Open lvxhnat opened 1 year ago

lvxhnat commented 1 year ago

I am currently running fasttext on VertexAI Workbench with the following specifications:
Environment: RAPIDS 0.18 (with Intel® MKL-DNN/MKL)
Environment version: M65
Machine type: n1-standard-4 (4 vCPUs, 15 GB RAM)
GPU: NVIDIA Tesla T4 x 1

What I noticed is that running fasttext in this VM environment always yields the same probability (see the screenshot below):

import fasttext

language_model = fasttext.load_model("assets/models/langdetects/lid.176.bin")

language_model.predict("안녕 잘 지내")
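To confirm that the output really does not depend on the input, one could run a few clearly different sentences through the model and compare the results. The following is only a sketch: `looks_constant` is a hypothetical helper name, not part of fastText, and it assumes that `predict` returns a `(labels, probabilities)` pair for a single string, as `language_model.predict` does.

```python
from typing import Callable, Iterable, Sequence, Tuple

# Assumed shape of one prediction: (labels, probabilities).
Prediction = Tuple[Sequence[str], Sequence[float]]


def looks_constant(
    predict: Callable[[str], Prediction],
    texts: Iterable[str],
    ndigits: int = 6,
) -> bool:
    """Return True if every text gets the same labels and rounded probabilities."""
    seen = set()
    for text in texts:
        labels, probs = predict(text)
        # Convert to plain tuples so the result is hashable and comparable,
        # even when the probabilities arrive as a NumPy array.
        seen.add((tuple(labels), tuple(round(float(p), ndigits) for p in probs)))
    # An empty input yields no evidence, so it is reported as False.
    return len(seen) == 1


# Usage with the model from the snippet above (needs fasttext and the model file):
# import fasttext
# language_model = fasttext.load_model("assets/models/langdetects/lid.176.bin")
# samples = ["안녕 잘 지내", "Hello, how are you?", "Bonjour tout le monde"]
# print(looks_constant(language_model.predict, samples))
```

If this prints `True` for inputs in different languages, the problem is probably in the environment or the loaded model rather than in any particular input string.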

The model was downloaded from https://fasttext.cc/docs/en/language-identification.html. These are the pip dependencies I have in my environment on the VM:

aiohttp==3.8.3
aiosignal==1.2.0
amqp==5.1.1
anyio==3.6.1
appdirs==1.4.4
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
asttokens==2.0.8
async-timeout==4.0.2
asyncpg==0.26.0
attrs==22.1.0
backcall==0.2.0
backports.functools-lru-cache==1.6.4
beautifulsoup4==4.11.1
billiard==3.6.4.0
bleach==5.0.1
blis==0.7.8
bokeh==2.4.3
brotlipy==0.7.0
cachetools==5.2.0
catalogue==2.0.8
celery==5.2.7
certifi==2022.9.24
cffi==1.15.1
charset-normalizer==2.1.1
click==8.0.4
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.2.0
cligj==0.7.2
cloudpickle==2.2.0
colorama==0.4.5
colorcet==3.0.0
confection==0.0.2
confluent-kafka==1.7.0
contourpy==1.0.5
cryptography==37.0.4
cucim==22.6.0
cuda-python==11.7.0
cudf==22.6.1
cudf-kafka==22.6.1
cugraph==22.6.1+0.gde8036b5.dirty
cuml==22.6.1
cupy==9.6.0
cupy-cuda113==10.6.0
cusignal==22.6.0
cuspatial==22.6.0
custreamz==22.6.1
cuxfilter==22.6.0
cycler==0.11.0
cymem==2.0.6
cytoolz==0.12.0
dask==2022.5.2
dask-cuda==22.6.0
dask-cudf==22.6.1
datashader==0.13.1a0
datashape==0.5.4
db-dtypes==1.0.4
debugpy==1.6.3
decorator==5.1.1
defusedxml==0.7.1
Deprecated==1.2.13
distributed==2022.5.2
elastic-transport==8.4.0
elasticsearch==8.4.0
emoji==1.7.0
entrypoints==0.4
executing==1.1.0
fastavro==1.6.1
fastjsonschema==2.16.2
fastrlock==0.8
fasttext==0.9.2
filelock==3.8.0
Fiona==1.8.20
flit_core==3.7.1
fonttools==4.37.3
frozenlist==1.3.1
fsspec==2022.8.2
gcsfs==2022.8.2
GDAL==3.3.2
geopandas==0.9.0
google-api-core==2.8.2
google-auth==2.11.0
google-auth-oauthlib==0.5.2
google-cloud-bigquery==3.3.2
google-cloud-bigquery-storage==2.14.2
google-cloud-core==2.3.2
google-cloud-storage==2.5.0
google-crc32c==1.5.0
google-resumable-media==2.4.0
googleapis-common-protos==1.56.4
greenlet==1.1.3
grpcio==1.49.1
grpcio-status==1.49.1
HeapDict==1.0.1
holoviews==1.14.6
huggingface-hub==0.10.0
idna==3.4
imagecodecs==2021.8.26
imageio==2.22.0
importlib-metadata==4.11.4
importlib-resources==5.9.0
ipykernel==6.14.0
ipython==8.4.0
ipywidgets==8.0.2
jedi==0.18.1
Jinja2==3.1.2
joblib==1.2.0
jsonschema==4.16.0
jupyter-client==7.3.4
jupyter_core==4.11.1
jupyter-server==1.19.1
jupyter-server-proxy==3.2.2
jupyterlab-pygments==0.2.2
jupyterlab-widgets==3.0.3
kiwisolver==1.4.4
kombu==5.2.4
langcodes==3.3.0
llvmlite==0.39.1
locket==1.0.0
lxml==4.8.0
lz4==4.0.0
mapclassify==2.4.3
Markdown==3.4.1
MarkupSafe==2.1.1
matplotlib==3.6.0
matplotlib-inline==0.1.6
mistune==2.0.4
modelx==0.1
more-itertools==8.14.0
msgpack==1.0.4
multi-rake==0.0.2
multidict==6.0.2
multipledispatch==0.6.0
munch==2.5.0
munkres==1.1.4
murmurhash==1.0.8
nbclient==0.6.8
nbconvert==7.0.0
nbformat==5.6.1
nest-asyncio==1.5.5
networkx==2.6.3
nltk==3.7
numba==0.56.2
numpy==1.21.6
nvtx==0.2.3
oauthlib==3.2.1
packaging==21.3
pandas==1.4.4
pandas-gbq==0.17.4
pandocfilters==1.5.0
panel==0.12.7
param==1.12.2
parso==0.8.3
partd==1.3.0
pathy==0.6.2
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.1.1
pip==22.2.2
pkgutil_resolve_name==1.3.10
preshed==3.0.7
prometheus-client==0.14.1
prompt-toolkit==3.0.31
proto-plus==1.22.1
protobuf==4.21.7
psutil==5.9.2
psycopg2-binary==2.9.3
ptxcompiler==0.2.0
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==7.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pybind11==2.10.0
pycld2==0.41
pycparser==2.21
pyct==0.4.6
pydantic==1.9.2
pydata-google-auth==1.4.0
pydeck==0.5.0
pyee==8.1.0
Pygments==2.13.0
pylibcugraph==22.6.1+0.gde8036b5.dirty
pynvml==11.4.1
pyOpenSSL==22.0.0
pyparsing==3.0.9
pyppeteer==1.0.2
pyproj==3.1.0
pyrsistent==0.18.1
PySocks==1.7.1
python-dateutil==2.8.2
python-dotenv==0.20.0
pytz==2022.2.1
pyviz-comms==2.2.1
PyWavelets==1.3.0
PyYAML==6.0
pyzmq==24.0.1
raft==22.6.0
redis==4.3.4
regex==2022.9.13
requests==2.28.1
requests-oauthlib==1.3.1
rmm==21.12.0
rsa==4.9
Rtree==1.0.0
scikit-image==0.19.3
scikit-learn==1.1.2
scipy==1.9.1
Send2Trash==1.8.0
sentence-transformers==2.2.2
sentencepiece==0.1.97
setuptools==60.10.0
Shapely==1.8.0
simpervisor==0.4
six==1.16.0
smart-open==5.2.1
sniffio==1.3.0
sortedcontainers==2.4.0
soupsieve==2.3.2.post1
spacy==3.4.1
spacy-legacy==3.0.10
spacy-loggers==1.0.3
SQLAlchemy==1.4.40
srsly==2.4.4
stack-data==0.5.1
streamz==0.6.4
tblib==1.7.0
terminado==0.15.0
thinc==8.1.2
threadpoolctl==3.1.0
tifffile==2021.11.2
tinycss2==1.1.1
tokenizers==0.12.1
toolz==0.12.0
torch==1.12.1
torchvision==0.13.1
tornado==6.1
tqdm==4.64.0
traitlets==5.4.0
transformers==4.22.2
treelite==2.4.0
treelite-runtime==2.4.0
typer==0.4.2
typing_extensions==4.3.0
ucx-py==0.26.0
unicodedata2==14.0.0
urllib3==1.26.11
vine==5.0.0
wasabi==0.10.1
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.4.1
websockets==10.3
wheel==0.37.1
widgetsnbextension==4.0.3
wrapt==1.14.1
xarray==2022.6.0
xgboost==1.6.1
yarl==1.7.2
zict==2.2.0
zipp==3.8.1

[Screenshot 2022-10-09 at 12 33 42 PM]

keys-zlc commented 1 year ago

Have you ever solved the problem?

lvxhnat commented 1 year ago

> Have you ever solved the problem?

It resolved itself, but I never figured out why or how. My guess is that it had something to do with how the model is cached when calling load_model, since the problem went away after I reset my VM.
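One way to test this guess, should the problem come back, is to record a checksum of the model file while the predictions look wrong and compare it with one taken after a reset or a fresh download. If the two match, the file on disk was probably not the cause. The sketch below uses only the standard library; `file_sha256` is a hypothetical helper name, and the path in the usage comment is the one from the original snippet.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (path taken from the snippet in the first comment):
# print(file_sha256("assets/models/langdetects/lid.176.bin"))
```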