Closed paulomann closed 1 year ago
I also tried running this locally, although with a different version and environment:
Octis: 1.10.2, Python: 3.7.3, OS: Linux
I got the full traceback below, and by inspection found that f_val was -inf:
Traceback (most recent call last):
File "/home/paulomann/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/paulomann/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/paulomann/.vscode-server/extensions/ms-python.python-2023.12.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
cli.main()
File "/home/paulomann/.vscode-server/extensions/ms-python.python-2023.12.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
run()
File "/home/paulomann/.vscode-server/extensions/ms-python.python-2023.12.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
runpy.run_path(target, run_name="__main__")
File "/home/paulomann/.vscode-server/extensions/ms-python.python-2023.12.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 322, in run_path
pkg_name=pkg_name, script_name=fname)
File "/home/paulomann/.vscode-server/extensions/ms-python.python-2023.12.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 136, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "/home/paulomann/.vscode-server/extensions/ms-python.python-2023.12.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
exec(code, run_globals)
File "/home/paulomann/workspace/reddit-topic-modelling/octis_training/training_and_optimization.py", line 102, in <module>
model_runs=5, plot_best_seen=True) # number of runs of the topic model
File "/home/paulomann/anaconda3/lib/python3.7/site-packages/octis/optimization/optimizer.py", line 160, in optimize
results = self._optimization_loop(opt)
File "/home/paulomann/anaconda3/lib/python3.7/site-packages/octis/optimization/optimizer.py", line 288, in _optimization_loop
res = opt.tell(next_x, f_val)
File "/home/paulomann/anaconda3/lib/python3.7/site-packages/skopt/optimizer/optimizer.py", line 493, in tell
return self._tell(x, y, fit=fit)
File "/home/paulomann/anaconda3/lib/python3.7/site-packages/skopt/optimizer/optimizer.py", line 536, in _tell
est.fit(self.space.transform(self.Xi), self.yi)
File "/home/paulomann/anaconda3/lib/python3.7/site-packages/sklearn/ensemble/_forest.py", line 304, in fit
accept_sparse="csc", dtype=DTYPE)
File "/home/paulomann/anaconda3/lib/python3.7/site-packages/sklearn/base.py", line 432, in _validate_data
X, y = check_X_y(X, y, **check_params)
File "/home/paulomann/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
return f(**kwargs)
File "/home/paulomann/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 805, in check_X_y
ensure_2d=False, dtype=None)
File "/home/paulomann/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
return f(**kwargs)
File "/home/paulomann/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 645, in check_array
allow_nan=force_all_finite == 'allow-nan')
File "/home/paulomann/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 99, in _assert_all_finite
msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
It was caused by words that were absent from the dataset: due to a bug, my vocabulary contained words that never occur in the primary dataset.
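Two quick checks could catch this earlier (a hypothetical sketch of my own, not part of OCTIS or skopt): verify that every vocabulary word actually occurs in the corpus before training, and replace a non-finite objective value before it reaches the optimizer's scikit-learn surrogate model via opt.tell():

```python
import math

def find_missing_vocab(vocab, corpus_docs):
    """Return vocabulary words that never occur in the corpus.

    vocab: iterable of words (e.g. lines of vocabulary.txt)
    corpus_docs: iterable of pre-tokenized documents (lists of words)
    """
    seen = set()
    for doc in corpus_docs:
        seen.update(doc)
    return sorted(set(vocab) - seen)

def sanitize_objective(f_val, fallback=0.0):
    """Replace NaN/-inf/inf objective values so the surrogate
    regressor never sees a value check_array would reject."""
    return f_val if math.isfinite(f_val) else fallback
```

The fallback value is an assumption; a more principled choice would be the worst finite score observed so far.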
Description
I am trying to run the Google Colab example provided in the repo README. I only changed the dataset: I load a custom dataset in the .tsv format using load_custom_dataset_from_folder(). The algorithm ran without problems with a small vocabulary (39 words), but with a "big" vocabulary (7894 words) I got an error from sklearn.utils.validation as follows:
Also, note that my dataset is split into train (70%), val (10%), and test (20%).
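For reference, building the partition column of the corpus.tsv that load_custom_dataset_from_folder() reads could look roughly like this (my own illustration, not OCTIS code; the "train"/"val"/"test" label names are an assumption based on OCTIS's dataset format):

```python
def assign_partitions(n_docs, train=0.7, val=0.1):
    """Assign per-document partition labels for a 70/10/20 split.

    Each label goes into the second tab-separated column of
    corpus.tsv; the remainder after train and val goes to test.
    """
    n_train = int(n_docs * train)
    n_val = int(n_docs * val)
    return (["train"] * n_train
            + ["val"] * n_val
            + ["test"] * (n_docs - n_train - n_val))
```

One line per document would then be written as document text, tab, partition label.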
What I Did