Open dunnkers opened 4 years ago
I don't know if these are the reasons behind the error, but consider the following. If it's a 6-feature dataset, then:

- `siso_ranking_size = 8`: this should be less than or equal to 6.
- `max_number_of_features = 100`: same as above.

Reasonable values in this case would be 1 and 6 respectively.
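A pre-flight check along these lines could catch the mismatch before `fit` is called. This is only a sketch: `check_featboost_sizes` is a hypothetical helper, not part of FeatBoost, and it assumes both parameters are plain integers.

```python
def check_featboost_sizes(n_features, siso_ranking_size, max_number_of_features):
    """Hypothetical pre-flight check; not part of FeatBoost itself.

    Assumes both size parameters are plain integers.
    """
    if siso_ranking_size > n_features:
        raise ValueError(
            f"siso_ranking_size={siso_ranking_size} exceeds the "
            f"{n_features} features in the dataset"
        )
    if max_number_of_features > n_features:
        raise ValueError(
            f"max_number_of_features={max_number_of_features} exceeds the "
            f"{n_features} features in the dataset"
        )


# The failing configuration from this issue: 6 features, sizes 8 and 100.
try:
    check_featboost_sizes(6, siso_ranking_size=8, max_number_of_features=100)
except ValueError as err:
    print(err)

# The values suggested above pass silently.
check_featboost_sizes(6, siso_ranking_size=1, max_number_of_features=6)
```

Raising a `ValueError` rather than using a bare `assert` keeps the check active even when Python runs with `-O`.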
That seems to explain the error: the 6-feature dataset now runs normally. I didn't know that `siso_ranking_size` should be <= the number of dataset features; maybe an assertion in the code and some docs would be nice.

What are reasonable values of `siso_ranking_size` I could use in my tests? The number of features in the datasets ranges from 6 to 100000, so using a value of 8 is probably fine for all the other datasets. I could also use a fixed value of 5, so that I can use the same value for all tests.
Yes, you're right. Some assertions would be helpful. You could use 5, but for larger datasets it might be helpful to increase it a bit. Just keep in mind how this can affect your runtime. I would say a good rule of thumb is to set it to 10 for datasets with over 100 features.
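The rule of thumb above could be written down roughly as follows. `suggest_siso_ranking_size` and its keyword arguments are hypothetical names chosen for illustration; the cap at the number of features is added here so that small datasets stay within the limit discussed earlier in the thread.

```python
def suggest_siso_ranking_size(n_features, small=5, large=10, threshold=100):
    """Hypothetical helper encoding the rule of thumb from this thread.

    Uses `large` for datasets with more than `threshold` features and
    `small` otherwise, never exceeding the number of features.
    """
    size = large if n_features > threshold else small
    return min(size, n_features)


for n in (3, 6, 100, 101, 100000):
    print(n, suggest_siso_ranking_size(n))
```

Larger values mean more candidate features are evaluated per iteration, so runtime grows with this setting.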
Input is a 6-feature dataset, found here. FeatBoost is executed using exactly the same setup as `test.py`.

With `verbose=2`, an error is thrown.

Full error log:
```shell
(venv) ➜ feature-selection git:(master) ✗ env DEBUGPY_LAUNCHER_PORT=53859 /Users/dunnkers/git/feature-selection/venv/bin/python /Users/dunnkers/.vscode/extensions/ms-python.python-2020.4.76186/pythonFiles/lib/python/debugpy/wheels/debugpy/launcher /Users/dunnkers/git/feature-selection/jobs/run-featboost.py /Users/dunnkers/git/feature-selection/data/6_bit_mutliplexer
Ranking pool [FeatBoost_XGBoost]
Running pool... [4 workers, 1 datasets]
Ranking features iteration 01
feature importances of all available feature:
x_001 3.792205
x_003 3.277614
x_004 2.644713
x_002 2.451928
x_006 2.280755
x_005 2.112983
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/Users/dunnkers/git/feature-selection/jobs/ComputePool.py", line 22, in ranking_pool
    ranking = ranking_func(X, y)
  File "/Users/dunnkers/git/feature-selection/jobs/run-featboost.py", line 25, in FeatBoost_XGBoost
    fs.fit(X, y)
  File "/Users/dunnkers/git/feature-selection/jobs/lib/feat_boost.py", line 188, in fit
    return self._fit(X, Y)
  File "/Users/dunnkers/git/feature-selection/jobs/lib/feat_boost.py", line 272, in _fit
    selected_variable,best_acc_t = self._siso(X,Y,iteration_number)
  File "/Users/dunnkers/git/feature-selection/jobs/lib/feat_boost.py", line 396, in _siso
    ranking, self.all_ranking_ = self._input_ranking(X, Y, iteration_number)
  File "/Users/dunnkers/git/feature-selection/jobs/lib/feat_boost.py", line 559, in _input_ranking
    print("%s %05f" % (self._feature_names[feature_rank[i]], feature_importance[feature_rank[i]]))
IndexError: index -7 is out of bounds for axis 0 with size 6
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/dunnkers/.vscode/extensions/ms-python.python-2020.4.76186/pythonFiles/lib/python/debugpy/wheels/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/Users/dunnkers/.vscode/extensions/ms-python.python-2020.4.76186/pythonFiles/lib/python/debugpy/wheels/debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/Users/dunnkers/.vscode/extensions/ms-python.python-2020.4.76186/pythonFiles/lib/python/debugpy/wheels/debugpy/../debugpy/server/cli.py", line 267, in run_file
    runpy.run_path(options.target, run_name=compat.force_str("__main__"))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/dunnkers/git/feature-selection/jobs/run-featboost.py", line 36, in <module>
    run_ranking_pool(FeatBoost_XGBoost)
  File "/Users/dunnkers/git/feature-selection/jobs/ComputePool.py", line 42, in run_ranking_pool
    run_pool(ranking_pool, 'ranking', ranking_func, ranking_method)
  File "/Users/dunnkers/git/feature-selection/jobs/ComputePool.py", line 99, in run_pool
    pool_results = pool.starmap(func, pool_args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 276, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
IndexError: index -7 is out of bounds for axis 0 with size 6
```
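The `IndexError` in this log can be reproduced in isolation with NumPy. The snippet below is a guess at the mechanism, not the actual code in `feat_boost.py`: it assumes the ranking is walked from the end with negative indices, once per `siso_ranking_size` slot, which runs past the start of a 6-element array on the seventh step. The importances are the values printed in the log, listed in feature order.

```python
import numpy as np

# Importances of x_001 .. x_006, taken from the log output.
feature_importance = np.array(
    [3.792205, 2.451928, 3.277614, 2.644713, 2.112983, 2.280755]
)
feature_rank = np.argsort(feature_importance)  # ascending: best feature is last
siso_ranking_size = 8  # larger than the 6 available features

message = None
try:
    # Assumed access pattern: step backwards through the ranking.
    for i in range(-1, -1 - siso_ranking_size, -1):
        best = feature_rank[i]
        print("x_%03d %05f" % (best + 1, feature_importance[best]))
except IndexError as err:
    message = str(err)
    print("IndexError:", message)
```

The six valid features print first, matching the order in the log, and the failure only appears once the loop asks for a seventh.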
With `verbose=1`, another error is thrown.

Full error log:
```shell
(venv) ➜ feature-selection git:(master) ✗ env DEBUGPY_LAUNCHER_PORT=53886 /Users/dunnkers/git/feature-selection/venv/bin/python /Users/dunnkers/.vscode/extensions/ms-python.python-2020.4.76186/pythonFiles/lib/python/debugpy/wheels/debugpy/launcher /Users/dunnkers/git/feature-selection/jobs/run-featboost.py /Users/dunnkers/git/feature-selection/data/6_bit_mutliplexer
Ranking pool [FeatBoost_XGBoost]
Running pool... [4 workers, 1 datasets]
Ranking features iteration 01
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/Users/dunnkers/git/feature-selection/jobs/ComputePool.py", line 22, in ranking_pool
    ranking = ranking_func(X, y)
  File "/Users/dunnkers/git/feature-selection/jobs/run-featboost.py", line 25, in FeatBoost_XGBoost
    fs.fit(X, y)
  File "/Users/dunnkers/git/feature-selection/jobs/lib/feat_boost.py", line 188, in fit
    return self._fit(X, Y)
  File "/Users/dunnkers/git/feature-selection/jobs/lib/feat_boost.py", line 272, in _fit
    selected_variable,best_acc_t = self._siso(X,Y,iteration_number)
  File "/Users/dunnkers/git/feature-selection/jobs/lib/feat_boost.py", line 397, in _siso
    self.siso_ranking_[(iteration_number-1), :] = ranking
ValueError: could not broadcast input array from shape (6) into shape (8)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/dunnkers/.vscode/extensions/ms-python.python-2020.4.76186/pythonFiles/lib/python/debugpy/wheels/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/Users/dunnkers/.vscode/extensions/ms-python.python-2020.4.76186/pythonFiles/lib/python/debugpy/wheels/debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/Users/dunnkers/.vscode/extensions/ms-python.python-2020.4.76186/pythonFiles/lib/python/debugpy/wheels/debugpy/../debugpy/server/cli.py", line 267, in run_file
    runpy.run_path(options.target, run_name=compat.force_str("__main__"))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/dunnkers/git/feature-selection/jobs/run-featboost.py", line 36, in <module>
    run_ranking_pool(FeatBoost_XGBoost)
  File "/Users/dunnkers/git/feature-selection/jobs/ComputePool.py", line 42, in run_ranking_pool
    run_pool(ranking_pool, 'ranking', ranking_func, ranking_method)
  File "/Users/dunnkers/git/feature-selection/jobs/ComputePool.py", line 99, in run_pool
    pool_results = pool.starmap(func, pool_args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 276, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
ValueError: could not broadcast input array from shape (6) into shape (8)
```
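The `ValueError` is consistent with a result array that is preallocated with `siso_ranking_size` columns while the ranking holds only as many entries as there are features. A standalone reproduction follows. The array shapes are inferred from the traceback; the names `siso_ranking` and `n_iterations` and the capping workaround are illustrative assumptions, not FeatBoost's implementation.

```python
import numpy as np

n_iterations, siso_ranking_size, n_features = 3, 8, 6

# One row per iteration, one column per ranked candidate (assumed layout).
siso_ranking = np.zeros((n_iterations, siso_ranking_size))
ranking = np.arange(n_features)  # only 6 entries are available

message = None
try:
    siso_ranking[0, :] = ranking  # 6 values cannot fill a row of width 8
except ValueError as err:
    message = str(err)
    print("ValueError:", message)

# Possible workaround: cap the width at the number of features.
width = min(siso_ranking_size, n_features)
capped = np.zeros((n_iterations, width))
capped[0, :] = ranking[:width]
print(capped.shape)
```

Newer NumPy versions word the shapes as `(6,)` and `(8,)`, so the message may differ slightly from the log.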