Closed teresa-m closed 1 year ago
The following calls resulted in the following estimators:
where | dataset | n_jobs | MB per thread | time | estimator | test score | val score |
---|---|---|---|---|---|---|---|
denbi | PARIS_human | -1 | 4300 | 43200 | KNeighborsClassifier(n_neighbors=2) | ? | ? |
denbi | PARIS_human | 7 | 8000 | 43200 | KNeighborsClassifier(n_neighbors=1) | 0.9769516007852452 | 0.931 |
Michi_PC | PARIS_mouse | -1 | 2000 | 43200 | KNeighborsClassifier(n_neighbors=1, p=1, weights='distance') | 0.9806713376035464 | 0.941 |
Michi_PC | PARIS_human_RBP | 4 | 8000 | 50000 | KNeighborsClassifier(n_neighbors=1) | 0.9765021819402484 | 0.929 |
Michi_PC | Old_PARIS_human | 4 | 8000 | 50000 | KNeighborsClassifier(n_neighbors=2, p=1, weights='distance') | 0.9728710530759274 | 0.918 |
Stefan | PARIS_human | 7 | 8000 | 43200 | | | |
I still get the 'init_dgesdd failed init' error when calling CheRRI.
Possible explanation: maybe for some estimators the script fails because 8000 MB per thread are not enough. I changed the script so it will not crash if an error is reported within the optimization.
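The crash-guard described above can be sketched as a wrapper that catches fit failures and reports them instead of aborting the whole optimization run. This is a minimal illustration, not the actual biofilm change; `safe_fit` and the synthetic data are hypothetical:

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

def safe_fit(estimator, X, y):
    """Fit an estimator; on failure return the error instead of crashing.

    A MemoryError here would correspond to a configuration where the
    per-thread memory budget (e.g. 8000 MB) is not enough.
    """
    try:
        estimator.fit(X, y)
        return estimator, None
    except Exception as err:
        return None, f"{type(err).__name__}: {err}"

X, y = make_classification(n_samples=100, n_features=10, random_state=1)
model, error = safe_fit(KNeighborsClassifier(n_neighbors=1), X, y)
print(model is not None, error)  # True None
```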
Col:mod/Row:data | PARIS_human | PARIS_human_RBPs | PARIS_mouse | SPLASH |
---|---|---|---|---|
estimator | HistGradientBoostingClassifier | KNeighborsClassifier(n_neighbors=1) | KNeighborsClassifier(n_neighbors=1) | ExtraTreesClassifier |
PARIS_human | 0.839 | 0.84 | 0.52 | 0.63 |
PARIS_human_RBPs | 0.93 | 0.856 | 0.54 | 0.61 |
PARIS_mouse | 0.63 | 0.55 | 0.856 | 0.63 |
SPLASH | 0.47 | 0.46 | 0.48 | 0.792 |
-> Test old data with Eden features and my evaluation pipeline:
PARIS_human$PARIS_human_rbps -> F1 score = 0.546
PARIS_human_rbps$PARIS_human -> F1 score = 0.504
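The cross-model evaluation (model$data above: a model trained on one dataset scored on another) boils down to a plain F1 computation. A minimal sketch, using synthetic stand-ins for the PARIS feature matrices (the real pipeline reads biofilm feature files):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-ins for the real feature matrices, for illustration only.
X_human, y_human = make_classification(n_samples=300, n_features=12, random_state=1)
X_mouse, y_mouse = make_classification(n_samples=300, n_features=12, random_state=2)

# Train on one dataset, score on the other (e.g. human model -> mouse data).
model = KNeighborsClassifier(n_neighbors=1).fit(X_human, y_human)
cross_f1 = f1_score(y_mouse, model.predict(X_mouse))
print(round(cross_f1, 3))
```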
Stefan's original model runs:
```
denbiubuntu~/r/biofilm$ shellpy showmodels.spy CHERRY/optimized (biofilm72)

CHERRY/optimized/full_.model
{'loss': 'auto', 'learning_rate': 0.27825205658418045, 'max_iter': 512, 'min_samples_leaf': 125, 'max_depth': None, 'max_leaf_nodes': 234, 'max_bins': 255, 'l2_regularization': 1.5724450377145835e-10, 'early_stop': 'train', 'tol': 1e-07, 'scoring': 'loss', 'n_iter_no_change': 11, 'validation_fraction': None, 'random_state': 1, 'verbose': 0, 'estimator': HistGradientBoostingClassifier(early_stopping=True, l2_regularization=1.5724450377145835e-10, learning_rate=0.27825205658418045, max_iter=512, max_leaf_nodes=234, min_samples_leaf=125, n_iter_no_change=11, random_state=1, validation_fraction=None, warm_start=True), 'fullyfit': True, 'validationfraction': None, 'earlystopping': True}

CHERRY/optimized/fullhuman.model
{'loss': 'auto', 'learning_rate': 0.18214912973602268, 'max_iter': 512, 'min_samples_leaf': 70, 'max_depth': None, 'max_leaf_nodes': 287, 'max_bins': 255, 'l2_regularization': 2.444124445454329e-05, 'early_stop': 'train', 'tol': 1e-07, 'scoring': 'loss', 'n_iter_no_change': 18, 'validation_fraction': None, 'random_state': 1, 'verbose': 0, 'estimator': HistGradientBoostingClassifier(early_stopping=True, l2_regularization=2.444124445454329e-05, learning_rate=0.18214912973602268, max_iter=512, max_leaf_nodes=287, min_samples_leaf=70, n_iter_no_change=18, random_state=1, validation_fraction=None, warm_start=True), 'fullyfit': True, 'validationfraction': None, 'earlystopping': True}

CHERRY/optimized/paris_humanRBPs.model
{'loss': 'auto', 'learning_rate': 0.10694317220519729, 'max_iter': 512, 'min_samples_leaf': 42, 'max_depth': None, 'max_leaf_nodes': 1474, 'max_bins': 255, 'l2_regularization': 0.006245650100708052, 'early_stop': 'off', 'tol': 1e-07, 'scoring': 'loss', 'n_iter_no_change': 0, 'validation_fraction': None, 'random_state': 1, 'verbose': 0, 'estimator': HistGradientBoostingClassifier(early_stopping=False, l2_regularization=0.006245650100708052, learning_rate=0.10694317220519729, max_iter=512, max_leaf_nodes=1474, min_samples_leaf=42, n_iter_no_change=0, random_state=1, validation_fraction=None, warm_start=True), 'fullyfit': True, 'validationfraction': None, 'earlystopping': False}

CHERRY/optimized/paris_humanRRI.model
{'loss': 'auto', 'learning_rate': 0.03860685097120409, 'max_iter': 512, 'min_samples_leaf': 44, 'max_depth': None, 'max_leaf_nodes': 1032, 'max_bins': 255, 'l2_regularization': 3.70177817485565e-10, 'early_stop': 'train', 'tol': 1e-07, 'scoring': 'loss', 'n_iter_no_change': 12, 'validation_fraction': None, 'random_state': 1, 'verbose': 0, 'estimator': HistGradientBoostingClassifier(early_stopping=True, l2_regularization=3.70177817485565e-10, learning_rate=0.03860685097120409, max_iter=512, max_leaf_nodes=1032, min_samples_leaf=44, n_iter_no_change=12, random_state=1, validation_fraction=None, warm_start=True), 'fullyfit': True, 'validationfraction': None, 'earlystopping': True}

CHERRY/optimized/paris_mouseRRI.model
{'loss': 'auto', 'learning_rate': 0.18433753680428502, 'max_iter': 512, 'min_samples_leaf': 2, 'max_depth': None, 'max_leaf_nodes': 50, 'max_bins': 255, 'l2_regularization': 0.00040891478141833804, 'early_stop': 'off', 'tol': 1e-07, 'scoring': 'loss', 'n_iter_no_change': 0, 'validation_fraction': None, 'random_state': 1, 'verbose': 0, 'estimator': HistGradientBoostingClassifier(early_stopping=False, l2_regularization=0.00040891478141833804, learning_rate=0.18433753680428502, max_iter=512, max_leaf_nodes=50, min_samples_leaf=2, n_iter_no_change=0, random_state=1, validation_fraction=None, warm_start=True), 'fullyfit': True, 'validationfraction': None, 'earlystopping': False}

CHERRY/optimized/paris_splash_humanRRI.model
{'loss': 'auto', 'learning_rate': 0.09011344096058371, 'max_iter': 512, 'min_samples_leaf': 21, 'max_depth': None, 'max_leaf_nodes': 481, 'max_bins': 255, 'l2_regularization': 6.439073376265142e-06, 'early_stop': 'off', 'tol': 1e-07, 'scoring': 'loss', 'n_iter_no_change': 0, 'validation_fraction': None, 'random_state': 1, 'verbose': 0, 'estimator': HistGradientBoostingClassifier(early_stopping=False, l2_regularization=6.439073376265142e-06, learning_rate=0.09011344096058371, max_iter=512, max_leaf_nodes=481, min_samples_leaf=21, n_iter_no_change=0, random_state=1, validation_fraction=None, warm_start=True), 'fullyfit': True, 'validationfraction': None, 'earlystopping': False}

CHERRY/optimized/splash_humanRRI.model
{'n_neighbors': 2, 'weights': 'distance', 'p': 1, 'random_state': 1, 'estimator': KNeighborsClassifier(n_neighbors=2, p=1, weights='distance')}
```
$$ updated results $$
Col:mod/Row:data | PARIS_human | PARIS_human_RBPs | PARIS_mouse |
---|---|---|---|
estimator | KNeighborsClassifier(n_neighbors=1) | HistGradientBoostingClassifier | KNeighborsClassifier(n_neighbors=4) |
PARIS_human | 0.843 | 0.863 | 0.538 |
PARIS_human_RBPs | 0.906 | 0.847 | 0.538 |
PARIS_mouse | 0.563 | 0.624 | 0.819 |
Stefan's results:
human model -> mouse data F1: 0.700
mouse model -> human data F1: 0.623
Here the trusted RRIs were recomputed. This does not look very different from the 'old' F1 scores!
$$ updated values $$
My F1 scores from the CheRRI evaluation:

Col:mod/Row:data | PARIS_human | PARIS_human_RBPs | PARIS_mouse |
---|---|---|---|
estimator | KNeighborsClassifier(n_neighbors=2) | HistGradientBoostingClassifier | KNeighborsClassifier(n_neighbors=1) |
PARIS_human | 0.917 | 0.882 | 0.475 |
PARIS_human_RBPs | 0.936 | 0.902 | 0.470 |
PARIS_mouse | 0.575 | 0.662 | 0.931 |
Test: init_dgesdd failed init (on mouse data). It appears when calling CheRRI on my and Stefan's denbi cloud instances, but also when just calling biofilm directly for the optimization. Call:
```
nohup python -W ignore -m biofilm.biofilm-optimize6 --infile //home/uhlm/Dokumente/Teresa/test_Cherri_old_data//PARIS_mouse/feature_files//training_data_PARIS_mouse_context_150 --featurefile //home/uhlm/Dokumente/Teresa/test_Cherri_old_data//PARIS_mouse//model//features/PARIS_mouse_context_150 --memoryMBthread 10000 --folds 0 --out //home/uhlm/Dokumente/Teresa/test_Cherri_old_data//PARIS_mouse//model//optimized/PARIS_mouse_context_150 --preprocess True --n_jobs 6 --time 50000 > test_only_mouse_model &
```
Output:

```
adding .npz to filename
optimization datatype: <class 'numpy.ndarray'>
[WARNING] [2022-04-01 18:18:12,557:Client-AutoML(1):520fb4fc-b1d7-11ec-b0e5-901b0eb924fa] Capping the per_run_time_limit to 24999.0 to have time for a least 2 models in each process.
init_dgesdd failed init
init_dgesdd failed init
Traceback (most recent call last):
  File "/home/uhlm/Progs/anaconda3/envs/cherri/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/uhlm/Progs/anaconda3/envs/cherri/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/uhlm/Progs/anaconda3/envs/cherri/lib/python3.8/site-packages/biofilm/biofilm-optimize6.py", line 82, in <module>
    main()
  File "/home/uhlm/Progs/anaconda3/envs/cherri/lib/python3.8/site-packages/biofilm/biofilm-optimize6.py", line 78, in main
    print('\n',pipeline.steps[2][1].choice.preprocessor.get_support())
AttributeError: 'FastICA' object has no attribute 'get_support'
adding .npz to filename
########## CSV WRITTEN ##########
TEST score=0.9394774845739793
("{'n_neighbors': 1, 'weights': 'distance', 'p': 1, 'random_state': 1, "
 "'estimator': KNeighborsClassifier(n_neighbors=1, p=1, weights='distance')}")
########## MODEL WRITTEN ##########
```
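The AttributeError in the traceback comes from calling `get_support()` unconditionally on whatever preprocessor the optimizer picked: feature selectors expose it, but pure transformers such as FastICA do not. A minimal sketch of a guard (the function name is hypothetical; the real fix would go into the `print` in biofilm-optimize6.py):

```python
from sklearn.decomposition import FastICA

def report_selected_features(preprocessor):
    """Print the feature mask only if the chosen preprocessor supports it.

    Selectors (e.g. SelectKBest) provide get_support(); transformers like
    FastICA do not, which caused the AttributeError above.
    """
    if hasattr(preprocessor, "get_support"):
        print(preprocessor.get_support())
    else:
        print(f"{type(preprocessor).__name__} has no get_support()")

report_selected_features(FastICA(n_components=2))  # FastICA has no get_support()
```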
Results with a fixed PYTHONHASHSEED using Eden features:
My F1 scores from the CheRRI evaluation:

Col:mod/Row:data | PARIS_human | PARIS_human_RBPs | PARIS_mouse |
---|---|---|---|
estimator | KNeighborsClassifier(n_neighbors=2) | HistGradientBoostingClassifier | KNeighborsClassifier(n_neighbors=1) |
PARIS_human | 0.917 | 0.949 | |
PARIS_human_RBPs | 0.907 | 0.902 | |
PARIS_mouse | | | 0.931 |
These numbers look closer to the numbers in the supplementary file.
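Fixing the hash seed matters here because Python salts string hashes per interpreter start unless PYTHONHASHSEED is set, so hashed feature encodings (as EDeN produces) only reproduce across runs with a fixed seed. An illustrative check of that behavior (the hashed string is arbitrary):

```python
import os
import subprocess
import sys

def str_hash_in_fresh_interpreter(seed: int) -> int:
    """Return hash('GGCUAGC') computed in a fresh interpreter with a fixed seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('GGCUAGC'))"],
        capture_output=True, text=True, env=env, check=True,
    )
    return int(result.stdout)

# With the same fixed seed, two fresh interpreters agree on the hash value.
print(str_hash_in_fresh_interpreter(0) == str_hash_in_fresh_interpreter(0))  # True
```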
Hopefully, it is now working! I am currently evaluating the mouse cross-model data and hope this will also be similar. If this is the case, we may have to switch back to the old way of data generation by just taking RRIs which had a detected hybrid within ChiRA.
After fixing several bugs, the model building is working now.
Testing CheRRI's model building resulted in bad cross-model performance. With the old runs, we got tree-based estimators most of the time. What we changed so far to improve the performance:
This resulted in KNeighborsClassifier (cross-model performance still in testing).
Summary of possible error sources