CBrauer opened this issue 2 years ago.
Do you see the same issue with non-NN TPOT? E.g., if you omit config_dict='TPOT NN'?
OK, is this what you wanted?
def Main(g, p):
    X_train, y_train, X_test, y_test = LoadData()
    # clf = TPOTClassifier(config_dict='TPOT NN',
    clf = TPOTClassifier(template='Selector-Transformer-PytorchLRClassifier',
                         verbosity=2,
                         generations=g,
                         population_size=p,
                         random_state=7)
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))
    clf.export('tpot_nn_demo_pipeline.py')
Now I get:
H:\HedgeTools\ML_Model_Generation\TPOT-NN>python tpot-NN-rocket-classify.py
Operating system version.... Windows-10-10.0.22000-SP0
Python version is........... 3.8.13
pandas version is........... 1.4.2
numpy version is............ 1.21.5
tpot version is............. 0.11.7
           BoxRatio        Thrust  Acceleration      Velocity      OnBalRun      vwapGain        Expect          Trin
count  60000.000000  60000.000000  60000.000000  60000.000000  60000.000000  60000.000000  60000.000000  60000.000000
mean       2.061707      1.677448      1.935544      0.635225      2.412940      0.984372     -3.026383      0.834455
std        4.491026      3.056146      1.956287      0.658155      1.602910      0.932878     10.023122      0.284409
min        0.034120      0.000383      0.000112      0.000839      0.048550      0.100003    -50.341116      0.280000
25%        0.344533      0.228764      0.566531      0.155102      1.463102      0.379476     -6.661925      0.600000
50%        0.693704      0.713193      1.606062      0.460673      2.086361      0.730599     -2.334339      0.800000
75%        1.619198      1.790019      2.705824      0.903497      2.905308      1.275189      1.273494      1.040000
max       74.699990     40.539430     27.995832      7.809622     22.693728     11.762206     51.561442      4.540000
Size of dataset:
train shape... (48000, 8) (48000,)
test shape.... (12000, 8) (12000,)
Traceback (most recent call last):
  File "C:\anaconda3\lib\site-packages\tpot\base.py", line 496, in _add_operators
    operator = next(
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tpot-NN-rocket-classify.py", line 84, in <module>
    Main(10, 10)
  File "tpot-NN-rocket-classify.py", line 70, in Main
    clf.fit(X_train, y_train)
  File "C:\anaconda3\lib\site-packages\tpot\base.py", line 725, in fit
    self._fit_init()
  File "C:\anaconda3\lib\site-packages\tpot\base.py", line 618, in _fit_init
    self._setup_pset()
  File "C:\anaconda3\lib\site-packages\tpot\base.py", line 437, in _setup_pset
    self._add_operators()
  File "C:\anaconda3\lib\site-packages\tpot\base.py", line 500, in _add_operators
    raise ValueError(
ValueError: An error occured while attempting to read the specified template. Please check a step named PytorchLRClassifier
H:\HedgeTools\ML_Model_Generation\TPOT-NN>pause
Press any key to continue . . .
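The traceback shows `_add_operators` calling `next(...)`, hitting `StopIteration`, and re-raising it as the `ValueError` about the `PytorchLRClassifier` step. In this run `config_dict='TPOT NN'` is commented out, so the default operator set presumably contains no operator with that name. The following is a simplified, hypothetical sketch of that kind of lookup; `GENERIC_STEPS`, `resolve_template`, and the operator lists are illustrative assumptions, not TPOT's actual code:

```python
# Hypothetical, simplified sketch of checking template steps against an
# operator list. It is NOT TPOT's implementation; names are illustrative.
GENERIC_STEPS = {"Selector", "Transformer", "Classifier", "Regressor"}


def resolve_template(template, operator_names):
    """Split a template on "-" and check each specific step name."""
    resolved = []
    for step in template.split("-"):
        if step in GENERIC_STEPS:
            # Generic step types are left for the search to fill in.
            resolved.append(step)
            continue
        try:
            # next() raises StopIteration when no operator has this name.
            name = next(op for op in operator_names if op == step)
        except StopIteration:
            # Raising inside the handler chains the exceptions, giving the
            # "During handling of the above exception..." output.
            raise ValueError(
                "An error occurred while attempting to read the specified "
                "template. Please check a step named " + step
            )
        resolved.append(name)
    return resolved


# Illustrative operator lists (assumed; not TPOT's real configurations).
default_ops = ["SelectPercentile", "PCA", "RandomForestClassifier"]
nn_ops = default_ops + ["PytorchLRClassifier"]

template = "Selector-Transformer-PytorchLRClassifier"
print(resolve_template(template, nn_ops))
# prints ['Selector', 'Transformer', 'PytorchLRClassifier']

try:
    resolve_template(template, default_ops)
except ValueError as err:
    print("ValueError:", err)
```

When the named operator is in the list, the template resolves. Without it, the lookup fails the same way as in the traceback. This suggests the error in this particular run comes from the template naming an NN operator while the NN configuration is not enabled, rather than from the dataset.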
I suppose you meant to delete the first two lines.
If I run:
def Main(g, p):
    X_train, y_train, X_test, y_test = LoadData()
    # clf = TPOTClassifier(config_dict='TPOT NN',
    #                      template='Selector-Transformer-PytorchLRClassifier',
    clf = TPOTClassifier(verbosity=2,
                         generations=g,
                         population_size=p,
                         random_state=7)
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))
    clf.export('tpot_nn_demo_pipeline.py')
I get the following results:
H:\HedgeTools\ML_Model_Generation\TPOT-NN>python tpot-NN-rocket-classify.py
Operating system version.... Windows-10-10.0.22000-SP0
Python version is........... 3.8.13
pandas version is........... 1.4.2
numpy version is............ 1.21.5
tpot version is............. 0.11.7
           BoxRatio        Thrust  Acceleration      Velocity      OnBalRun      vwapGain        Expect          Trin
count  60000.000000  60000.000000  60000.000000  60000.000000  60000.000000  60000.000000  60000.000000  60000.000000
mean       2.061707      1.677448      1.935544      0.635225      2.412940      0.984372     -3.026383      0.834455
std        4.491026      3.056146      1.956287      0.658155      1.602910      0.932878     10.023122      0.284409
min        0.034120      0.000383      0.000112      0.000839      0.048550      0.100003    -50.341116      0.280000
25%        0.344533      0.228764      0.566531      0.155102      1.463102      0.379476     -6.661925      0.600000
50%        0.693704      0.713193      1.606062      0.460673      2.086361      0.730599     -2.334339      0.800000
75%        1.619198      1.790019      2.705824      0.903497      2.905308      1.275189      1.273494      1.040000
max       74.699990     40.539430     27.995832      7.809622     22.693728     11.762206     51.561442      4.540000
Size of dataset:
train shape... (48000, 8) (48000,)
test shape.... (12000, 8) (12000,)
Generation 1 - Current best internal CV score: 0.9716458333333333
Generation 2 - Current best internal CV score: 0.9843125
Generation 3 - Current best internal CV score: 0.9847291666666667
Generation 4 - Current best internal CV score: 0.9859375
Generation 5 - Current best internal CV score: 0.9871041666666667
Generation 6 - Current best internal CV score: 0.9876875
Generation 7 - Current best internal CV score: 0.9876875
Generation 8 - Current best internal CV score: 0.9892291666666667
Generation 9 - Current best internal CV score: 0.9909791666666667
Generation 10 - Current best internal CV score: 0.9909791666666667
Best pipeline: KNeighborsClassifier(DecisionTreeClassifier(RandomForestClassifier(RFE(CombineDFs(input_matrix, input_matrix), criterion=gini, max_features=0.6500000000000001, n_estimators=100, step=0.1), bootstrap=False, criterion=gini, max_features=0.1, min_samples_leaf=3, min_samples_split=20, n_estimators=100), criterion=gini, max_depth=2, min_samples_leaf=9, min_samples_split=9), n_neighbors=6, p=2, weights=distance)
0.9926666666666667
Total compute time was: 01:23:05
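TPOT prints the best pipeline as nested operator calls: the innermost call is applied to `input_matrix` first, and each enclosing operator consumes the output of the one inside it. As far as I understand TPOT's exported code, intermediate classifiers such as the random forest and decision tree here are wrapped so that their predictions are added as features for the next step. The small sketch below lists the operators in the order they run, using only a regular expression over the printed string. `pipeline_steps` is a hypothetical helper, and it assumes a single linear chain like the one above:

```python
import re

# The "Best pipeline" string from the run above, split for readability.
best = (
    "KNeighborsClassifier(DecisionTreeClassifier(RandomForestClassifier("
    "RFE(CombineDFs(input_matrix, input_matrix), criterion=gini, "
    "max_features=0.6500000000000001, n_estimators=100, step=0.1), "
    "bootstrap=False, criterion=gini, max_features=0.1, min_samples_leaf=3, "
    "min_samples_split=20, n_estimators=100), criterion=gini, max_depth=2, "
    "min_samples_leaf=9, min_samples_split=9), n_neighbors=6, p=2, "
    "weights=distance)"
)


def pipeline_steps(expr):
    """Return operator names from innermost (runs first) to outermost.

    Assumes a linear chain: every name immediately followed by "(" is an
    operator, and each operator wraps exactly one earlier stage.
    """
    names = re.findall(r"([A-Za-z_]\w*)\(", expr)
    return names[::-1]


for i, name in enumerate(pipeline_steps(best), start=1):
    print(i, name)
```

For this pipeline it lists CombineDFs, RFE, RandomForestClassifier, DecisionTreeClassifier, and KNeighborsClassifier, in that order. Pipelines that combine several branches need a real parser rather than this flat regex.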
I've never had good results with neural networks anyway. And yes, I've tried TabNet. TPOT beats TabNet every time.
Charles
It seems to be an issue when templates are used in conjunction with config_dict='TPOT NN'. When I run your code without a template it runs fine, and the error persists when I swap out your data for a different dataset.
I'll need to do some digging to figure out exactly what is going on, but there seem to be 2 possible contributing factors:
assert_all_finite() is called with two arguments instead of one at: https://github.com/EpistasisLab/tpot/blob/6448bdb71ba08b4a0447c640d2f05a05e1affc21/tpot/builtins/nn.py#L163
Hey,
Thanks for the update.
Charles
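Regarding the first factor Joe listed, assert_all_finite() being called with two arguments: in scikit-learn's `sklearn.utils.assert_all_finite`, the array is the only positional parameter and `allow_nan` is a flag. Depending on the scikit-learn version, a second positional argument is either rejected with a `TypeError` (the flag is keyword-only in recent releases) or taken as `allow_nan` (older releases). In neither case is the second array checked for non-finite values. The sketch below assumes the linked call passes `y` as that second argument. It uses a pure-Python stand-in, `check_all_finite` (hypothetical, not the scikit-learn function), to show the failure and the presumably intended pair of separate checks:

```python
import math


def check_all_finite(values, *, allow_nan=False):
    """Hypothetical stand-in with a keyword-only allow_nan parameter.

    Takes a flat sequence of floats (1-D for simplicity) and raises
    ValueError on infinity, or on NaN unless allow_nan=True.
    """
    for v in values:
        if math.isnan(v):
            if not allow_nan:
                raise ValueError("Input contains NaN.")
        elif math.isinf(v):
            raise ValueError("Input contains infinity.")


X = [0.5, 1.2, 3.4]
y = [0.0, 1.0, float("nan")]

# Passing y as a second positional argument never checks y: with a
# keyword-only signature the call is rejected outright.
try:
    check_all_finite(X, y)
except TypeError as err:
    print("TypeError:", err)

# Presumed intent: validate X and y with separate calls.
check_all_finite(X)  # passes silently
try:
    check_all_finite(y)
except ValueError as err:
    print("ValueError:", err)  # prints "ValueError: Input contains NaN."
```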
From: Joe Romano
Sent: Saturday, April 30, 2022 4:46 PM
To: EpistasisLab/tpot
Cc: Charles Brauer; Author
Subject: Re: [EpistasisLab/tpot] My dataet crashed TOP-NN (Issue #1247)
Hey,
I am getting a crash in TPOT-NN. My environment is:
I have put my code and dataset at: https://github.com/CBrauer/TPOT-NN-bug
The program is as follows:
After running for a while, I get the following stack trace:
I hope you guys can help me with this problem.
Charles