Open DhanshreeA opened 1 year ago
Hi @GemmaTuron, I reran the experiment on the TDC CYP2C9 Veith benchmark dataset and the system works. Here are a couple of observations that also address your concern about predict getting stuck. The current flow runs one fit and one predict for every feature ensemble within every data ensemble:
for data_sample in data_ensembles:
    feat_ensembles = get_feat_ensembles(data_sample)
    for feat_sample in feat_ensembles:
        x, y = feat_sample
        tabpfn.fit(x, y)
        tabpfn.predict(x_test)
This currently leads to long run times, especially with a high value of max_iters, which is the input to EnsembleTabPFN that configures the number of data ensembles.
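To make the cost concrete, here is a minimal, self-contained sketch of the nested loop above. It is not the library's implementation: `DummyModel`, `make_data_ensembles`, `make_feat_ensembles`, and the `sample_size` and `chunk_size` arguments are hypothetical stand-ins, and the row and column sampling schemes are assumptions. It only illustrates that the number of fit/predict calls is the number of data ensembles times the number of feature ensembles.

```python
import random


class DummyModel:
    """Hypothetical stand-in for a TabPFN-like classifier; counts calls."""

    def __init__(self):
        self.fit_calls = 0
        self.predict_calls = 0
        self._majority = None

    def fit(self, x, y):
        self.fit_calls += 1
        # Remember the most frequent label; no real learning happens here.
        self._majority = max(set(y), key=y.count)
        return self

    def predict(self, x_test):
        self.predict_calls += 1
        return [self._majority for _ in x_test]


def make_data_ensembles(X, y, max_iters, sample_size, rng):
    """Draw max_iters row subsamples (assumed sampling scheme)."""
    n = len(X)
    ensembles = []
    for _ in range(max_iters):
        idx = rng.sample(range(n), min(sample_size, n))
        ensembles.append(([X[i] for i in idx], [y[i] for i in idx]))
    return ensembles


def make_feat_ensembles(n_features, chunk_size):
    """Split column indices into consecutive chunks (assumed scheme)."""
    return [
        list(range(start, min(start + chunk_size, n_features)))
        for start in range(0, n_features, chunk_size)
    ]


def run_nested_ensembles(X, y, X_test, max_iters, sample_size, chunk_size, seed=0):
    """Run one fit and one predict per (data ensemble, feature ensemble) pair."""
    rng = random.Random(seed)
    model = DummyModel()
    all_preds = []
    for X_sub, y_sub in make_data_ensembles(X, y, max_iters, sample_size, rng):
        for cols in make_feat_ensembles(len(X[0]), chunk_size):
            x = [[row[c] for c in cols] for row in X_sub]
            x_t = [[row[c] for c in cols] for row in X_test]
            model.fit(x, y_sub)
            all_preds.append(model.predict(x_t))
    return model, all_preds
```

With 3 data ensembles and 10 features split into chunks of 4 (so 3 feature ensembles), this makes 9 fit and 9 predict calls, so the total run time scales with the product of the two counts.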
One possibility that @miquelduranfrigola and I had considered is to use a heuristic or a priori information to avoid using all feature ensembles, or to incorporate some form of early stopping. I'll get to that soon.
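As a rough illustration of what an early stopping strategy could look like (one possible approach, not something implemented in the library), the sketch below averages per-member predicted probabilities and stops consuming ensemble members once the running mean has stopped moving. The function name and the `tol` and `patience` parameters are hypothetical.

```python
def ensemble_with_early_stopping(member_probs, tol=1e-3, patience=2):
    """Average probability vectors from an iterable of ensemble members.

    Stops once the running mean has shifted by less than `tol` (max absolute
    change) for `patience` consecutive members. Returns (mean, members_used).
    """
    running = None
    stable = 0
    used = 0
    for probs in member_probs:
        used += 1
        if running is None:
            running = list(probs)
            continue
        # Incremental mean update: mean_k = mean_{k-1} + (x_k - mean_{k-1}) / k
        updated = [r + (p - r) / used for r, p in zip(running, probs)]
        max_shift = max(abs(u - r) for u, r in zip(updated, running))
        running = updated
        stable = stable + 1 if max_shift < tol else 0
        if stable >= patience:
            break
    return running, used
```

If `member_probs` is a generator that runs a fit and predict for each feature ensemble lazily, breaking out of the loop means the remaining members are never computed, which is where the time saving would come from.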
For now, you can use it by cloning it and directly doing a pip install. I'll update it on PyPI soon.
I'll use this issue to track any problems you run into while using the library with the latest updates. If you are able to reproduce the results, I'll close this issue. @GemmaTuron
Hi @GemmaTuron, did you get a chance to test this?
Hi @DhanshreeA !
Sorry for the delayed response. I've followed your suggestion and installed the package directly from the repo. I am using a dataset of 10k and the current default parameters. I am also not sure which function I need to pass the max_iters parameter to. Can you add this to the README?
I am simply aiming at running this:
from ensemble_tabpfn import EnsembleTabPFN
clf = EnsembleTabPFN()
clf.fit(X_train, y_train)
preds = clf.predict(X_test)
where X is a list of SMILES, each with a single associated Activity value.
I mean, I can leave it running if you think that will help, but I am concerned that fitting is taking only 1 second
@DhanshreeA I still get inconsistencies in the times needed to train and predict with ChemPFN
Can you tell me what the expected times are for, say, a training set of 10K and a prediction set of 1K?
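For comparing numbers across machines, a small timing helper like the one below may be useful. It is a sketch that assumes only that the model exposes `fit(X, y)` and `predict(X)`, as in the snippet earlier in this thread; `time_fit_predict` and `ConstantModel` are hypothetical names, and `ConstantModel` is just a placeholder so the example runs without the library installed.

```python
import time


def time_fit_predict(model, X_train, y_train, X_test):
    """Time fit and predict separately; returns (timings, predictions)."""
    t0 = time.perf_counter()
    model.fit(X_train, y_train)
    t1 = time.perf_counter()
    preds = model.predict(X_test)
    t2 = time.perf_counter()
    timings = {
        "n_train": len(X_train),
        "n_test": len(X_test),
        "fit_s": t1 - t0,
        "predict_s": t2 - t1,
    }
    return timings, preds


class ConstantModel:
    """Placeholder model: always predicts the first training label."""

    def fit(self, X, y):
        self._label = y[0]
        return self

    def predict(self, X):
        return [self._label for _ in X]


if __name__ == "__main__":
    timings, _ = time_fit_predict(ConstantModel(), ["CCO", "CCN"], [1, 0], ["CCC"])
    print(timings)
```

Reporting `fit_s` and `predict_s` separately should make it easier to tell whether a very short fit followed by a long predict is expected behaviour or a sign that something is stuck.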
Copying over conversation from Slack, issue raised by @GemmaTuron:
Investigate why the module gets stuck during predict, and reproduce the TDC results generated earlier (refer to the notebook).