Closed · janweinreich closed this 1 year ago
If you wish to include this, we should introduce different regression tasks so there is no conflict between the KNN regressor and kernel ridge regression.
Hey... Looks neat (I just merged it). I think the trick to substantially improving performance will be in the encoding/representation of the input. But so far I have had no luck beating plain old non-preprocessed SMILES (except for kNN regression, where there are slight improvements with tokenization).
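For context, "tokenization" here means something like splitting the SMILES into atom/bond tokens before handing the string to the compressor. A minimal sketch using the commonly used SMILES regex; the actual tokenizer used for the kNN experiments may differ:

```python
import re

# Common SMILES tokenization regex; treats bracketed atoms, two-letter elements
# like Cl/Br, and ring-closure labels (%NN) as single tokens.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)

def tokenize_smiles(smiles: str) -> str:
    # Insert spaces between tokens so the compressor sees token boundaries.
    return " ".join(SMILES_TOKEN_PATTERN.findall(smiles))

print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))
# -> 'C C ( = O ) O c 1 c c c c c 1 C ( = O ) O'
```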
Added functions for kernel ridge regression with a Laplacian kernel in `gzip_regressor.py`:

- `compute_pairwise_ncd` – computes the normalized compression distance (NCD)
- `compute_ncd` – enter multiprocessing
- `train_kernel_ridge_regression` – it trains...
- `predict_kernel_ridge_regression` – well, it predicts...

For the datasets provided it seems to perform about as well as KNN, but I did not yet carefully try different hyperparameters.
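For reference, a minimal sketch of how these pieces could fit together: gzip-based NCD between strings, a Laplacian kernel built from the NCD matrix, and a ridge solve for the regression weights. The function names, hyperparameters (`sigma`, `lam`), and the lack of multiprocessing below are illustrative and may not match the actual code in `gzip_regressor.py`:

```python
import gzip
import numpy as np

def compressed_size(s: str) -> int:
    # Length of the gzip-compressed UTF-8 string.
    return len(gzip.compress(s.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    # Normalized compression distance:
    # NCD(a, b) = (C(ab) - min(C(a), C(b))) / max(C(a), C(b))
    ca, cb = compressed_size(a), compressed_size(b)
    cab = compressed_size(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def pairwise_ncd(xs_a, xs_b) -> np.ndarray:
    # Dense NCD matrix between two lists of strings (single process in this sketch).
    return np.array([[ncd(a, b) for b in xs_b] for a in xs_a])

def train_krr(D_train: np.ndarray, y_train: np.ndarray,
              sigma: float = 1.0, lam: float = 1e-8) -> np.ndarray:
    # Laplacian kernel K = exp(-D / sigma); solve (K + lam * I) alpha = y.
    K = np.exp(-D_train / sigma)
    return np.linalg.solve(K + lam * np.eye(len(y_train)), y_train)

def predict_krr(D_test_train: np.ndarray, alpha: np.ndarray,
                sigma: float = 1.0) -> np.ndarray:
    # Predictions are the test-vs-train kernel matrix times alpha.
    return np.exp(-D_test_train / sigma) @ alpha

if __name__ == "__main__":
    # Toy example with made-up targets.
    train_smiles = ["CCO", "CCC", "CCN", "c1ccccc1"]
    y = np.array([0.1, 0.2, 0.3, 0.9])
    test_smiles = ["CCCl", "c1ccccc1O"]

    alpha = train_krr(pairwise_ncd(train_smiles, train_smiles), y)
    print(predict_krr(pairwise_ncd(test_smiles, train_smiles), alpha))
```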
Planning to test this on larger datasets such as QM9, as well as on other types of molecular representations (e.g. binned numerical representations, or simply an RDKit fingerprint instead of string-based representations). For the datasets benchmarked here I expect RDKit FPs to be decent!
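Just to illustrate the fingerprint idea (not something this PR implements): an RDKit fingerprint could be serialized to a bit string and dropped into the same NCD pipeline in place of the raw SMILES. The helper name `rdkit_fp_string` and the `fp_size` value are made up for the sketch:

```python
from rdkit import Chem

def rdkit_fp_string(smiles: str, fp_size: int = 2048) -> str:
    # Convert a SMILES string into an RDKit path-based fingerprint and return it
    # as a 0/1 bit string, which could replace the raw SMILES in the NCD computation.
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    return Chem.RDKFingerprint(mol, fpSize=fp_size).ToBitString()

# e.g. pairwise_ncd([rdkit_fp_string(s) for s in train_smiles],
#                   [rdkit_fp_string(s) for s in train_smiles])
```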