fabsig / GPBoost

Combining tree-boosting with Gaussian process and mixed effects models

Raise an error when input parameters are misspecified #11

Closed poroc300 closed 3 years ago

poroc300 commented 3 years ago

Hi. First of all, thank you for putting time and effort into developing such an interesting tool.

The train method of a GPModel instance does not flag incorrect parameter names. For example, when you define the dictionary with parameter values like this:

params = {'num_boost_round': 20000, 'xxxxx': 0.5}

The train method just ignores "xxxxx" and proceeds with training. I think it would be useful to raise a warning or an error to facilitate debugging. For example, the other day I specified the learning rate with a value of 0.5. Unfortunately, there was a typo that slipped under the radar:

params = {"learning_rate:": 0.5}

Note the extra colon ":" in the parameter's name. Because this typo makes the parameter name invalid, the algorithm just ignores it and falls back to the default learning rate (which I believe is 0.1). It took me a while to find out why the algorithm was running but not performing as expected (I knew beforehand that a learning rate of 0.5 gave good results). Thank you for your attention.
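Until the library itself reports such names, a check of this kind can be approximated on the caller's side. The following is only a minimal sketch and not part of GPBoost: `check_param_names` and `KNOWN_PARAMS` are hypothetical names, and the allow-list holds just the two parameter names mentioned in this thread, so it would need to be extended with whatever parameters are actually in use.

```python
import difflib

# Hypothetical allow-list for illustration only. It is NOT GPBoost's
# parameter list; it contains just the two names used in this thread.
KNOWN_PARAMS = {"learning_rate", "num_boost_round"}


def check_param_names(params, known=KNOWN_PARAMS):
    """Raise ValueError if any key of `params` is missing from `known`."""
    unknown = [name for name in params if name not in known]
    if not unknown:
        return
    messages = []
    for name in unknown:
        # Suggest the closest known name, if one is similar enough.
        close = difflib.get_close_matches(name, sorted(known), n=1)
        hint = f" (did you mean '{close[0]}'?)" if close else ""
        messages.append(f"unknown parameter '{name}'{hint}")
    raise ValueError("; ".join(messages))
```

Calling `check_param_names(params)` before training would turn the `"learning_rate:"` typo into an immediate error instead of a silently ignored key.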

fabsig commented 3 years ago

Thanks for your feedback.

I am getting a warning (gpboost version 0.2.3): [GPBoost] [Warning] Unknown parameter: learning_rate:

I assume that you use the Python package and refer to the gpb.train() function.

Can you provide a minimal working example to reproduce your issue?

poroc300 commented 3 years ago

Please find below a snippet of code. Following your example, this code should generate a warning because the parameter "xxxx" does not exist. However, my machine does not output any warning and runs the algorithm as usual.

import numpy as np
import pandas as pd
import gpboost as gpb

# Simulate data: two features, a grouping variable with levels 1 to 3, and a target
data = {"feat1": np.random.normal(60, 10, size=3000),
        "feat2": np.random.normal(40, 7, size=3000),
        "group": np.random.randint(1, 4, size=3000),
        "target": np.random.normal(200, 20, size=3000)}
data = pd.DataFrame(data)

# Create input objects: feature matrix, group labels, and response
X = data.iloc[:, :2].copy()
clusters = data["group"].copy().to_numpy()
y = data["target"].copy()

# Fit model; "xxxx" is a deliberately invalid parameter name
params = {"learning_rate": 0.5, "xxxx": 1}
gpb_data = gpb.Dataset(X, y)
gp_model = gpb.GPModel(group_data=clusters)
fitted = gpb.train(params=params, train_set=gpb_data, gp_model=gp_model)

I am using Spyder 4.1.4 (embedded in Anaconda) and Python 3.8.3 on Windows 10.

fabsig commented 3 years ago

I am also not getting a warning in Spyder. However, when using PyCharm, I do get the "Unknown parameter" warning.

This has something to do with the LightGBM version (0.2.3) that GPBoost is relying on. Spyder also does not correctly show warnings or notes for LightGBM version 0.2.3. For newer versions of LightGBM, this is no longer an issue. I plan to update to the newest version of LightGBM soon, and the problem will very likely be fixed then. In the meantime, I suggest that you use another IDE instead of Spyder.

poroc300 commented 3 years ago

Thank you for taking the time to address my inquiry.

fabsig commented 3 years ago

This has been fixed now. Starting with version 0.4.0 of GPBoost, all error messages and warnings are correctly displayed in Spyder as well.

poroc300 commented 3 years ago

Nice, thanks :)