Open Shafi2016 opened 4 years ago
Try plotting the data to confirm there is a distribution. Perhaps there is not.
If there is, try changing the number of bins in the histogram plot.
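For example, a minimal sketch of that check (assuming `stats` holds the collected bootstrap scores):

```python
import numpy as np
from matplotlib import pyplot

# a one-bin histogram usually means the scores barely vary;
# check how many distinct values were actually collected
print(len(stats), 'scores,', len(np.unique(stats)), 'distinct values')
print('min %.4f, max %.4f' % (min(stats), max(stats)))

# if there is spread, control the binning explicitly
pyplot.hist(stats, bins=30)
pyplot.show()
```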
Thanks a lot. Yes, I tried changing the number of bins, but it did not work:

```python
import seaborn as sns

sns.distplot(stats, hist=True, kde=False, bins=int(30 / 2), color='blue',
             hist_kws={'edgecolor': 'black'})
```
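As a side note, `sns.distplot` is deprecated in recent seaborn releases; a sketch of the same plot with `sns.histplot` (same `stats` list assumed):

```python
import seaborn as sns
from matplotlib import pyplot

# histplot replaces distplot; bins is passed directly
sns.histplot(stats, bins=15, color='blue', edgecolor='black')
pyplot.show()
```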
I checked the XGBoost classifier with this data (https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv), and it works fine:
```python
import numpy
from pandas import read_csv
from sklearn.utils import resample
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
from matplotlib import pyplot

# load dataset
data = read_csv('pima-indians-diabetes.data.csv', header=None)
values = data.values

# configure bootstrap
n_iterations = 100
n_size = int(len(data) * 0.50)

# run bootstrap
stats = list()
for i in range(n_iterations):
    # prepare train and test sets
    train = resample(values, n_samples=n_size)
    test = numpy.array([x for x in values if x.tolist() not in train.tolist()])
    # fit model
    model = XGBClassifier()
    model.fit(train[:, :-1], train[:, -1])
    # evaluate model
    predictions = model.predict(test[:, :-1])
    score = accuracy_score(test[:, -1], predictions)
    print(score)
    stats.append(score)

# plot scores
pyplot.hist(stats)
pyplot.show()

# confidence intervals
alpha = 0.95
p = ((1.0 - alpha) / 2.0) * 100
lower = max(0.0, numpy.percentile(stats, p))
p = (alpha + ((1.0 - alpha) / 2.0)) * 100
upper = min(1.0, numpy.percentile(stats, p))
print('%.1f confidence interval %.1f%% and %.1f%%' % (alpha * 100, lower * 100, upper * 100))
```
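One caveat with this pattern, as an aside rather than part of the tutorial code: rebuilding the test set with `x.tolist() not in train.tolist()` compares rows by value, which is slow and can misplace duplicate rows. A sketch that bootstraps row indices instead:

```python
import numpy
from sklearn.utils import resample

# resample indices rather than rows; the out-of-bag rows form the test set
ix = numpy.arange(len(values))
train_ix = resample(ix, n_samples=n_size)
test_ix = numpy.setdiff1d(ix, train_ix)
train, test = values[train_ix], values[test_ix]
```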
I also plotted the histogram of the predictions (XGBoost regression), and it looks fine.
Hi, I get this error for the classifier: 'continuous is not supported'. How can I solve it?
Hi Dmlc/Xgboost,
Thanks for asking.
I’m eager to help, but I just don’t have the capacity to debug code for you.
I am happy to make some suggestions.
Regards,
Jason Brownlee, Ph.D.
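For anyone hitting the same message: 'continuous is not supported' is raised by sklearn's classification metrics when the target values are continuous. A minimal sketch of the distinction, with made-up numbers:

```python
from sklearn.metrics import accuracy_score, mean_squared_error

y_true = [2.1, 3.4, 5.0]   # continuous target
y_pred = [2.0, 3.5, 4.8]

# accuracy_score(y_true, y_pred)  # raises ValueError: continuous is not supported

# a continuous target calls for a regressor and a regression metric
print(mean_squared_error(y_true, y_pred) ** 0.5)
```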
I want to construct bootstrap confidence intervals for XGBoost regression using Python. I based my code on https://machinelearningmastery.com/calculate-bootstrap-confidence-intervals-machine-learning-results-python/#comment-528118.

Question: I am getting a one-bin histogram. I get a single value for the score even though the bootstrap runs for n_iterations. The problem seems related to the way I compute RMSE; I have tried computing RMSE in several ways but could not solve it. How can I fix it?
```python
import numpy
from sklearn.datasets import load_boston
from sklearn.utils import resample
from matplotlib import pyplot
from xgboost import XGBRegressor
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error

# load dataset
boston_dataset = load_boston()
df = pd.DataFrame(boston_dataset.data, columns=boston_dataset.feature_names)
df['MEDV'] = boston_dataset.target
values1 = df.values

# configure bootstrap
n_iterations = 1000
n_size = int(len(df) * 0.50)

# run bootstrap
stats = list()
for i in range(n_iterations):
    # prepare train and test sets
    train = resample(values1, n_samples=n_size)
    test = numpy.array([x for x in values1 if x.tolist() not in train.tolist()])
    X_train = train[:, :-1]
    y_train = train[:, -1]
    X_test = test[:, :-1]
    y_test = test[:, -1]
    # fit model
    model = XGBRegressor()  ## Final for the papers
    model.fit(X_train, y_train)
    # make predictions and score with RMSE inside the loop,
    # so every iteration appends its own score
    predictions = model.predict(X_test)
    score = np.sqrt(mean_squared_error(y_test, predictions))
    print(score)
    stats.append(score)

# plot scores
pyplot.hist(stats)
pyplot.show()

# confidence intervals
# note: RMSE is not bounded in [0, 1], so the max(0.0, ...) / min(1.0, ...)
# clipping from the accuracy example must not be applied here
alpha = 0.95
p = ((1.0 - alpha) / 2.0) * 100
lower = numpy.percentile(stats, p)
p = (alpha + ((1.0 - alpha) / 2.0)) * 100
upper = numpy.percentile(stats, p)
print('%.0f%% confidence interval: %.2f to %.2f RMSE' % (alpha * 100, lower, upper))
```
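As an aside, the two percentile calls can be collapsed into one, since `numpy.percentile` accepts a sequence of percentiles:

```python
import numpy

# both interval bounds in a single call
alpha = 0.95
lower, upper = numpy.percentile(stats, [(1.0 - alpha) / 2.0 * 100,
                                        (alpha + (1.0 - alpha) / 2.0) * 100])
print('%.0f%% confidence interval: %.2f to %.2f' % (alpha * 100, lower, upper))
```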