The-Red-Wood-Lab / Micro-Gas-Turbine-electrical-energy-Prediction

MIT License

New Work: Model Implementation (Model-I) #4

Open eeshan15 opened 2 days ago

eeshan15 commented 2 days ago

Download Datasets

Train Dataset Link: Click on this Link

Test Dataset Link: Click on this Link. Please download both CSV files for the test dataset.

About The Model Selection

For our model selection process, we are going with gradient boosting models. They are very effective for prediction tasks like this one because they build an ensemble of weak learners sequentially, with a learning rate controlling how much each new learner contributes. Two widely used boosting algorithms are:

1) XGBoost: XGBoost builds new models to correct the errors made by the previous ones, focusing specifically on the gradients of the loss function. It uses gradient descent to minimize the error: each subsequent model (usually a decision tree) fits the residuals of the current ensemble.

2) AdaBoost: AdaBoost focuses on adjusting the weights of the training data points. Initially, it assigns equal weights to all data points. After each iteration, it increases the weights of the points that were misclassified (or that had higher error, in regression), forcing the next model to pay more attention to the previous model's mistakes. The final prediction is a weighted sum of the models' outputs.

In this issue we are implementing the XGBoost algorithm.
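
For illustration only, here is a minimal sketch of how the two boosting variants described above are typically instantiated for regression. It assumes the `xgboost` and `scikit-learn` packages are installed; the hyperparameter values are placeholders, not the settings required for this issue.

```python
from xgboost import XGBRegressor
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

# XGBoost: each new tree fits the gradient of the loss with respect to the
# current ensemble's predictions; the learning rate shrinks each tree's
# contribution.
xgb_model = XGBRegressor(
    n_estimators=500,
    learning_rate=0.1,               # shrinkage applied to every new tree
    max_depth=4,
    objective="reg:squarederror",
)

# AdaBoost: re-weights training samples after each round so the next weak
# learner focuses on the points with the largest errors.
# (`estimator=` requires scikit-learn >= 1.2; older versions use `base_estimator=`.)
ada_model = AdaBoostRegressor(
    estimator=DecisionTreeRegressor(max_depth=4),
    n_estimators=500,
    learning_rate=0.1,
)
```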

Implementation of XGBoost for this issue

1) Split the training dataset in a 75:25 ratio, holding out 25% of the training data as a validation set. You can also use K-fold cross-validation instead. A sketch covering steps 1-3 follows this list.

2) Next, check for overfitting by plotting the training and validation loss together on a single graph. If the validation curve starts going up while the training curve keeps going down, the model is overfitting and you have to address it, for example with regularization or early stopping.

3) Use a learning rate close to 0.1; you can also tune it by trial and error.

4) Now comes the main part: for the test dataset, introduce a column el_power_predicted and fill it with the values predicted by the trained model. Steps 4-6 are sketched after this list as well.

5) Final evaluation is done with the R² score (R-squared) and RMSE (Root Mean Squared Error). An R² score close to 1 and a low RMSE mean your model is good.

6) Please show the difference between predicted and actual values by generating a graph such as a line plot or a scatter plot. The graph should clearly show the difference between the two sets of values and should be titled "Predicted vs Actual Value".
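
As a rough guide for steps 1-3, here is a minimal sketch using XGBoost's built-in evaluation tracking. The file name `train.csv` and the target column name `el_power` are assumptions; adjust them to the actual training CSV linked above.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

train_df = pd.read_csv("train.csv")        # placeholder path for the training CSV
X = train_df.drop(columns=["el_power"])    # "el_power" is an assumed target name
y = train_df["el_power"]

# Step 1: 75:25 split of the training data into train and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Step 3: learning rate close to 0.1, as suggested in the issue.
model = XGBRegressor(
    n_estimators=500,
    learning_rate=0.1,
    max_depth=4,
    objective="reg:squarederror",   # RMSE is the default eval metric here
    reg_lambda=1.0,                 # L2 regularization, one lever against overfitting
)

# Record RMSE on both sets at every boosting round.
model.fit(
    X_train, y_train,
    eval_set=[(X_train, y_train), (X_val, y_val)],
    verbose=False,
)

# Step 2: plot training vs validation loss on one graph to spot overfitting.
history = model.evals_result()
plt.plot(history["validation_0"]["rmse"], label="Training RMSE")
plt.plot(history["validation_1"]["rmse"], label="Validation RMSE")
plt.xlabel("Boosting round")
plt.ylabel("RMSE")
plt.title("Training vs Validation Loss")
plt.legend()
plt.show()
```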

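Steps 4-6 could then look roughly like the following, continuing from the `model` fitted in the sketch above; again, `test.csv` and the column names are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score

test_df = pd.read_csv("test.csv")          # placeholder path for the test CSV

# Step 4: add the predicted column to the test dataframe.
X_test = test_df.drop(columns=["el_power"])
y_test = test_df["el_power"]
test_df["el_power_predicted"] = model.predict(X_test)

# Step 5: evaluate with R² and RMSE.
r2 = r2_score(y_test, test_df["el_power_predicted"])
rmse = np.sqrt(mean_squared_error(y_test, test_df["el_power_predicted"]))
print(f"Test R2:   {r2:.4f}")
print(f"Test RMSE: {rmse:.4f}")

# Step 6: predicted vs actual plot, titled as required by the issue.
plt.scatter(y_test, test_df["el_power_predicted"], s=10, alpha=0.5)
plt.xlabel("Actual el_power")
plt.ylabel("Predicted el_power")
plt.title("Predicted vs Actual Value")
plt.show()
```
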
Requirements for Pull Request (PR) Submission.

1) Your PR should have a detailed description, an appropriate title, and screenshots of code snippets and graphs.
2) Please also report the learning rate, R² score, and RMSE score separately, each under its own heading in the PR.

Checklists

Note: In your PR you can generate a checklist easily. Below is a screenshot showing where the icon for automatic checklist generation is located.

(screenshot: location of the checklist icon)

Click on that button and it inserts `- [ ]`, which is an empty (unchecked) item; a filled (checked) one looks like `- [x]`. You can preview it using the Preview button; it will look something like the image below.

(screenshot: preview of a rendered checklist)

Please make sure you include this checklist in your PR; it is compulsory.

akhilesh1709 commented 1 day ago

Please assign me this task

yuvam2005 commented 1 day ago

Please assign me this task

eeshan15 commented 1 day ago

I am also going to work on this issue, so only one more person can take it up. Only the PR with the R² score closest to 1 and the lowest possible RMSE will be accepted. It will also be checked whether overfitting exists, so, as clearly mentioned above, you have to include a graph of training vs validation loss when submitting your PR; it's important. Assigned to both @yuvam2005 and @akhilesh1709. Kindly proceed.

eeshan15 commented 1 day ago

So far, here are the scores from akhilesh's PR:

- Learning rate (optimized during hyperparameter tuning): 0.12538077692527183
- Validation R² score: 0.8409
- Validation RMSE: 289.1317
- Test R² score: 0.8048
- Test RMSE: 326.9103
- Test MAE: 198.9039

Good work @akhilesh1709

shriyans2912 commented 19 hours ago

Interested, please assign it to me.

eeshan15 commented 19 hours ago

@shriyans2912 Assigned. Please proceed as instructed in the issue description.