Open avinashbarnwal opened 4 years ago
I am not an expert on xgboost hyper-parameters. I thought you already discussed this with @hcho?
We discussed it, but we didn't realize it would take this much time in R.
Please double check who you are mentioning.
On Wed, Oct 30, 2019 at 6:14 PM Toby Dylan Hocking notifications@github.com wrote:
I am not an expert on xgboost hyper-parameters. I thought you already discussed this with @hcho https://github.com/hcho?
@avinashbarnwal Can you be more specific? How long is each combination taking?
Hi @hcho3,
It takes ~42 seconds to run one iteration.
Please find the code for one iteration here: https://github.com/avinashbarnwal/aftXgboostPaper/blob/master/src/R/production/xgboost/xgboost_hyper.ipynb
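For context, a quick back-of-the-envelope calculation shows why this is a problem: at ~42 s per iteration, the full 18,000-combination grid mentioned later in this thread would take roughly nine days of sequential compute.

```python
# Back-of-the-envelope estimate: ~42 s per iteration over the
# 18,000-combination grid mentioned later in this thread.
seconds_per_iter = 42
n_combinations = 18_000

total_seconds = seconds_per_iter * n_combinations
total_hours = total_seconds / 3600

print(f"{total_hours:.0f} hours (~{total_hours / 24:.1f} days)")
```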
Hi @hcho3, As we discussed, I am porting the hyper-parameter tuning code from R to Python, since grid search is very slow in R and packages like Optuna are not available for R.
I also granted @avinashbarnwal access to a fast machine.
Thanks @hcho3.
Hi Prof. @tdhock and @hcho3,
I have the results for intervalCV, survival regression, and xgboost.
https://github.com/avinashbarnwal/aftXgboostPaper/tree/master/result/ATAC_JV_adipose
Please let me know your thoughts.
did you double-check the predictions/accuracy of IntervalRegressionCV using my precomputed files, as mentioned in #2 ?
also did you run it for all data sets or just the ATAC data set?
also are the 1.csv 2.csv etc files predictions? if so they should have a column for sequenceID so we can compute accuracy metrics. please add one.
also did you run it for all data sets or just the ATAC data set?
I have run it only for the ATAC data set.
also are the 1.csv 2.csv etc files predictions? if so they should have a column for sequenceID so we can compute accuracy metrics. please add one.
Done. Please check this file: https://github.com/avinashbarnwal/aftXgboostPaper/blob/master/result/ATAC_JV_adipose/intervalCV/%201%20.csv
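For reference, a prediction file with an explicit sequenceID column can be written with the standard library alone; the column names and values below are illustrative, not the repo's actual schema.

```python
import csv

# Illustrative sketch: write predictions with a sequenceID column so that
# accuracy metrics can be joined back to the test set. The sequenceIDs and
# predicted values here are made up for demonstration.
rows = [
    {"sequenceID": "ATAC_JV_adipose/sample1", "pred.log.lambda": 2.31},
    {"sequenceID": "ATAC_JV_adipose/sample2", "pred.log.lambda": 1.87},
]

with open("predictions.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["sequenceID", "pred.log.lambda"])
    writer.writeheader()
    writer.writerows(rows)
```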
did you double-check the predictions/accuracy of IntervalRegressionCV using my precomputed files, as mentioned in #2 ?
I don't think these are test folds; rather, they are cross-validation folds. Please check the end of the notebook: https://github.com/avinashbarnwal/aftXgboostPaper/blob/master/src/R/production/penaltyLearning/intervalCV.ipynb
I think we need to test the test-folds, not the cross-validation folds.
good that you store sequenceIDs in prediction files now.
also what distribution did you use? I think you should compute predictions files for all distributions, and for all test folds.
for my predictions.csv files there is one row for each sequenceID in the test set. Yours should be too. It looks like your predictions are for different sequenceIDs. You should double-check your code.
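The mismatch above can be caught with a simple sanity check: the set of sequenceIDs in a predictions file should exactly match the test fold, with no duplicates. A minimal sketch with toy IDs (not the real data):

```python
# Sanity check (sketch): a predictions file should contain exactly one row
# per sequenceID in the test fold. The ID sets below are toy examples that
# deliberately contain a problem.
test_fold_ids = {"seqA", "seqB", "seqC"}
prediction_ids = ["seqA", "seqB", "seqB"]

missing = test_fold_ids - set(prediction_ids)        # no prediction made
unexpected = set(prediction_ids) - test_fold_ids     # outside the test fold
duplicated = len(prediction_ids) != len(set(prediction_ids))

print("missing:", sorted(missing))
print("unexpected:", sorted(unexpected))
print("duplicates present:", duplicated)
```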
This is based on the best results of the cross-validation folds. Now I am treating the distribution as a hyper-parameter.
I am making the folds based on the folds data folder. I would like to have a quick call to sort out this issue.
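Treating the distribution as just another axis of the grid could be sketched as below. The XGBoost AFT parameter `aft_loss_distribution` does accept `normal`, `logistic`, and `extreme`, but the other values here are placeholders, not the grid actually used in the notebook.

```python
from itertools import product

# Sketch: the AFT distribution becomes one more hyper-parameter axis.
# The learning rates and depths below are illustrative placeholders.
distributions = ["normal", "logistic", "extreme"]
learning_rates = [0.01, 0.1]
max_depths = [3, 6]

grid = [
    {"aft_loss_distribution": d, "learning_rate": lr, "max_depth": md}
    for d, lr, md in product(distributions, learning_rates, max_depths)
]
print(len(grid))  # 3 * 2 * 2 = 12 combinations
```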
Hi @hcho3 and Prof. @tdhock
We have 18,000 combinations for hyper-parameter tuning. Please find the code here: https://github.com/avinashbarnwal/aftXgboostPaper/blob/master/src/R/production/xgboost/xgboost_hyper.ipynb
I am looking to optimize this. Please let me know if you have any ideas to make it faster.
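One common way to cut the cost, short of moving to a full Bayesian tuner like Optuna, is to evaluate only a random subset of the grid. A standard-library sketch (the grid values are placeholders; the real grid lives in the notebook above):

```python
import random
from itertools import product

# Sketch: instead of evaluating every grid point exhaustively, sample a
# random subset. The parameter values below are illustrative only.
grid = list(product(
    [0.001, 0.01, 0.1, 1.0],           # e.g. learning rates
    [2, 4, 6, 8, 10],                  # e.g. max depths
    ["normal", "logistic", "extreme"]  # AFT distributions
))

random.seed(1)  # fixed seed so runs are reproducible
subset = random.sample(grid, k=20)  # evaluate only 20 combinations
print(len(grid), "->", len(subset))
```

Random search often finds near-best settings with a small fraction of the evaluations, and each sampled combination can also be run in parallel since they are independent.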