avinashbarnwal / AFTXGBoostPaper

AFT XGBOOST

double check IntervalRegressionCV predictions #2

Closed tdhock closed 4 years ago

tdhock commented 4 years ago

Hi @avinashbarnwal, if you computed the predictions correctly using IntervalRegressionCV, then they should be the same as the last column in files such as this one: https://github.com/tdhock/neuroblastoma-data/blob/master/data/ATAC_JV_adipose/cv/equal_labels/testFolds/1/randomTrainOrderings/1/models/L1reg_linear_all/predictions.csv

tdhock commented 4 years ago

Predictions are on the log scale. Please make a scatterplot (your predictions vs. these predictions) to check whether yours are consistent.
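A numeric check can complement the scatterplot. The following is a minimal sketch, not code from either repository: the function name `compare_log_predictions` and the tolerance `tol` are hypothetical, and both inputs are assumed to be on the same log scale and in the same row order.

```python
import math


def compare_log_predictions(mine, reference, tol=1e-6):
    """Compare two equally ordered prediction lists on the log scale.

    Returns (max_abs_diff, pearson_r, consistent), where `consistent`
    is True when every pair differs by at most `tol` (an assumed threshold).
    """
    if len(mine) != len(reference):
        raise ValueError("prediction vectors must have the same length")
    n = len(mine)
    max_abs_diff = max(abs(a - b) for a, b in zip(mine, reference))
    mean_m = sum(mine) / n
    mean_r = sum(reference) / n
    cov = sum((a - mean_m) * (b - mean_r) for a, b in zip(mine, reference))
    var_m = sum((a - mean_m) ** 2 for a in mine)
    var_r = sum((b - mean_r) ** 2 for b in reference)
    if var_m > 0 and var_r > 0:
        pearson_r = cov / math.sqrt(var_m * var_r)
    else:
        pearson_r = float("nan")  # correlation is undefined for constant input
    return max_abs_diff, pearson_r, max_abs_diff <= tol


# Illustrative values only, not taken from the real predictions.csv files.
print(compare_log_predictions([0.5, 1.0, 2.0], [0.5, 1.5, 2.0]))
```

If the predictions agree, the points of the scatterplot lie on the diagonal and the maximum absolute difference is close to zero.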

avinashbarnwal commented 4 years ago

The last column is the predicted column (computed using the full training set, excluding the given test fold).

avinashbarnwal commented 4 years ago

Hi Prof. Toby,

I am still not sure how https://github.com/tdhock/neuroblastoma-data/blob/master/demo-folds.R leads to the creation of a test fold. Let me explain my thinking about the prediction file that you mentioned in https://github.com/avinashbarnwal/aftXgboostPaper/issues/2:

https://github.com/tdhock/neuroblastoma-data/blob/master/data/ATAC_JV_adipose/cv/equal_labels/testFolds/1/randomTrainOrderings/1/models/L1reg_linear_all/predictions.csv

Test folds: https://github.com/tdhock/neuroblastoma-data/tree/master/data/ATAC_JV_adipose/cv/equal_labels/testFolds

First fold, when we keep one test fold: [image: Test_Fold1-CrossValidation.png]

The folds shown above are the training cross-validation folds, not the test folds. Ideally, the test fold should be separate from the cross-validation folds.

tdhock commented 4 years ago

@avinashbarnwal I don't understand. Can you please repeat/rephrase your question?

avinashbarnwal commented 4 years ago

Prof. @tdhock,

I am not able to locate the test fold that corresponds to the train folds. The picture above shows only the validation folds created from the training folds. Please let me know if this is still not clear.

tdhock commented 4 years ago

still not clear. can you repeat/rephrase in terms of a question?

the testFolds/XXX path indicates the test fold ID = XXX

there are randomTrainOrderings/YYY directories but the YYY = random seed for ordering the training samples, which is irrelevant to your work.

Validation folds are not defined in those data files (you need to do that yourself in your code)
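Since validation folds have to be defined in your own code, one way to do that is sketched below. This is an illustration under assumptions, not the method used in either repository: the helper names, the number of folds, and the seed are all hypothetical, and `train_ids` stands for the IDs of everything outside test fold XXX.

```python
import random


def assign_validation_folds(train_ids, n_folds=5, seed=0):
    """Map each training ID to a validation fold numbered 1..n_folds.

    IDs are sorted before shuffling so the assignment depends only on
    the seed, not on the order in which the IDs were passed in.
    """
    ids = sorted(train_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    return {seq_id: (i % n_folds) + 1 for i, seq_id in enumerate(ids)}


def train_validation_split(fold_of, validation_fold):
    """Split the training IDs into (subtrain, validation) for one fold."""
    subtrain = [s for s, f in fold_of.items() if f != validation_fold]
    validation = [s for s, f in fold_of.items() if f == validation_fold]
    return subtrain, validation


# Hypothetical IDs standing in for the non-test sequences.
fold_of = assign_validation_folds([f"seq{i}" for i in range(10)], n_folds=5)
subtrain, validation = train_validation_split(fold_of, validation_fold=1)
print(len(subtrain), len(validation))
```

The test fold never enters this split, so hyperparameters chosen on the validation folds are not tuned on test data.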

avinashbarnwal commented 4 years ago

Hi Prof. @tdhock ,

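I hope I am clear this time. [image: test train] https://user-images.githubusercontent.com/6061417/68716564-82f92c80-0572-11ea-85ad-072f032afeb4.jpg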

You said the testFolds/XXX path indicates the test fold ID = XXX. Which file should I choose within it, given that there are several directories under testFolds/XXX/randomTrainOrderings?

tdhock commented 4 years ago

use randomTrainOrderings/YYY where YYY = anything (there should be no difference as long as you are looking at the last column of the predictions.csv files)
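The path layout and the "last column" rule can be sketched as follows. This is a minimal illustration: the helper names are hypothetical, the path pattern is inferred from the single URL quoted in this thread, and the presence of a header row in predictions.csv is an assumption (set `has_header=False` if there is none).

```python
import csv
import io
from pathlib import PurePosixPath


def predictions_path(data_set, test_fold, train_ordering=1,
                     model="L1reg_linear_all", split="equal_labels"):
    """Build the repository-relative path of a predictions.csv file."""
    return PurePosixPath(
        "data", data_set, "cv", split,
        "testFolds", str(test_fold),
        "randomTrainOrderings", str(train_ordering),
        "models", model, "predictions.csv",
    )


def last_column(lines, has_header=True):
    """Return the last column of a CSV (file object or iterable of lines) as floats."""
    rows = [row for row in csv.reader(lines) if row]
    if has_header:
        rows = rows[1:]  # assumption: the first row holds column names
    return [float(row[-1]) for row in rows]


print(predictions_path("ATAC_JV_adipose", test_fold=1))

# Made-up content with hypothetical column names, for illustration only.
example = io.StringIO("sequenceID,pred\nseqA,1.5\nseqB,-0.25\n")
print(last_column(example))
```

With a local clone of the data repository, `last_column(open(path, newline=""))` would read the reference predictions for one test fold.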

(Sent by email in reply to Avinash Barnwal's comment of Tue, Nov 12, 2019 at 3:35 PM.)

avinashbarnwal commented 4 years ago

Hi Prof. @tdhock,

I was able to fix the code. The results we are getting now are the same as the predictions you created.

Results:
Test Fold 1 - https://github.com/avinashbarnwal/aftXgboostPaper/blob/master/result/ATAC_JV_adipose/intervalCV/compare_benchmark_1.png
Test Fold 2 - https://github.com/avinashbarnwal/aftXgboostPaper/blob/master/result/ATAC_JV_adipose/intervalCV/compare_benchmark_2.png
Test Fold 3 - https://github.com/avinashbarnwal/aftXgboostPaper/blob/master/result/ATAC_JV_adipose/intervalCV/compare_benchmark_3.png
Test Fold 4 - https://github.com/avinashbarnwal/aftXgboostPaper/blob/master/result/ATAC_JV_adipose/intervalCV/compare_benchmark_4.png

tdhock commented 4 years ago

ok very good