Closed tdhock closed 4 years ago
Predictions are on the log scale. Please make a scatterplot (your predictions vs. these predictions) to check whether your predictions are consistent.
The last column is the prediction column (computed using the full training set, excluding the given test fold).
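A minimal sketch of the consistency check suggested above, in Python. The function name `compare_predictions` and the dict layout (sequence ID mapped to a log-scale prediction) are assumptions for illustration, not something defined in the data repository. Both inputs are assumed to already be on the same log scale.

```python
def compare_predictions(mine, reference, tol=1e-6):
    """Pair up two sets of log-scale predictions by sequence ID.

    `mine` and `reference` are hypothetical dicts: sequence ID -> prediction.
    Convert your own predictions to the same log scale before calling this.
    """
    # Only IDs present in both sets can be compared.
    shared = sorted(set(mine) & set(reference))
    # (reference, mine) pairs are the points of the scatterplot.
    pairs = [(reference[key], mine[key]) for key in shared]
    max_abs_diff = max((abs(r - m) for r, m in pairs), default=0.0)
    return {
        "pairs": pairs,
        "n_shared": len(shared),
        "max_abs_diff": max_abs_diff,
        "consistent": max_abs_diff <= tol,
    }
```

Plotting `pairs` with any plotting library gives the requested scatterplot; consistent predictions should fall on the diagonal.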
Hi Prof. Toby,
I am still not sure how https://github.com/tdhock/neuroblastoma-data/blob/master/demo-folds.R leads to the creation of a test fold. Let me explain my thinking about the prediction file, as you mentioned it in https://github.com/avinashbarnwal/aftXgboostPaper/issues/2.
Test folds: https://github.com/tdhock/neuroblastoma-data/tree/master/data/ATAC_JV_adipose/cv/equal_labels/testFolds
First fold, when we hold out one test fold: [image: Test_Fold1-CrossValidation.png]
The folds shown above are the cross-validation folds within the training set, not the test folds. Ideally, the test fold should be separate from the cross-validation folds.
@avinashbarnwal I don't understand. Can you please repeat/rephrase your question?
Prof. @tdhock,
I am not able to locate the test fold for the corresponding train folds. In the picture above, we only have the validation fold created from the training fold. Please let me know if it is still not clear.
still not clear. can you repeat/rephrase in terms of a question?
the testFold/XXX path indicates the test fold ID = XXX
there are randomTrainOrderings/YYY directories but the YYY = random seed for ordering the training samples, which is irrelevant to your work.
Validation folds are not defined in those data files (you need to do that yourself in your code)
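Since validation folds have to be defined in your own code, here is a minimal Python sketch of one way to do it. The function names, the default of 5 folds, and the seed are all assumptions for illustration. `train_ids` is meant to be the set of sequence IDs that are not in the given test fold.

```python
import random


def assign_validation_folds(train_ids, n_folds=5, seed=0):
    """Randomly assign each training sequence ID to a validation fold 1..n_folds."""
    ids = sorted(train_ids)  # sort first so the result depends only on the seed
    rng = random.Random(seed)
    rng.shuffle(ids)
    # Deal the shuffled IDs round-robin so fold sizes differ by at most one.
    return {seq_id: i % n_folds + 1 for i, seq_id in enumerate(ids)}


def split_subtrain_validation(train_ids, fold_of, validation_fold):
    """Split the training IDs into subtrain and validation sets for one fold."""
    subtrain = [s for s in train_ids if fold_of[s] != validation_fold]
    validation = [s for s in train_ids if fold_of[s] == validation_fold]
    return subtrain, validation
```

The test fold stays untouched throughout: it is removed before these functions are called, and the validation folds only partition what is left.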
Hi Prof. @tdhock,
I hope I am clear this time.
The testFold/XXX path indicates the test fold ID = XXX. Which file should I choose within it, given that we have testFold/XXX/randomTrainOrderings?
use randomTrainOrderings/YYY where YYY = anything (there should be no difference as long as you are looking at the last column of the predictions.csv files)
On Tue, Nov 12, 2019 at 3:35 PM Avinash Barnwal wrote:
> Hi Prof. @tdhock,
> I hope I am clear this time. [image: test train] https://user-images.githubusercontent.com/6061417/68716564-82f92c80-0572-11ea-85ad-072f032afeb4.jpg
> The testFold/XXX path indicates the test fold ID = XXX. Which file should I choose within it, given that we have testFold/XXX/randomTrainOrderings?
Hi Prof. @tdhock,
I was able to fix the code. The results we are getting now match the predictions you created.
Results:
Test Fold 1 - https://github.com/avinashbarnwal/aftXgboostPaper/blob/master/result/ATAC_JV_adipose/intervalCV/compare_benchmark_1.png
Test Fold 2 - https://github.com/avinashbarnwal/aftXgboostPaper/blob/master/result/ATAC_JV_adipose/intervalCV/compare_benchmark_2.png
Test Fold 3 - https://github.com/avinashbarnwal/aftXgboostPaper/blob/master/result/ATAC_JV_adipose/intervalCV/compare_benchmark_3.png
Test Fold 4 - https://github.com/avinashbarnwal/aftXgboostPaper/blob/master/result/ATAC_JV_adipose/intervalCV/compare_benchmark_4.png
ok very good
Hi @avinashbarnwal if you computed the predictions correctly using IntervalRegressionCV then they should be the same as the last column in these files, https://github.com/tdhock/neuroblastoma-data/blob/master/data/ATAC_JV_adipose/cv/equal_labels/testFolds/1/randomTrainOrderings/1/models/L1reg_linear_all/predictions.csv
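For locating the reference file for each test fold, a small Python sketch that rebuilds the path pattern from the URL above and pulls out the last column. The function names and default arguments are assumptions for illustration. Whether the files have a header row is also an assumption, so `has_header` is left as a parameter.

```python
import csv
import io
from pathlib import PurePosixPath


def predictions_path(data_name, test_fold, ordering=1,
                     cv_type="equal_labels", model="L1reg_linear_all"):
    """Build the repo-relative path of a predictions.csv file.

    The pattern is copied from the URL in this thread; `ordering` is the
    randomTrainOrderings ID, which should not affect the last column.
    """
    return PurePosixPath(
        "data", data_name, "cv", cv_type,
        "testFolds", str(test_fold),
        "randomTrainOrderings", str(ordering),
        "models", model, "predictions.csv",
    )


def last_column(csv_text, has_header=True):
    """Return the last column of CSV text as floats (header row assumed)."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if has_header:
        rows = rows[1:]
    return [float(row[-1]) for row in rows]
```

For example, `predictions_path("ATAC_JV_adipose", 1)` reproduces the path of the file linked above, and `last_column` can be applied to that file's contents once it has been read into a string.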