Qoala-T / QC

Qoala-T is a supervised-learning tool for quality control of FreeSurfer segmented MRI data

Question about Qoala_T_B_subset_based_github.R #20

Closed: FrancescaPentimalli closed this issue 4 years ago

FrancescaPentimalli commented 5 years ago

Hi all,

I am trying to run Qoala_T_B_subset_based_github.R with my data, but I get the following error:

Error: Class probabilities are needed to score models using the area under the ROC curve. Set `classProbs = TRUE` in the trainControl() function.

The error occurs when running this code:

rf.tune = train(y = training$Rating, x = subset(training, select = -c(Rating)),
                method = "rf", metric = "ROC", trControl = ctrl, ntree = 501,
                tuneGrid = expand.grid(mtry = c(8)), verbose = FALSE)

Can anybody help me fix it? Thank you very much!!
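One way to get exactly this error even when ctrl itself is defined correctly: in caret's train(), the trControl argument is declared after `...`, and R matches arguments that come after `...` only by their exact name. A misspelled name is silently passed on through `...`, and train() falls back to its default trainControl(), where classProbs is FALSE. The sketch below reproduces that argument-matching behaviour in base R with a hypothetical stand-in function (fake_train is not part of caret and only mimics the relevant check); it is an illustration under that assumption, not caret's actual code.

```r
# Hypothetical stand-in for caret::train(), for illustration only.
# As in caret's train(), `trControl` is declared after `...`, so R can only
# match it by its exact name.
fake_train <- function(x, y, ..., metric = "Accuracy",
                       trControl = list(classProbs = FALSE)) {
  # Mimics the kind of check that produces the error in this issue.
  if (metric == "ROC" && !isTRUE(trControl$classProbs)) {
    stop("Class probabilities are needed to score models using the area ",
         "under the ROC curve.", call. = FALSE)
  }
  # Report which control settings were used and what ended up in `...`.
  list(classProbs = trControl$classProbs, swallowed = names(list(...)))
}

ctrl <- list(classProbs = TRUE)

# Exact argument name: `ctrl` is used and no error is raised.
ok <- fake_train(x = 1, y = 1, metric = "ROC", trControl = ctrl)

# Misspelled argument name: it lands in `...`, the default control
# (classProbs = FALSE) is used, and the ROC check fails.
msg <- tryCatch(
  fake_train(x = 1, y = 1, metric = "ROC", trcontrol = ctrl),
  error = conditionMessage
)
print(msg)
```

With the real function, comparing the spelling of trControl = ctrl in the rf.tune call against the argument list in ?train is a quick way to rule this cause out.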

eduardklap commented 5 years ago

I think the problem might be in lines 92-98, before rf.tune: classProbs = TRUE should be declared there:

ctrl = trainControl(method = 'repeatedcv', number = 2, repeats = 10, summaryFunction=twoClassSummary, classProbs=TRUE, allowParallel=FALSE, sampling="rose")

Can you double-check that this ctrl call has run properly? Hope this helps! Let me know if you run into other problems.
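One way to double-check that ctrl has run properly is to inspect the fields of the object trainControl() returns, which is a plain list. A minimal sketch, assuming the object is called ctrl as in the script; check_ctrl() is a hypothetical helper, and the stand-in list at the end only mimics the two fields being checked so the snippet runs without caret loaded.

```r
# Hypothetical helper: stop early if a trainControl()-style list cannot
# be used with metric = "ROC".
check_ctrl <- function(ctrl) {
  if (!is.list(ctrl)) stop("ctrl is not a list; did trainControl() run?")
  if (!isTRUE(ctrl$classProbs)) stop("ctrl$classProbs is not TRUE")
  if (!is.function(ctrl$summaryFunction)) stop("ctrl$summaryFunction is missing")
  invisible(TRUE)
}

# With caret loaded, the real object would come from the script, e.g.:
# ctrl <- caret::trainControl(method = "repeatedcv", number = 2, repeats = 10,
#                             summaryFunction = caret::twoClassSummary,
#                             classProbs = TRUE, allowParallel = FALSE,
#                             sampling = "rose")
# Stand-in with only the two fields the check looks at (dummy summary function):
ctrl <- list(classProbs = TRUE,
             summaryFunction = function(data, lev, model) NULL)
check_ctrl(ctrl)
```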

FrancescaPentimalli commented 5 years ago

Hi Eduard, thank you very much for your help! I double-checked, and this ctrl call runs properly. Could my issue be caused by something else?

eduardklap commented 5 years ago

Could you double-check the level names of the Rating variable in your dataset, or post part of your dataset here? This error might be caused by problems with the outcome names (see, e.g., https://stackoverflow.com/questions/43039494/error-with-class-probabilities-using-r-caret). For example, str(dataset$Rating) should return something like: Factor w/ 2 levels "Exclude","Include": 1 2 2 1 2 2 2 2 2 1 ...

Best, Eduard
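A base-R sketch of that check. If Rating is stored as numbers, train() treats the task as regression rather than two-class classification. With classProbs = TRUE, caret also uses the outcome levels as names for the class-probability columns, so the levels should be valid R names; numeric levels such as "1" or "4" are not. The sketch assumes nothing beyond base R; check_rating() is a hypothetical helper, and on the real data it would be called as check_rating(dataset$Rating) alongside str(dataset$Rating).

```r
# Hypothetical helper: check that an outcome vector is usable for
# two-class ROC scoring in caret.
check_rating <- function(rating) {
  problems <- character(0)
  if (!is.factor(rating)) {
    problems <- c(problems, "Rating is not a factor")
  } else {
    lev <- levels(rating)
    if (length(lev) != 2) {
      problems <- c(problems, "Rating does not have exactly 2 levels")
    }
    # make.names() leaves valid R names unchanged, so any difference
    # marks a level that cannot be used as a column name as-is.
    if (!identical(make.names(lev), lev)) {
      problems <- c(problems, "some levels are not valid R names")
    }
  }
  problems
}

good <- factor(c("Exclude", "Include", "Include"),
               levels = c("Exclude", "Include"))
bad  <- factor(c(1, 4, 2))

check_rating(good)  # character(0): no problems found
check_rating(bad)   # wrong number of levels, and "1", "2", "4" are not valid names
```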

FrancescaPentimalli commented 5 years ago

Our ratings are on a 1-4 scale. How did you get the "Exclude" and "Include" levels? Could you tell me where in the script the output Factor w/ 2 levels "Exclude","Include": 1 2 2 1 2 2 2 2 2 1 is generated?

eduardklap commented 5 years ago

Yes, that is the reason for the error; it has to do with how the input data should look.

You should first recode your ratings into the two factor levels 'Include' and 'Exclude': ratings 1-3 become Include, and 4 (failed) becomes Exclude. See https://github.com/Qoala-T/QC/blob/master/simulated_data_B_subset.Rdata for how the data file should look.

Thanks for noticing this; I see that this should be made clearer in the instructions. I will update them.
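A minimal sketch of the recoding described above, assuming the manual ratings are stored as the numbers 1-4 in a column named Rating (rename as needed). The level order Exclude, Include matches the str() output quoted earlier in the thread; recode_rating() is a hypothetical helper, not part of the Qoala-T script, and the example values are made up.

```r
# Hypothetical helper: map manual QC ratings on a 1-4 scale to the two
# classes Qoala-T expects (1-3 -> "Include", 4 = failed -> "Exclude").
recode_rating <- function(rating) {
  if (!all(rating %in% 1:4)) {
    stop("ratings must be whole numbers between 1 and 4")
  }
  factor(ifelse(rating == 4, "Exclude", "Include"),
         levels = c("Exclude", "Include"))
}

# Made-up example data.
dataset <- data.frame(Rating = c(1, 4, 2, 3, 4, 1))
dataset$Rating <- recode_rating(dataset$Rating)
str(dataset$Rating)  # Factor w/ 2 levels "Exclude","Include": 2 1 2 2 1 2
```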

FrancescaPentimalli commented 5 years ago

[Screenshot: Screen Shot 2019-08-21 at 4 08 39 PM]

Dear Eduard, thank you very much for the clarification! I still have the same error.

Here you can find a screenshot of my dataset:

[Screenshot: Screen Shot 2019-08-21 at 4 18 36 PM]

What do you think?

eduardklap commented 5 years ago

Thanks for the screenshots! The Rating variable looks good now (note, however, that you have relatively few Exclude cases, which might affect the Qoala-T predictions).
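How few Exclude cases there are can be quantified with a quick count of the recoded outcome. A minimal sketch using made-up example values (not the data from the screenshots), assuming Rating has already been recoded to the Exclude/Include factor; the script's trainControl() call already sets sampling = "rose", which is intended to counter class imbalance during training.

```r
# Made-up example outcome: 3 Exclude scans out of 50.
rating <- factor(c(rep("Exclude", 3), rep("Include", 47)),
                 levels = c("Exclude", "Include"))

counts <- table(rating)   # number of scans per class
props  <- prop.table(counts)  # share of scans per class

counts           # Exclude 3, Include 47
round(props, 2)  # Exclude 0.06, Include 0.94
```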

I am not sure what might cause the error now. Could you try to run the script with the simulated data? See the part "# Or Load example with simulated data" in lines 54-62 of the original script. This way you can see whether the problem is related to your R environment or to the data.
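When switching to the simulated data, it can help to load the .Rdata file into a separate environment first, so it does not overwrite objects already in the session, and to see what it contains. A minimal sketch: base R's load() returns the names of the objects it restored. The inspect_rdata() helper is hypothetical, and the demo writes its own small temporary file rather than using the real simulated_data_B_subset.Rdata, whose object names are not shown here.

```r
# Hypothetical helper: load an .Rdata file into its own environment and
# report which objects it contains and their (first) class.
inspect_rdata <- function(path) {
  env <- new.env()
  object_names <- load(path, envir = env)  # load() returns the object names
  vapply(object_names,
         function(nm) class(get(nm, envir = env))[1],
         character(1))
}

# Demo with a small temporary file standing in for the downloaded data.
demo_data <- data.frame(Rating = factor(c("Exclude", "Include")))
path <- tempfile(fileext = ".Rdata")
save(demo_data, file = path)

inspect_rdata(path)  # named character vector: demo_data = "data.frame"
```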

FrancescaPentimalli commented 5 years ago

Hi Eduard, the script with the simulated data is still not working. If it is a problem related to my R environment, how can I fix it? Thank you very much for your help!!

eduardklap commented 5 years ago

In that case, it might be caused by differences in the versions of R or R packages. Can you check whether you have R version > 3.5 and caret version > 6.0-81 (listed as caret_6.0-81)? You can use sessionInfo() to check this.
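Besides reading the sessionInfo() output, the same comparison can be made programmatically with base R's getRversion() and utils::packageVersion(). A minimal sketch; it reads the thread's "> 3.5" and "> caret_6.0-81" as minimum versions (an assumption), and check_versions() is a hypothetical helper.

```r
# Hypothetical helper: compare the running R version and the installed caret
# version with the versions mentioned above (treated here as minimums).
check_versions <- function(min_r = "3.5.0", min_caret = "6.0-81") {
  r_ok <- getRversion() >= min_r
  caret_installed <- requireNamespace("caret", quietly = TRUE)
  # packageVersion() is only called when caret is actually installed.
  caret_ok <- caret_installed && utils::packageVersion("caret") >= min_caret
  list(r_version = as.character(getRversion()),
       r_ok = r_ok,
       caret_installed = caret_installed,
       caret_ok = caret_ok)
}

str(check_versions())
```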