MStarmans91 / WORC

Workflow for Optimal Radiomics Classification

[BUG] No fitting ensemble found in function create_ensemble #86

Closed xinyiwan closed 3 months ago

xinyiwan commented 9 months ago

Describe the bug: I used four clinical parameters as semantic features in WORC: age, gender, pain, and motor symptoms. All except age are categorical, e.g. 0, 1, 2.

The workflow managed to create the hdf5 file in classification, but it failed to find a fitted ensemble. The temporary file tmp/{exp_name}/classify/all/classification_0.hdf5 exists. I tried to use this hdf5 file to build the ensemble again for further analysis, but the process failed when running create_ensemble.

I found an index-out-of-range error in WORC/classification/SearchCV.py, lines 1491 and 1492: if no fitted ensemble is found during the whole search, enum increases to 100 and causes an index-out-of-range error on line 1492.
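To illustrate the failure mode described above, here is a minimal sketch of a search loop that checks the counter against the list length before indexing and reports a clear error when nothing fits. This is not the code from SearchCV.py: `first_fitting_workflow`, `workflows`, and `try_fit` are hypothetical names used only for illustration, and only the counter name `enum` is taken from the report.

```python
def first_fitting_workflow(workflows, try_fit):
    """Return (index, fitted_object) for the first workflow that can be fitted.

    Hypothetical helper, not WORC code: `workflows` is a ranked list of
    candidate parameter sets, and `try_fit` returns a fitted object or None
    when fitting fails.
    """
    enum = 0
    fitted = None
    # Check the bound before indexing, so that enum == len(workflows)
    # never reaches workflows[enum].
    while fitted is None and enum < len(workflows):
        fitted = try_fit(workflows[enum])
        if fitted is None:
            enum += 1
    if fitted is None:
        # Report the failure explicitly instead of raising an IndexError.
        raise RuntimeError(
            f"None of the {len(workflows)} candidate workflows could be fitted."
        )
    return enum, fitted
```

With a guard like this, an exhausted search surfaces as an explicit "nothing could be fitted" message, which is the behavior requested under "Expected behavior" below.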

The problem now is that even if I fix the index issue, there is still no fitted ensemble. Can I conclude that, for the four features I used, none of the models used in WORC can be fitted?

WORC configuration: In the config file, only semantic features are used in the feature selection.

Expected behavior: Report that fitting fails for the chosen features.

Additional context: None

MStarmans91 commented 9 months ago

Can you add the WORC configuration file, so I can check whether nothing goes wrong there?

xinyiwan commented 9 months ago

[General]
cross_validation = True
segmentix = True
featurecalculators = [predict/CalcFeatures:1.0, pyradiomics/Pyradiomics:1.0]
preprocessing = worc/PreProcess:1.0
registrationnode = elastix4.8/Elastix:4.8
transformationnode = elastix4.8/Transformix:4.8
joblib_ncores = 1
joblib_backend = threading
tempsave = True
assumesameimageandmaskmetadata = True
combat = False
fingerprint = True
dotestnrsnens = False

[Fingerprinting]
max_num_image = 100

[Labels]
label_names = MPNST
modus = singlelabel
url = WIP
projectid = WIP

[Preprocessing]
checkspacing = False
clipping = False
clipping_range = -1000.0, 3000.0
normalize = True
normalize_roi = Full
method = z_score
roidetermine = Provided
roidilate = False
roidilateradius = 10
resampling = False
resampling_spacing = 1, 1, 1
biascorrection = False
biascorrection_mask = False
checkorientation = False
orientationprimaryaxis = axial
histogramequalization = False
histogramequalization_alpha = 0.3
histogramequalization_beta = 0.3
histogramequalization_radius = 5

[Segmentix]
mask = None
segtype = None
segradius = 5
n_blobs = 1
fillholes = True
remove_small_objects = False
min_object_size = 2

[ImageFeatures]
shape = True
histogram = True
orientation = True
texture_gabor = True
texture_lbp = True
texture_glcm = True
texture_glcmms = True
texture_glrlm = False
texture_glszm = False
texture_ngtdm = False
coliage = False
vessel = True
log = True
phase = True
image_type = MRI
extraction_mode = 2.5D
gabor_frequencies = 0.05, 0.2, 0.5
gabor_angles = 0, 45, 90, 135
glcm_angles = 0, 0.79, 1.57, 2.36
glcm_levels = 16
glcm_distances = 1, 3
lbp_radius = 3, 8, 15
lbp_npoints = 12, 24, 36
phase_minwavelength = 3
phase_nscale = 5
log_sigma = 1, 5, 10
vessel_scale_range = 1, 10
vessel_scale_step = 2
vessel_radius = 5
dicom_feature_tags = 0010 1010, 0010 0040
dicom_feature_labels = age, sex

[PyRadiomics]
geometrytolerance = 0.0001
normalize = False
normalizescale = 100
resampledpixelspacing = None
interpolator = sitkBSpline
precrop = True
bincount = 16
binwidth = None
force2d = False
force2ddimension = 0
voxelarrayshift = 300
original = True
wavelet = False
log = False
label = 1
extract_firstorder = False
extract_shape = True
texture_glcm = False
texture_glrlm = True
texture_glszm = True
texture_gldm = True
texture_ngtdm = True

[ComBat]
language = python
batch = Hospital
mod = []
par = 1
eb = 1
per_feature = 0
excluded_features = sf_, of_, semf_, pf_
matlab = C:\Program Files\MATLAB\R2015b\bin\matlab.exe

[OneHotEncoding]
use = False
feature_labels_tofit = 

[Imputation]
use = True
strategy = mean, median, most_frequent, constant, knn
n_neighbors = 5, 5
skipallnan = True

[FeatureScaling]
scaling_method = robust_z_score
skip_features = semf_, pf_

[FeatPreProcess]
use = False
combine = False
combine_method = mean

[Featsel]
variance = 1.0
groupwisesearch = True
selectfrommodel = 0.275
selectfrommodel_estimator = Lasso, LR, RF
selectfrommodel_lasso_alpha = 0.1, 1.4
selectfrommodel_n_trees = 10, 90
usepca = 0.275
pcatype = 95variance, 10, 50, 100
statisticaltestuse = 0.275
statisticaltestmetric = MannWhitneyU
statisticaltestthreshold = -3, 2.5
reliefuse = 0.275
reliefnn = 2, 4
reliefsamplesize = 0.75, 0.2
reliefdistancep = 1, 3
reliefnumfeatures = 10, 40
rfe = 0.0
rfe_estimator = Lasso, LR, RF
rfe_lasso_alpha = 0.1, 1.4
rfe_n_trees = 10, 90
rfe_n_features_to_select = 10, 90
rfe_step = 1, 9

[SelectFeatGroup]
shape_features = False
histogram_features = False
orientation_features = False
texture_gabor_features = False
texture_glcm_features = False
texture_gldm_features = False
texture_glcmms_features = False
texture_glrlm_features = False
texture_glszm_features = False
texture_gldzm_features = False
texture_ngtdm_features = False
texture_ngldm_features = False
texture_lbp_features = False
dicom_features = False
semantic_features = True
coliage_features = False
vessel_features = False
phase_features = False
fractal_features = False
location_features = False
rgrd_features = False
toolbox = All, PREDICT, PyRadiomics
original_features = False
wavelet_features = False
log_features = False

[Resampling]
use = 0.20
method = RandomUnderSampling, RandomOverSampling, NearMiss, NeighbourhoodCleaningRule, ADASYN, BorderlineSMOTE, SMOTE, SMOTEENN, SMOTETomek
sampling_strategy = auto, majority, minority, not minority, not majority, all
n_neighbors = 3, 12
k_neighbors = 5, 15
threshold_cleaning = 0.25, 0.5

[Classification]
fastr = True
fastr_plugin = DRMAAExecution
classifiers = SVM, RF, LR, LDA, QDA, GaussianNB, AdaBoostClassifier, XGBClassifier
max_iter = 100000
svmkernel = linear, poly, rbf
svmc = 0, 6
svmdegree = 1, 6
svmcoef0 = 0, 1
svmgamma = -5, 5
rfn_estimators = 10, 90
rfmin_samples_split = 2, 3
rfmax_depth = 5, 5
lrpenalty = l1, l2, elasticnet
lrc = 0.01, 0.99
lr_solver = lbfgs, saga
lr_l1_ratio = 0, 1
lda_solver = svd, lsqr, eigen
lda_shrinkage = -5, 5
qda_reg_param = -5, 5
elasticnet_alpha = -5, 5
elasticnet_l1_ratio = 0, 1
sgd_alpha = -5, 5
sgd_l1_ratio = 0, 1
sgd_loss = squared_loss, huber, epsilon_insensitive, squared_epsilon_insensitive
sgd_penalty = none, l2, l1
cnb_alpha = 0, 1
adaboost_n_estimators = 10, 90
adaboost_learning_rate = 0.01, 0.99
xgb_boosting_rounds = 10, 90
xgb_max_depth = 3, 12
xgb_learning_rate = 0.01, 0.99
xgb_gamma = 0.01, 9.99
xgb_min_child_weight = 1, 6
xgb_colsample_bytree = 0.3, 0.7
lightgbm_num_leaves = 5, 95
lightgbm_max_depth = 3, 12
lightgbm_min_child_samples = 5, 45
lightgbm_reg_alpha = 0.01, 0.99
lightgbm_reg_lambda = 0.01, 0.99
lightgbm_min_child_weight = -7, 4

[CrossValidation]
type = random_split
n_iterations = 100
test_size = 0.2
fixed_seed = False

[HyperOptimization]
scoring_method = f1_weighted
test_size = 0.2
n_splits = 5
n_iterations = 1000
n_jobspercore = 200
maxlen = 100
ranking_score = test_score
memory = 3G
refit_training_workflows = False
refit_validation_workflows = False
fix_random_seed = False

[SMAC]
use = False
n_smac_cores = 1
budget_type = evals
budget = 100
init_method = random
init_budget = 20

[Ensemble]
method = top_N
size = 100
metric = Default

[Evaluation]
overfitscaler = False

[Bootstrap]
use = False
n_iterations = 10000

MStarmans91 commented 3 months ago

Solved: We found the issue after debugging. You have to set at least one of the following three settings to True; otherwise zero features are selected in all of the workflows, so they all fail and your ensemble is always empty.

original_features = False
wavelet_features = False
log_features = False

We found this when running the plot_Estimator job in the fastr network with the --verbose True flag.

FYI: these settings do not select feature groups. They determine whether the features are computed on the original image or after applying a filter (wavelet or log) before feature extraction. You can enable one, several, or all of these options.
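A quick check of a config file before starting an experiment could look like the sketch below. It uses only Python's standard configparser and is not part of WORC; the helper name `enabled_image_types` is hypothetical. It also assumes that a value may be a comma-separated list of options (as other settings in the config above are) and treats a setting as enabled if any option is True.

```python
import configparser

# The three settings named in the comment above.
IMAGE_TYPE_KEYS = ("original_features", "wavelet_features", "log_features")


def enabled_image_types(config_text):
    """Return the image-type settings in [SelectFeatGroup] that can be True.

    Hypothetical helper, not WORC code. A value such as "True, False" is
    assumed to be a list of options and counts as enabled.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["SelectFeatGroup"]
    enabled = []
    for key in IMAGE_TYPE_KEYS:
        raw = section.get(key, "False")
        options = [value.strip().lower() for value in raw.split(",")]
        if "true" in options:
            enabled.append(key)
    return enabled


# Reduced example with the same values as the config in this issue.
example = """
[SelectFeatGroup]
semantic_features = True
original_features = False
wavelet_features = False
log_features = False
"""

if not enabled_image_types(example):
    print("No image type enabled: every workflow would select zero features.")
```

For a config file on disk, the file contents can be read into a string and passed to the same function.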