Closed mojgan-ph closed 4 years ago
Hi mojgan-ph,
Can you please let me know what is the data type and dimensions for the input variable X_?
Thanks.
It is a pandas data frame of shape (412989, 17). The `info()` for this data frame is as follows:

```
Int64Index: 412989 entries, 1754275 to 5058762
Data columns (total 17 columns):
 0   age-high-bp-diagnosed              412989 non-null  float64
 1   average-dias-0                     412989 non-null  float64
 2   average-sys-0                      412989 non-null  float64
 3   average-pulse-0                    412989 non-null  float64
 4   history-of-diabetes                412989 non-null  bool
 5   gender                             412989 non-null  int64
 6   age-0                              412989 non-null  float64
 7   hypertention-medication-0          412989 non-null  bool
 8   mother-smoker                      412989 non-null  float64
 9   smoker                             412989 non-null  bool
 10  ex-smoker                          412989 non-null  bool
 11  non-smoker                         412989 non-null  bool
 12  amount-combined                    412989 non-null  float64
 13  ex-penalty                         412989 non-null  float64
 14  average-BMI-0                      412989 non-null  float64
 15  diff-age-and-agehighbpdiagnosed    412989 non-null  float64
 16  diff-blood-pressures               412989 non-null  float64
dtypes: bool(5), float64(11), int64(1)
memory usage: 42.9 MB
```
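(As an aside: a mixed-dtype frame like this — bool, int64, and float64 columns together — is a common trigger for conversion problems at the Python/R boundary. Whether that is the cause of the `dim(X)` error here is not confirmed; the sketch below just shows the usual workaround of casting everything to a homogeneous float matrix before handing it over. The toy column names are taken from the listing above for illustration only.)

```python
import numpy as np
import pandas as pd

# Small stand-in for the real (412989, 17) frame; the columns are
# illustrative, borrowed from the info() listing above.
df = pd.DataFrame({
    "average-sys-0": [120.0, 135.5, 110.0],
    "history-of-diabetes": [True, False, True],
    "gender": [1, 0, 1],
})

# Cast every column (bools become 0.0/1.0) to float64 so the R side
# receives one homogeneous numeric matrix instead of a mixed-dtype frame.
X = df.astype(np.float64).to_numpy()

print(X.dtype, X.shape)  # float64 (3, 3)
```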
Thanks Mojgan. I think this issue is caused by the R wrapper on top of the missForest algorithm. I recommend you do the imputation externally using any imputer (e.g. MICE) and then apply AP with the imputation option turned off. I will investigate this bug further and fix it in the next update.
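The external-imputation step suggested above can be sketched with scikit-learn's `IterativeImputer`, which implements a MICE-style chained-equations procedure. This is an illustrative stand-in, not part of AutoPrognosis, and the toy columns are made up:

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Toy frame with missing values, standing in for the real data set.
df = pd.DataFrame({
    "average-sys-0":  [120.0, np.nan, 110.0, 140.0],
    "average-dias-0": [80.0,  75.0,   np.nan, 90.0],
})

# MICE-style imputation: each feature is regressed on the others
# in round-robin fashion for up to max_iter rounds.
imputer = IterativeImputer(max_iter=10, random_state=0)
X_imputed = imputer.fit_transform(df)

assert not np.isnan(X_imputed).any()
```

The completed `X_imputed` matrix would then be passed to AutoPrognosis with the imputation option disabled.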
Can you please let me know how to turn the imputation option off? Is there a manual for Autoprognosis that I can read?
You can set `is_nan=False` when instantiating the `AutoPrognosis_Classifier` object.
Thank you :)
Hi Ahmed, AutoPrognosis has now been running for more than 4 hours. Can that be normal? Is there a way I can see the progress? Is there an option for verbose logging? It only printed the following shortly after the start of the run:
I made the classifier object the way you did in the tutorial, adding `is_nan=False`:

```python
AP_mdl = model.AutoPrognosis_Classifier(
    metric=metric,
    CV=5,
    num_iter=3,
    kernel_freq=100,
    ensemble=True,
    ensemble_size=3,
    Gibbs_iter=100,
    burn_in=50,
    num_components=3,
    acquisition_type=acquisition_type,
    is_nan=False,
)
```
I also need some help understanding the parameters that the classifier constructor takes. In your paper I can see that you set AutoPrognosis to conduct 200 iterations of the Bayesian optimization procedure. Is that set by `num_iter` or `Gibbs_iter`? And what are `kernel_freq`, `num_components`, and `burn_in`?
Best, Mojgan
Hi Mojgan,
Based on the size of your data set, your experiment will likely need to run for multiple days if you use a large number of iterations (`num_iter`). You can speed up the process by reducing the number of iterations. However, if you keep `num_iter` at 3, your experiment should probably be done within one day.
I am not sure which paper you are referring to, but all my medical papers used a very different, earlier version of this algorithm, so its parameter settings do not necessarily match how they are defined in this version. You can, however, take the `num_iter` parameter to be the number of iterations of the Bayesian optimization procedure.
Thanks.
Ahmed
Thank you for the clarification.
Best, Mojgan
Hi Ahmed,
To get faster runs of the tool, I have made some changes to AutoPrognosis for myself so that it only includes a few classification algorithms. I have forked your repo; it is not a stable version yet. I would appreciate any design/usage documents you may have to help me with my changes. It would also be great to have your email address so that I can ask my questions directly.
I would also appreciate any document that explains the report file. I did a run tuning a few classifiers, and the final report looks like the following. I am quite confused about what these numbers mean.
Best, Mojgan
```
Score
  classifier aucroc    0.721
  classifier aucprc    0.060
  ensemble aucroc      0.721
  ensemble aucprc      0.059

Report
  best score single pipeline (while fitting)    0.718
  model_names_single_pipeline                   [ Gradient Boosting ]
  best ensemble score (while fitting)           0.719
  ensemble_pipelines           ['[ Gradient Boosting ]', '[ XGBoost ]', '[ Gradient Boosting ]']
  ensemble_pipelines_weight    [0.2865448126747815, 0.42185017656977897, 0.2916050107554396]
  ...
  acquisition_type    LCB
  kernel_members 0    ['Gradient Boosting']
  kernel_members 1    ['Adaboost']
  kernel_members 2    ['Neural Network', 'XGBoost', 'Random Forest']
  ...

Average performance per classifier (ignoring hyperparameters):
  0  Gradient Boosting  100  0.676  0.050
  1  XGBoost             31  0.671  0.049
  2  Random Forest       39  0.670  0.051
  3  AdaBoost           100  0.646  0.044
  4  NeuralNet           30  0.500  0.022
```
Hi Ahmed,
I have managed to install and run AutoPrognosis on the sample data you used for the tutorial, but it gives me an error on my dataset. The error I get follows. Do you have any suggestions?
I would also like to know if your UCLA email address is still valid. I sent you an email to that address around a month ago. Have you seen it?
Best, Mojgan
```
RRuntimeError                              Traceback (most recent call last)
```