Big-Life-Lab / bllflow

Big Life Lab Flow (BLLFlow) - a workflow for open, reproducible research. Includes support for PMML, DDI.
https://bllflow.projectbiglife.ca

Model export vignette #70

Closed yulric closed 1 year ago

yulric commented 4 years ago

This vignette documents a way that external investigators can export their model using CSV files. Look at the file vignettes/model-export.Rmd.

Things to discuss:

  1. How should we represent the hazards in a Cox model?
  2. How should we represent the time metrics in a Cox model? For example, is the model for 1 year, 5 years, etc.?

StaceyFisher commented 4 years ago

  1. center.csv - 'finalVariable' is the variable after centering, which may not actually be the final model variable, correct?
  2. cox.csv - I assume that I can rename this file, depending on the type of model?
  3. model-steps.csv - This file lists the order in which the files should be run, correct? So 'final' variables from the first file could be an input 'variable' in the second file. That would make sense to me.
  4. Should dummy variable specification be in another file? Or is it all specified in variable-details.csv?
  5. variable-details.csv - Do the columns that are currently empty need to be filled? If so, could you describe what you want in them? I understand the ones that are currently filled.
  6. variable-details.csv - What variables go in this file? Just variables that need to be dummied? I see you have age as an example, however. Just age? Or also age_c, age_rcs1, and age_rcs2? I suppose maybe it is any variable that should be 'checked' (for min, max, etc.)? Perhaps it would help if we could specify at what stage this file is used, for example in model-steps.csv?
  7. variables.csv - What variables go in this file? All starting variables?

yulric commented 4 years ago

  1. Yes. Maybe it would be better to rename it to centeredVariable?
  2. The fileName does not really matter, but the step column value does. For example, for a logistic regression model we would call the step logistic and the fileName could be model.csv. For the last step, the step column should tell us what kind of model it is and therefore which formula needs to be run to evaluate it.
  3. Yes, the transformed variables from one file could be inputs to the next file. Essentially, any variable referenced in a file needs to be declared in some file before it.
  4. I'm not sure. Do you think there's some dummying info that is not being captured in the variable-details file? One good reason to put the dummy info there is that we are also defining the catValues there, so it's easy to map a catValue to a dummy variable.
  5. Ah sorry, those columns are part of the current variable-details file in cchsflow, but I don't think we need them for the model export files. I'll remove them.
  6. All the variables defined in the variables.csv file need to be defined in the variable-details file.
  7. Yes, all starting variables.
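
To illustrate answer 3, here is a minimal sketch of how the "declared in some file before it" rule could be checked. It is only an illustration, not bllflow code: the column names (step, fileName, variable, centeredVariable), the inline file contents, and the helper check_step_order are all assumptions made for this example.

```python
import csv
import io

# Hypothetical file contents; the column names are assumptions based on
# this thread, not the final model export specification.
MODEL_STEPS = """step,fileName
center,center.csv
cox,cox.csv
"""

STEP_FILES = {
    "center.csv": "variable,centeredVariable\nage,age_c\n",
    "cox.csv": "variable\nage_c\nsex\n",
}

# Starting variables, which would normally come from variables.csv.
START_VARIABLES = {"age", "sex"}


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def check_step_order(model_steps, step_files, start_variables):
    """Return (step, variable) pairs that are referenced before being declared."""
    declared = set(start_variables)
    problems = []
    for step in read_rows(model_steps):
        rows = read_rows(step_files[step["fileName"]])
        # Every input variable must already be declared by an earlier file.
        for row in rows:
            if row["variable"] not in declared:
                problems.append((step["step"], row["variable"]))
        # Variables produced by this step become available to later steps.
        for row in rows:
            produced = row.get("centeredVariable")
            if produced:
                declared.add(produced)
    return problems


print(check_step_order(MODEL_STEPS, STEP_FILES, START_VARIABLES))
```

If the two rows of MODEL_STEPS were swapped, the check would flag age_c in the cox step, because center.csv would not yet have declared it.
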
StaceyFisher commented 4 years ago

  1. Yes, I would rename it to centeredVariable.
  4. I think it is fine in variable-details, as long as we know when the dummying occurs, as defined in model-steps.csv.

I will start to fill these in for DemPoRT. Thanks!
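
To make the catValue-to-dummy mapping discussed above concrete, here is a rough sketch. The column names (variable, catValue, dummyVariable), the smoking example rows, and the helper dummy_code are hypothetical and are not taken from the actual variable-details file.

```python
# Hypothetical variable-details rows; the column names and values are
# assumptions for illustration only.
VARIABLE_DETAILS = [
    {"variable": "smoking", "catValue": "1", "dummyVariable": "smoking_cat1"},
    {"variable": "smoking", "catValue": "2", "dummyVariable": "smoking_cat2"},
    {"variable": "smoking", "catValue": "3", "dummyVariable": "smoking_cat3"},
]


def dummy_code(variable, value, details):
    """Map one categorical value to 0/1 dummy variables using the details rows."""
    return {
        row["dummyVariable"]: int(row["catValue"] == str(value))
        for row in details
        if row["variable"] == variable
    }


print(dummy_code("smoking", 2, VARIABLE_DETAILS))
```

Because each catValue row names its own dummy variable, a separate dummy specification file would not be needed under this layout; model-steps.csv would only have to say when the dummying step runs.
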

StaceyFisher commented 4 years ago

Baseline hazard should go somewhere too. Not sure where would make the most sense. @yulric

yulric commented 4 years ago

I think in the cox step or, in your case, the fine-and-grey step. Should we have a row in that file for the baseline hazard? I know that in some algorithms the baseline hazard changes with time.
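
As a sketch of why the baseline value needs to be stored per time horizon: in a Cox model the predicted survival at time t is the baseline survival S0(t) raised to exp(linear predictor), so a 1-year and a 5-year prediction share the coefficients but need different baseline values. The numbers and names below (COEFFICIENTS, BASELINE_SURVIVAL, predicted_risk) are made up for illustration and do not come from any fitted model or from the export files.

```python
import math

# Illustrative values only; none of these numbers come from a real model.
COEFFICIENTS = {"age_c": 0.05, "sex_male": 0.30}

# Baseline survival S0(t), keyed by time horizon in years. One row per
# horizon in the cox step file would carry the same information.
BASELINE_SURVIVAL = {1: 0.98, 5: 0.90}


def predicted_risk(values, horizon_years):
    """Cox model risk at a horizon: 1 - S0(t) ** exp(linear predictor)."""
    linear_predictor = sum(
        COEFFICIENTS[name] * values[name] for name in COEFFICIENTS
    )
    baseline = BASELINE_SURVIVAL[horizon_years]
    return 1.0 - baseline ** math.exp(linear_predictor)


person = {"age_c": 10.0, "sex_male": 1.0}
print(predicted_risk(person, 1), predicted_risk(person, 5))
```

A Fine and Gray model would need an analogous baseline quantity for the subdistribution, which this sketch does not cover.
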

DougManuel commented 4 years ago

I've added @Rhan43 to this PR review, since she'll need to oversee the creation of the model export for RESPECT. I've also included @amytmhsu in case she wants to see what's going on.

yulric commented 1 year ago

Closing PR. All documentation is in the model parameters repo.