Question: 'Given clinical and anthropometric data, predict if a patient has breast cancer or not?'
What do you think?
If anyone has domain expertise, please help.
From what I have read about anthropometric data:
Anthropometric measurements are a series of quantitative measurements of the muscle, bone, and adipose tissue used to assess the composition of the body. The core elements of anthropometry are height, weight, body mass index (BMI), body circumferences (waist, hip, and limbs), and skinfold thickness.
Source: ncbi.nlm.nih.gov/books/NBK537315/
And from the little I know, Insulin, Glucose, etc. are obtained from diagnostic lab tests, so they count as clinical data.
I have modified a previous fetch file so that it handles .csv directly instead of .zip, and it is working.
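For anyone who wants to see the idea, here is a minimal sketch of what a .csv-only fetch helper might look like, using just the standard library. The function name `fetch_csv` and the `data/raw` destination folder are hypothetical and not taken from the actual project file.

```python
import urllib.request
from pathlib import Path
from typing import Optional


def fetch_csv(url: str, dest_dir: str = "data/raw", filename: Optional[str] = None) -> Path:
    """Download a .csv file from `url` into `dest_dir` and return its path.

    Hypothetical helper: unlike a .zip-based fetch there is no archive
    to extract, so the response body is written to disk as-is.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Fall back to the last path segment of the URL as the file name.
    name = filename or url.rstrip("/").split("/")[-1]
    if not name.lower().endswith(".csv"):
        raise ValueError(f"expected a .csv file, got {name!r}")

    target = dest / name
    with urllib.request.urlopen(url) as response:
        target.write_bytes(response.read())
    return target
```

It would be called with the direct download link of the raw .csv, which has to be taken from the dataset page.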
I ran basic Logistic Regression and SVC models without preprocessing to check the performance. Here are my observations:
LR:
Validation and test accuracy are not great (~80% mean CV train accuracy, ~68% mean validation accuracy, ~71% test accuracy; a little overfit, I think).
Low recall on the "has breast cancer" class.
Low precision on the "does not have breast cancer" class.
Preprocessing and hyperparameter tuning might improve things.
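A rough sketch of what "preprocessing plus hyperparameter tuning" could look like with scikit-learn. The data here is synthetic (`make_classification`) and only stands in for the real features and labels. The grid values and the choice of recall as the scoring metric are assumptions for illustration, not settings anyone has agreed on.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): replace with the real feature matrix and labels.
X, y = make_classification(n_samples=120, n_features=9, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so each CV fold is scaled using
# statistics from its own training part only.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Illustrative grid over the regularisation strength C.
param_grid = {"logisticregression__C": [0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="recall")
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("mean CV recall of best model:", search.best_score_)
# Per-class precision and recall on the held-out test set.
print(classification_report(y_test, search.predict(X_test)))
```

Scoring on recall is one option because the positive class had low recall above; accuracy or F1 would be equally reasonable to try.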
Yeah, this is really good. I think we can start with a baseline model like DummyClassifier and then add the remaining classifiers you've mentioned.
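Something along these lines could serve as the baseline comparison. It is only a sketch: the data is synthetic, and the `most_frequent` strategy and 5-fold CV are assumed choices.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Stand-in data (assumption): replace with the real feature matrix and labels.
X, y = make_classification(n_samples=120, n_features=9, random_state=0)

models = {
    # Always predicts the majority class seen during fit.
    "dummy (most_frequent)": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in models.items():
    cv = cross_validate(model, X, y, cv=5, scoring="accuracy", return_train_score=True)
    results[name] = (cv["train_score"].mean(), cv["test_score"].mean())
    print(f"{name}: train={results[name][0]:.2f}, valid={results[name][1]:.2f}")
```

Any real classifier should beat the dummy's validation score; if it does not, the features or the setup need another look before adding more models.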
Hi @IfyAnene7 @Saule-Atymtayeva @rahulkuriyedath,
Following the Slack discussion about a source for a possible new project, here is one: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Coimbra (thanks to @IfyAnene7).
SVC (also run without preprocessing): heavily overfit.
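One way to put a number on the overfitting is the gap between mean train and mean validation score across CV folds. The sketch below uses synthetic stand-in data, and the candidate settings (scaling, smaller `C`) are assumptions about what might help, not verified fixes for this dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): replace with the real feature matrix and labels.
X, y = make_classification(n_samples=120, n_features=9, random_state=0)

candidates = {
    "SVC, no scaling": SVC(),
    "SVC, scaled, C=1": make_pipeline(StandardScaler(), SVC(C=1.0)),
    # Smaller C means stronger regularisation.
    "SVC, scaled, C=0.1": make_pipeline(StandardScaler(), SVC(C=0.1)),
}

gaps = {}
for name, model in candidates.items():
    cv = cross_validate(model, X, y, cv=5, return_train_score=True)
    train, valid = cv["train_score"].mean(), cv["test_score"].mean()
    gaps[name] = train - valid  # a large positive gap suggests overfitting
    print(f"{name}: train={train:.2f}, valid={valid:.2f}, gap={gaps[name]:.2f}")
```

Whether scaling or a smaller `C` actually closes the gap would have to be checked on the real data.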