UBC-MDS / dsci522-group16

This is the repo for the group project for DSCI 522 (group 16)
MIT License

Discussion regarding possibility of a new project and few updates #14

adibns closed this issue 3 years ago

adibns commented 3 years ago

Hi @IfyAnene7 @Saule-Atymtayeva @rahulkuriyedath,

As per our Slack discussion, here is a possible data source for a new project: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Coimbra (thanks to @IfyAnene7).

Question: 'Given clinical and anthropometric data, can we predict whether a patient has breast cancer?' What do you think?

If anyone has domain expertise, please help.

From what I have read about anthropometric data: anthropometric measurements are a series of quantitative measurements of the muscle, bone, and adipose tissue used to assess the composition of the body. The core elements of anthropometry are height, weight, body mass index (BMI), body circumferences (waist, hip, and limbs), and skinfold thickness. Source: ncbi.nlm.nih.gov/books/NBK537315/

And from the little I know, insulin, glucose, etc. are obtained from diagnostic lab tests, so they count as clinical data.

I have modified a previous fetch file so that it handles .csv directly instead of .zip, and it is working.
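For reference, a direct .csv fetch could look roughly like the sketch below. This is not the actual script in the repo; the function name `fetch_csv` and its arguments are hypothetical, and it only assumes the source URL points straight at a CSV file, so no unzip step is needed.

```python
import os
import urllib.request


def fetch_csv(url, out_path):
    """Download the file at `url` and save it unchanged at `out_path`.

    Hypothetical helper: assumes `url` points directly at a .csv file,
    so the bytes are written as-is with no archive extraction.
    """
    out_dir = os.path.dirname(out_path)
    if out_dir:
        # Create the target folder (e.g. data/raw) if it does not exist yet.
        os.makedirs(out_dir, exist_ok=True)
    with urllib.request.urlopen(url) as response:
        data = response.read()
    with open(out_path, "wb") as f:
        f.write(data)
    return out_path
```

It would be called as `fetch_csv(<direct link to the .csv>, "data/raw/<file name>.csv")`, where both the link and the output path are placeholders to fill in.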

I ran basic Logistic Regression (LR) and SVC models without preprocessing to check the performance. Here are my observations.

LR:

  1. Validation and test accuracy are not great (~80% mean CV train accuracy, ~68% mean validation accuracy, ~71% test accuracy; slightly overfit, I think).
  2. Low recall for the "having breast cancer" class.
  3. Low precision for the "not having breast cancer" class.
  4. Preprocessing and hyperparameter tuning might improve things.

SVC: severely overfit.
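A check like the one described above can be sketched as follows. This is only an illustration, not the exact code that produced the numbers in this comment: it uses synthetic data from `make_classification` as a stand-in (the shape is arbitrary), default scikit-learn models, and 5-fold cross-validation, all of which are assumptions. Comparing mean train and validation scores is one way to spot overfitting, and the classification report gives per-class precision and recall.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_predict, cross_validate
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): replace with the real features and target.
X, y = make_classification(
    n_samples=120, n_features=9, n_informative=5, random_state=522
)

# Default models with no preprocessing, as in the quick check described above.
models = {"LR": LogisticRegression(max_iter=1000), "SVC": SVC()}

results = {}
for name, model in models.items():
    # return_train_score=True lets us compare train vs. validation accuracy;
    # a large gap between the two suggests overfitting.
    cv = cross_validate(model, X, y, cv=5, return_train_score=True)
    results[name] = {
        "train": cv["train_score"].mean(),
        "valid": cv["test_score"].mean(),
    }
    print(f"{name}: train={results[name]['train']:.2f}, "
          f"valid={results[name]['valid']:.2f}")

    # Per-class precision and recall from out-of-fold predictions.
    y_pred = cross_val_predict(model, X, y, cv=5)
    print(classification_report(y, y_pred))
```

On the real data, wrapping each model in a pipeline with a scaler would be the natural next step for the preprocessing point in item 4.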

IfyAnene7 commented 3 years ago

Yeah, this is really good. I think we can start with a baseline model like DummyClassifier and then add the remaining classifiers you've mentioned.
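A baseline along those lines might look like the sketch below. The data is again a synthetic stand-in and the `most_frequent` strategy is just one possible choice, so treat both as assumptions. The idea is that any real classifier should beat this score before we read much into its accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_validate

# Synthetic stand-in data (assumption): replace with the real features and target.
X, y = make_classification(n_samples=120, n_features=9, random_state=522)

# Always predicts the most frequent class seen during training,
# ignoring the features entirely.
baseline = DummyClassifier(strategy="most_frequent")

scores = cross_validate(baseline, X, y, cv=5, return_train_score=True)
baseline_train = scores["train_score"].mean()
baseline_valid = scores["test_score"].mean()
print(f"Dummy: train={baseline_train:.2f}, valid={baseline_valid:.2f}")
```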