UBC-MDS / DSCI-522-2425-team35-Heart_disease_diagnostic_machine

MIT License

Data Validation To Do #12

Open seshafi opened 6 hours ago

seshafi commented 6 hours ago

Data validation checks:

  1. [ ] Correct data file format
  2. [x] Correct column names
  3. [x] No empty observations
  4. [x] Missingness not beyond expected threshold
  5. [x] Correct data types in each column
  6. [x] No duplicate observations
  7. [ ] No outlier or anomalous values
  8. [x] Correct category levels (i.e., no string mismatches or single values)
  9. [ ] Target/response variable follows expected distribution
  10. [ ] No anomalous correlations between target/response variable and features/explanatory variables
  11. [ ] No anomalous correlations between features/explanatory variables
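
A rough sketch of how the ticked items (column names, empty observations, missingness, duplicates, category levels) could be checked with plain pandas. The column names, allowed levels, and the 5% missingness threshold are placeholders assumed for illustration, not the project's actual schema:

```python
import pandas as pd

# Hypothetical schema: names, levels, and threshold are illustrative assumptions.
EXPECTED_COLUMNS = ["age", "sex", "exang", "ca", "num"]
MAX_MISSING_FRACTION = 0.05
ALLOWED_LEVELS = {"sex": {0, 1}, "num": {0, 1}}


def validate(df):
    """Return a list of validation failures (empty if every check passes)."""
    problems = []

    # Check 2: correct column names, in the expected order
    if list(df.columns) != EXPECTED_COLUMNS:
        problems.append("unexpected column names")

    # Check 3: no rows where every value is missing
    if df.isna().all(axis=1).any():
        problems.append("empty observations")

    # Check 4: per-column missingness within the threshold
    missing_fraction = df.isna().mean()
    for col in missing_fraction[missing_fraction > MAX_MISSING_FRACTION].index:
        problems.append(f"too many missing values in {col}")

    # Check 6: no fully duplicated rows
    if df.duplicated().any():
        problems.append("duplicate observations")

    # Check 8: categorical columns only contain the allowed levels
    for col, levels in ALLOWED_LEVELS.items():
        if col in df.columns:
            unexpected = set(df[col].dropna().unique()) - levels
            if unexpected:
                problems.append(f"unexpected levels in {col}")

    return problems
```

Returning a list of messages rather than raising on the first failure makes it easier to see every failing check in one run.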

After completing checklist:

seshafi commented 6 hours ago

We are unclear on how to check:

Check with Daniel/Tiff tomorrow.

Other questions:

seshafi commented 5 hours ago

Invalid data issues:

  1. Exercise-induced angina: should be bool but is currently object
  2. Number of major vessels: should be int but is currently float
  3. Diagnosis of heart disease: should be limited to 0 and 1
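
One possible way to address the three issues above, sketched with pandas. The column names (`exang`, `ca`, `num`), the `"0"`/`"1"` string encoding of the angina column, and collapsing every positive diagnosis value to 1 are all assumptions made for this example and should be checked against the actual data:

```python
import pandas as pd


def fix_dtypes(df):
    """Return a copy of df with the three columns coerced to their intended types.

    Column names and encodings are hypothetical; adjust them to the real data.
    """
    out = df.copy()

    # 1. Angina: object -> bool (assumes the values are the strings "0" and "1")
    mapped = out["exang"].map({"0": False, "1": True})
    if mapped.isna().any():
        raise ValueError("unexpected values in exang")
    out["exang"] = mapped.astype(bool)

    # 2. Major vessels: float -> nullable integer, so missing values survive the cast
    out["ca"] = out["ca"].astype("Int64")

    # 3. Diagnosis: restrict to 0/1 (assumes any positive value means disease present)
    out["num"] = (out["num"] > 0).astype(int)

    return out


# Toy data standing in for the real file
raw = pd.DataFrame({
    "exang": ["1", "0", "1"],
    "ca": [0.0, 2.0, None],
    "num": [0, 3, 1],
})
clean = fix_dtypes(raw)
```

The nullable `Int64` dtype is used for the vessel count because a plain `int` cast fails when the column contains missing values, which is one common reason a count column ends up as float.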