FixML / test-checklist-for-machine-learning


Threat model for machine learning projects #3

Open ttimbers opened 1 month ago

ttimbers commented 1 month ago

Here I want to brainstorm a list of all the potential threats (i.e., where things can go wrong) to a machine learning project. Our checklist need not address all of them, but our literature review should describe them all and identify which ones our checklist covers. Here's my starting list:

H234J commented 1 month ago

A few more important potential threats are as follows:

  1. Poor hyperparameter tuning (e.g., selecting the wrong learning rate, too small or too large a gamma value in an SVM, the tree depth in decision trees, or the number of estimators in a random forest)

  2. Skewed classes in the training set (this can lead to too much training on the majority class and too little on the minority class)

  3. Model scalability (prediction latency increases when the inflow velocity is high)
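Point 1 is usually mitigated by a systematic search over candidate hyperparameter values rather than hand-picking them. A minimal exhaustive grid-search sketch (the `toy_cv_score` function is a hypothetical stand-in for a real cross-validation routine, e.g. scikit-learn's `cross_val_score`):

```python
from itertools import product

def toy_cv_score(lr, depth):
    # Hypothetical stand-in for a cross-validated model score;
    # in practice this would train and evaluate the model.
    # This toy version peaks at lr=0.1, depth=5.
    return -abs(lr - 0.1) - abs(depth - 5) * 0.01

# Candidate hyperparameter values to search over.
grid = {"lr": [0.001, 0.01, 0.1, 1.0], "depth": [3, 5, 10]}

# Exhaustively score every (lr, depth) combination and keep the best.
best = max(product(grid["lr"], grid["depth"]),
           key=lambda p: toy_cv_score(*p))
print(best)
```

The same loop structure underlies grid search in most libraries; swapping the exhaustive product for random sampling gives randomized search, which scales better when the grid is large.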

ttimbers commented 1 month ago

Thank you for these suggestions @H234J!

JohnShiuMK commented 1 month ago

I am not sure whether this is part of the reproducibility issue or should be separate:

Previously, I encountered situations where a project failed to run, or where the model produced different outputs after upgrades of its underlying dependencies.
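One way to catch this class of failure is to record the environment alongside the model's outputs, so a later run can be diffed against the environment that produced them. A minimal sketch using only the standard library (`environment_snapshot` is a hypothetical helper name):

```python
import importlib.metadata as md
import sys

def environment_snapshot(packages):
    # Capture the interpreter version plus the installed version of
    # each listed package; None marks a package that is not installed.
    snap = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            snap[name] = md.version(name)
        except md.PackageNotFoundError:
            snap[name] = None
    return snap

# Store this dict next to saved model artifacts (e.g. as JSON), then
# compare it on reload before trusting the model's outputs.
print(environment_snapshot(["numpy", "scikit-learn"]))
```

Pinning dependencies (lock files, container images) prevents silent upgrades in the first place; the snapshot makes any remaining drift visible.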

JohnShiuMK commented 1 month ago

another potential mistake:

tonyshumlh commented 3 weeks ago
  1. Extension of "Mismatch of machine learning model choice with respect to the data used for training and evaluation": improper evaluation metrics, e.g., using accuracy on a heavily imbalanced dataset, or in scenarios where false positives/negatives have serious consequences.
  2. Model interpretation issues, similar to model behaviour/learning issues: the model's predictions put too much weight on attributes where humans expect little or no effect, or vice versa.
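The metric mismatch in point 1 is easy to demonstrate: on a 95:5 imbalanced dataset, a classifier that always predicts the majority class scores high on accuracy while having zero recall on the minority class. A self-contained illustration (toy labels, no real model):

```python
# 95:5 imbalanced labels; the "model" always predicts the majority class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

# Accuracy looks strong because the majority class dominates.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Minority-class recall exposes that the model never finds a positive.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(t == 1 for t in y_true)

print(accuracy, recall)  # 0.95 accuracy, 0.0 recall
```

This is why metrics like recall, precision, F1, or cost-weighted scores are preferred over plain accuracy when classes are skewed or when false negatives are expensive.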