Apress / ai-for-healthcare-keras-tensorflow-2.0

Source Code for 'AI for Healthcare with Keras and Tensorflow 2.0' by Anshik Bansal

MultiLabelBinarizer classes issue #1

Open jplasser opened 2 years ago

jplasser commented 2 years ago

In the notebook of chapter 4 there is a mistake in the # Binarizing the multi-labels section.

full_data.ICD9_CODE is converted to a list, but that list should also be wrapped in another list. This is one of the most common mistakes when using MultiLabelBinarizer.

Correct code: mlb_fit = mlb.fit([full_data.ICD9_CODE.tolist()])

The difference in the resulting classes is as follows:

  1. original code: mlb_fit.classes_

    array(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'V', '|'], dtype=object)

  2. my corrected code: mlb_fit.classes_

    array(['25000', '25000|2720', '25000|2720|4019', ..., 'V290|53081|V053', 'V290|V053', 'V290|V053|53081'], dtype=object)

omshri29 commented 2 years ago

The classes should look like this: mlb_fit.classes_

    array(['038.9', '244.9', '250.00', '272.0', '272.4', '276.1', '276.2', '285.1', '285.9', '287.5', '305.1', '311', '33.24', '36.15', '37.22', '37.23', '38.91', '38.93', '39.61', '39.95', '401.9', '403.90', '410.71', '412', '414.01', '424.0', '427.31', '428.0', '45.13', '486', '496', '507.0', '511.9', '518.81', '530.81', '584.9', '585.9', '599.0', '88.56', '88.72', '96.04', '96.6', '96.71', '96.72', '99.04', '99.15', '995.92', 'V15.82', 'V45.81', 'V58.61'], dtype=object)
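To make the three kinds of class lists in this thread easier to compare, here is a minimal sketch on made-up data. The `codes` list is a hypothetical stand-in for full_data.ICD9_CODE.tolist(), assuming each row is a pipe-delimited string of codes. Splitting on "|" is an assumption about how per-code classes could be obtained; it is not taken from the notebook.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical stand-in for full_data.ICD9_CODE.tolist():
# one pipe-delimited string of codes per row.
codes = ["25000|2720", "V290|V053"]

# 1. Flat list of strings: every string is iterated character by
#    character, so the classes are single characters.
char_classes = list(MultiLabelBinarizer().fit(codes).classes_)

# 2. The whole list wrapped in another list: one sample whose labels
#    are the unsplit strings.
wrapped_classes = list(MultiLabelBinarizer().fit([codes]).classes_)

# 3. Each row split on "|" (an assumption): one sample per row,
#    one label per individual code.
split_mlb = MultiLabelBinarizer()
indicator = split_mlb.fit_transform([row.split("|") for row in codes])
split_classes = list(split_mlb.classes_)

print(char_classes)     # single characters
print(wrapped_classes)  # whole pipe-delimited strings
print(split_classes)    # individual codes
print(indicator.shape)  # (rows, distinct codes)
```

Only the third variant yields one indicator column per individual code and one row per record, which is the shape a multi-label classifier would normally expect.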

Also, the data generator does not work properly: Model.predict gets stuck in an infinite loop.