aqibsaeed / Multilabel-timeseries-classification-with-LSTM

TensorFlow implementation of the paper: Learning to Diagnose with LSTM Recurrent Neural Networks.

Apache License 2.0

575 stars, 187 forks

data.csv missing #2

Closed: pannous closed this issue 7 years ago

aqibsaeed commented 7 years ago

Patients' electronic health record data are not publicly available because they contain sensitive information. For the data set format and description, please see the paper.

pannous commented 7 years ago

Some example data would be really helpful. Otherwise, all of the thousand users who are currently trying your code will run into a wall and have to somehow make up their data themselves. (100 stars ~ 1000 users)

aqibsaeed commented 7 years ago

I am sorry about this; I completely understand your point. I always add data sets to my repository when they are publicly available. I published the code so that people who have a similar data set (as discussed in the paper) can try out the model.

A similar data set is MIMIC-III. However, as mentioned in the notebook, pre-processing is required depending on the use case.

aqibsaeed commented 7 years ago

Here is some more information on the data set format. I hope it helps clear things up. The data set will look something like the figure below.

TEMP, PH, etc. are all features; each red point represents a feature value at one time point. The diagnosis vector (one-hot encoded), not shown in the picture, holds the class labels. Each training example is a sequence of shape [1, time_steps, number_of_features], so a batch has shape [batch_size, time_steps, number_of_features].

Please see the Dataset Description section of the paper for more detail.
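
For anyone who only needs placeholder input to exercise the model, the shapes described above can be sketched with random arrays. This is a minimal sketch assuming NumPy; the function name `make_synthetic_batch`, the sizes, and `number_of_labels` are hypothetical, and the values are random noise with no clinical meaning. It does not reproduce the real data set or the contents of `data.csv`.

```python
import numpy as np


def make_synthetic_batch(batch_size=4, time_steps=10, number_of_features=3,
                         number_of_labels=5, seed=0):
    """Return random (features, labels) arrays with the shapes described above.

    All sizes are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)

    # One sequence per example: time_steps readings of each feature
    # (e.g. TEMP, PH), drawn here from a standard normal distribution.
    features = rng.normal(
        size=(batch_size, time_steps, number_of_features)
    ).astype(np.float32)

    # One diagnosis vector per example. Here exactly one class is set to 1
    # per row, matching the "one-hot encoded" description in the thread.
    labels = np.zeros((batch_size, number_of_labels), dtype=np.float32)
    active = rng.integers(0, number_of_labels, size=batch_size)
    labels[np.arange(batch_size), active] = 1.0

    return features, labels


features, labels = make_synthetic_batch()
single_example = features[:1]  # keeps the leading axis: [1, time_steps, number_of_features]

print(features.shape, labels.shape, single_example.shape)
# (4, 10, 3) (4, 5) (1, 10, 3)
```

If several diagnoses can apply to the same patient, more than one entry per label row could be set to 1 instead; that variation is an assumption and is not specified in this thread.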