Choice of example dataset -- not censored?

gm-spacagna / deep-ttf

Survival analysis and time-to-failure predictive modeling using Weibull distributions and Recurrent Neural Networks in Keras

Choice of example dataset -- not censored? #1

Open ibarrien opened 6 years ago

ibarrien commented 6 years ago

Hello,

Very nice work. It doesn't appear that the engine data set you've collected is apt for survival analysis, given that the target event (engine failure) is always recorded. E.g. "(engine/day, 2) tensor containing time-to-event and 1 (since all engines failed)". I.e. it seems like you're attempting semi-supervised learning on a fully-supervised dataset (unless I've missed something!). It would be nice to see a simpler, censored example, perhaps the rossi dataset that Lifelines experiments with:

from lifelines.datasets import load_rossi
rossi_dataset = load_rossi()

See: http://lifelines.readthedocs.io/en/latest/Survival%20Regression.html

gm-spacagna commented 6 years ago

You are correct, we simulate the censoring just for the sake of simplicity. The use case I developed at work used real censored data: we observe sensor telemetry, but we don't know whether it stopped because the component failed, because the ICT system was no longer collecting data, or because the sensor was removed from the component for testing.
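
For readers following along, here is a minimal sketch of what simulating right-censoring on fully observed failure times could look like. It is not the repository's actual code: the function name `simulate_right_censoring`, the uniform censoring times, and the sample values are all assumptions chosen for illustration.

```python
import random


def simulate_right_censoring(failure_times, max_censor_time, seed=0):
    """Turn fully observed failure times into (observed_time, event) pairs.

    For each unit a hypothetical censoring time is drawn uniformly from
    [0, max_censor_time]. If the failure happens first, the failure is
    observed (event = 1); otherwise only the censoring time is known
    (event = 0).
    """
    rng = random.Random(seed)
    observations = []
    for true_time in failure_times:
        # Assumed moment at which telemetry stops for unrelated reasons.
        censor_time = rng.uniform(0, max_censor_time)
        if true_time <= censor_time:
            observations.append((true_time, 1))
        else:
            observations.append((censor_time, 0))
    return observations


# Illustrative failure times (made-up numbers, not the engine dataset).
true_failure_times = [120.0, 85.0, 200.0, 150.0, 60.0]
observations = simulate_right_censoring(true_failure_times, max_censor_time=250.0)

for true_time, (observed_time, event) in zip(true_failure_times, observations):
    status = "failure observed" if event == 1 else "censored"
    print(f"true={true_time:6.1f}  observed={observed_time:6.1f}  {status}")
```

The observed time is never later than the true failure time, and the event flag records which of the two clocks stopped first. That flag is the only information a model gets about whether a given duration is an exact failure time or just a lower bound.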

gm-spacagna commented 6 years ago

The lifelines examples do not use time-series or sequential data. We could find a real time-series dataset that is censored and build another example for the tutorials. Maybe an IoT-related use case. Any suggestions?

archenroot commented 5 years ago

@gm-spacagna - yeah really nice work.

So, to understand: censored data is when you do not know the moment of failure from the historical data, correct? On the other hand, a model could also be more accurate on uncensored data than on censored data. I would like to adopt this for data from IoT devices in manufacturing, where ERRORs are always marked in the sample data.
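
To make the distinction concrete, here is a rough sketch of how a censored and an uncensored observation contribute differently to a Weibull log-likelihood. It assumes the continuous Weibull with scale `alpha` and shape `beta`; the repository may well use a different (e.g. discrete) formulation, and the function name `weibull_censored_loglik` is hypothetical.

```python
import math


def weibull_censored_loglik(t, event, alpha, beta):
    """Log-likelihood of one observation under a continuous Weibull model.

    event = 1: the failure was observed at time t, so the density is used:
               log f(t) = log h(t) + log S(t)
    event = 0: the unit was still running at time t (right-censored), so
               only the survival probability is used: log S(t)
    """
    cumulative_hazard = (t / alpha) ** beta
    log_survival = -cumulative_hazard
    if event == 0:
        return log_survival
    log_hazard = (
        math.log(beta)
        - math.log(alpha)
        + (beta - 1) * (math.log(t) - math.log(alpha))
    )
    return log_hazard + log_survival


# Same duration, same parameters, different information content.
observed = weibull_censored_loglik(t=50.0, event=1, alpha=100.0, beta=2.0)
censored = weibull_censored_loglik(t=50.0, event=0, alpha=100.0, beta=2.0)
print(f"failure observed at t=50: {observed:.4f}")
print(f"censored at t=50:         {censored:.4f}")
```

A censored record only says "the failure happened some time after t", which is why it uses the survival term alone. When every failure is marked, as in the manufacturing data described above, every record uses the full density term.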