cbrnr / sleepecg

Sleep stage detection using ECG
BSD 3-Clause "New" or "Revised" License

Use `int8` in stages array #137

Closed by cbrnr 1 year ago

cbrnr commented 1 year ago

Maybe this is just a minor detail, but I noticed that `prepare_data_keras()` returns a float stages array after padding and one-hot encoding (even when the input argument is `int8`). Would it make sense to return an `int8` stages array instead?
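To illustrate the behavior in question, here is a minimal NumPy sketch of how a one-hot encoding step can turn `int8` labels into floats. It is not sleepecg's actual implementation of `prepare_data_keras()`; the `stages` values and `n_classes` are made up for the example.

```python
import numpy as np

# Hypothetical integer stage labels for five epochs (not real data).
stages = np.array([0, 2, 2, 1, 4], dtype=np.int8)
n_classes = 5

# Indexing an identity matrix is a common one-hot idiom.
# np.eye defaults to float64, so the int8 input dtype is lost here.
one_hot_float = np.eye(n_classes)[stages]

# Requesting int8 for the identity matrix keeps the result in int8.
one_hot_int8 = np.eye(n_classes, dtype=np.int8)[stages]

print(one_hot_float.dtype, one_hot_int8.dtype)
```

Both arrays hold the same zeros and ones; only the element size differs.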

hofaflo commented 1 year ago

Interesting question! Assuming 2000 12-hour recordings with 6 stages (W/R/N1/N2/N3/undefined), this would save about 50 MB of RAM, but only until the loss is calculated, when the true labels are cast to the dtype of the predicted ones (which are floats, since the network outputs probabilities). So I guess it would make sense, but the benefit would be rather limited :D
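The estimate can be reproduced with some quick arithmetic. The sketch below assumes 30-second epochs and a `float32` one-hot array; neither is stated in the comment, but the numbers line up with the quoted figure under those assumptions.

```python
import numpy as np

n_recordings = 2000
hours_per_recording = 12
epoch_seconds = 30  # assumption: one label per 30 s window
n_classes = 6       # W/R/N1/N2/N3/undefined

epochs_per_recording = hours_per_recording * 3600 // epoch_seconds
n_elements = n_recordings * epochs_per_recording * n_classes

# Size of the one-hot array for each dtype (float32 is an assumption).
bytes_float32 = n_elements * np.dtype(np.float32).itemsize
bytes_int8 = n_elements * np.dtype(np.int8).itemsize

saved_mb = (bytes_float32 - bytes_int8) / 1e6
print(f"float32: {bytes_float32 / 1e6:.2f} MB")
print(f"int8:    {bytes_int8 / 1e6:.2f} MB")
print(f"saved:   {saved_mb:.2f} MB")
```

With a `float64` array the difference would be larger, so the roughly 50 MB figure is consistent with single-precision labels.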

cbrnr commented 1 year ago

I know it's not much, but I think it would also be more consistent (the labels array is `int8` before the call, so why should this function convert it to float?).

hofaflo commented 1 year ago

I think for consistency's sake it actually makes more sense to have floats. The network output for a single sample (i.e. a 30 s window) is an array containing probabilities for the possible classes. While the original labels are integers (for the datasets we currently support), the one-hot encoded labels returned by `prepare_data_keras()` are structurally more similar to the network output. E.g. for a dataset where each recording was labeled by multiple scorers (who don't agree perfectly), the "ground truth" would also contain "probabilities" instead of a one-hot encoding.
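The multi-scorer case can be sketched as follows. This is a hypothetical illustration, not something sleepecg provides: `scorer_labels` is made-up data, and averaging the one-hot encodings is just one simple way to turn disagreeing annotations into per-class "probabilities".

```python
import numpy as np

# Hypothetical labels: 3 scorers (rows) annotate the same 4 epochs (columns).
scorer_labels = np.array(
    [
        [0, 1, 2, 2],
        [0, 1, 2, 3],
        [0, 2, 2, 3],
    ],
    dtype=np.int8,
)
n_classes = 4

# One-hot encode every scorer's labels: shape (scorers, epochs, classes).
one_hot = np.eye(n_classes, dtype=np.float32)[scorer_labels]

# Average over scorers: each epoch now holds the fraction of scorers
# voting for each class, which is fractional wherever they disagree.
soft_labels = one_hot.mean(axis=0)

print(soft_labels)
```

Epochs where all scorers agree still come out as a plain one-hot row, while epochs with disagreement contain fractions, which an integer array could not represent.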

cbrnr commented 1 year ago

OK, this makes sense! Let's not change it then!