Open JosephBizu opened 1 year ago
Read: SMOTE; Ensemble Modelling; Signal Status Recognition Based on 1DCNN and Its Leakage Detection; Classification of caesarean section and normal vaginal deliveries using foetal heart rate signals and advanced machine learning algorithms
SMOTE summary: The paper addresses the skewed data (class imbalance) problem, where there are significantly more normal cases than "abnormal" cases in the data. It argues that over-sampling the minority class while under-sampling the majority class produces better results (measured with ROC curves) than under-sampling alone. The over-sampling works by creating synthetic minority-class examples. For context, the paper cites earlier findings that under-sampling the majority class gives better classifiers than over-sampling the minority class, and that combining the two did not outperform under-sampling alone; those earlier results used over-sampling with replacement. Over-sampling the minority class with synthetic data (generated by the SMOTE algorithm) achieves better results than over-sampling with replacement.
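A minimal sketch of the synthetic over-sampling idea, to make the note above concrete: each new point is placed at a random position on the segment between a minority sample and one of its nearest minority-class neighbours. The function name `smote`, its parameters, and the toy data are my own choices for illustration, not taken from the paper.

```python
import math
import random

def smote(minority, n_per_sample=2, k=3, seed=0):
    """Return synthetic points interpolated between minority samples and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        # k nearest minority-class neighbours of x (x itself excluded)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        for _ in range(n_per_sample):
            nb = rng.choice(neighbours)
            gap = rng.random()  # position along the segment x -> nb, in [0, 1)
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic

# Toy minority class with two features (made-up values)
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_per_sample=2, k=3)
print(len(new_points))  # 2 synthetic points per original sample
```

Because every synthetic point lies between two real minority samples, the new data stays inside the region the minority class already occupies, which is the difference from simply duplicating samples (over-sampling with replacement).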
Ensemble Modelling Summary: Linear features can be broadly defined as those visible to the human eye, such as accelerations and decelerations. Non-linear features are much harder to interpret, for example measures that quantify the difference between two or more observations. FIGO has defined some linear features for interpretation. Using these linear features alone leads to high variance in interpretation, so they cannot be relied on by themselves.
Non-linear features: root mean square (RMS) and sample entropy. RMS measures the magnitude of a varying quantity and is an effective signal strength indicator in heart rate variability studies. Sample entropy represents the non-linear dynamics and loss of complexity in the FHR, and is a useful indicator for foetal hypoxia and metabolic acidosis detection.
Frequency representations via FFT or power spectral density can minimize signal quantity variations. They are a direct measure of the morphological properties of a signal and have proven to be an effective indicator for foetal hypoxia and metabolic acidosis detection (these are grouped with the non-linear features).
The feature vectors in this study include RBL, Accelerations, Decelerations, STV, LTV, SampEn, FD, DFA, Fpeak, RMS, SD1, SD2, SDRatio. The non-linear features have much better discriminative capacity when classifying normal and pathological records. The discriminative capability of each feature is determined in this study using a Recursive Feature Elimination (RFE) algorithm. The eight top-ranked features are DFA, RMS, FD, SD1, SDRatio, SD2, SampEn and STV.
To address the data skew, down-sampling and up-sampling techniques were used: down-sampling by 100% and up-sampling by 600%.
Three machine learning models were trained on the data: RF, SVM and FLDA. All the models were trained on both the 13 features and the 8 top-ranked features (using only the 8 features gave better performance). No single model produced satisfying results, but their ensemble did. The correlation between the models was tested, and the three were found suitable to combine into one ensemble.
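To make the two features above concrete, here is a small sketch of how RMS and sample entropy can be computed. It follows the usual textbook definition SampEn(m, r) = -ln(A / B) with Chebyshev distance; the defaults `m=2`, `r=0.2` and the toy series are my assumptions, not values from the paper.

```python
import math

def rms(x):
    """Root mean square of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def sample_entropy(x, m=2, r=0.2):
    """SampEn(m, r) = -ln(A / B); r is a tolerance in the same units as x."""
    n = len(x)

    def count_matches(length):
        # Use the same n - m template start points for both lengths
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Chebyshev distance between the two templates
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)      # matching template pairs of length m
    a = count_matches(m + 1)  # matching template pairs of length m + 1
    if a == 0 or b == 0:
        return float("inf")   # undefined for this series; reported as infinity here
    return -math.log(a / b)

# Toy FHR-like series in bpm (made-up values)
fhr = [140, 142, 141, 139, 143, 140, 138, 141, 142, 140]
print(rms(fhr))
print(sample_entropy(fhr, m=2, r=2.0))
```

A perfectly regular series gives a sample entropy of 0, and the value grows as the series becomes less predictable. In practice `r` is often set relative to the standard deviation of the signal, which this sketch leaves to the caller.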
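The notes do not say how the three models were combined, so the following is only a sketch of one common option, hard majority voting, plus a simple pairwise agreement score as a stand-in for the correlation check. The prediction lists are hypothetical, not results from the paper.

```python
from collections import Counter

def majority_vote(*model_predictions):
    """Combine per-model label lists by taking the most common label per record."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*model_predictions)]

def agreement(p, q):
    """Fraction of records on which two models predict the same label."""
    return sum(a == b for a, b in zip(p, q)) / len(p)

# Hypothetical predictions (1 = caesarean, 0 = vaginal) from three models
preds_rf  = [1, 0, 1, 1, 0]
preds_svm = [1, 1, 0, 1, 0]
preds_lda = [0, 0, 1, 1, 0]

print(majority_vote(preds_rf, preds_svm, preds_lda))  # [1, 0, 1, 1, 0]
print(agreement(preds_rf, preds_svm))                 # 0.6
```

Voting only helps when the models make different mistakes, which is why checking how strongly their outputs agree is a sensible step before combining them.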
Signal Status Recognition Based on 1DCNN and Its summary: The paper explains the use of a 1D CNN for signal processing, stating that each kernel (filter) acts as a feature extractor that resembles a different signal extraction technique. The paper checks the correlation among the filters, and between the filters and the input signal. As the training iterations increase, the correlation between the kernels and the input signal increases (up to 0.4).
Deep neural network-based classification of cardiotocograms outperformed conventional algorithms summary: The paper presents a method for assessing the type of birth from CTG data. The model is a CNN with 3 convolutional layers that uses a depthwise separable convolution layer. It learns from both UC and FHR data and looks only at the last 30 minutes of the signal before birth. The network had better results than an LSTM model and classic machine learning models, and it performed better on raw data than on smoothed data.
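I do not have the paper's exact procedure in these notes, so this is only a guess at what a kernel-to-signal correlation check could look like: slide the kernel along the signal and take the strongest absolute Pearson correlation over all windows. The function names and the toy data are mine.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def max_window_correlation(signal, kernel):
    """Largest |Pearson correlation| between the kernel and any window of the signal."""
    k = len(kernel)
    return max(abs(pearson(signal[i:i + k], kernel))
               for i in range(len(signal) - k + 1))

signal = [0, 0, 1, 2, 3, 0, 0]   # made-up input containing a short ramp
kernel = [1, 2, 3]               # made-up kernel weights shaped like that ramp
print(max_window_correlation(signal, kernel))
```

A kernel whose weights have the same shape as some part of the input scores close to 1, while a kernel unrelated to the input scores close to 0, so a value rising during training suggests the kernels are picking up shapes present in the signal.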
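For reference, a depthwise separable convolution splits a normal convolution into two steps: each input channel is first filtered by its own kernel (depthwise), then a 1x1 convolution mixes the channels at every time step (pointwise). Below is a plain-Python 1D sketch with two input channels standing in for FHR and UC; the weights and sample values are made up, and the paper's actual layer sizes are not reproduced here.

```python
def depthwise_separable_conv1d(x, depthwise_kernels, pointwise_weights):
    """x: list of channels, each a list of samples. 'valid' padding, stride 1.

    As in most deep learning frameworks, the kernel is applied without flipping
    (cross-correlation).
    """
    # Depthwise step: each input channel is filtered by its own kernel
    depthwise_out = []
    for channel, kernel in zip(x, depthwise_kernels):
        k = len(kernel)
        depthwise_out.append([
            sum(channel[i + j] * kernel[j] for j in range(k))
            for i in range(len(channel) - k + 1)
        ])
    # Pointwise step: a 1x1 convolution mixes the channels at each time step
    length = len(depthwise_out[0])
    return [
        [sum(w * depthwise_out[c][t] for c, w in enumerate(weights))
         for t in range(length)]
        for weights in pointwise_weights
    ]

fhr = [1, 2, 3, 4]   # toy values, not real bpm
uc = [0, 1, 0, 1]
out = depthwise_separable_conv1d(
    [fhr, uc],
    depthwise_kernels=[[1, 1], [1, -1]],   # one kernel per input channel
    pointwise_weights=[[1, 1], [2, 0]],    # one weight row per output channel
)
print(out)  # [[2, 6, 6], [6, 10, 14]]
```

The appeal of this layout is that it needs fewer weights than a full convolution once the number of channels and the kernel size grow, since filtering over time and mixing of channels are learned separately.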
Noise cleaning: A three class treatment of the FHR classification problem using latent class analysis labeling
The approach suggested in the paper is as follows: Fetal heart rate, recorded either by an ultrasound Doppler probe or by a scalp electrode, contains a lot of artifacts. It is therefore necessary to preprocess the FHR signal before applying any feature extraction method. Values outside the interval 50-220 beats per minute (bpm) were considered artifacts and treated as missing data. The missing data were then interpolated using a Matlab implementation of Hermite spline interpolation. After that, a number of features were extracted in order to condense the most relevant information.
CTG preprocessing: Values between 50 and 200 bpm are fine; other values should be interpolated using a Hermite spline. Gaps of missing values longer than 15 seconds should be removed.
Using the wfdb Python package to read the waveforms, the data needs preprocessing. The following steps should be taken:
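A rough sketch of the cleaning rule above, under several assumptions of mine: the signal is a plain list of bpm values sampled at `fs` Hz (4 Hz is a placeholder), "removed" means the samples of a long gap are dropped, and gaps touching either end of the record are dropped too. The interpolation is a plain cubic Hermite segment with finite-difference end slopes, which is not guaranteed to match the Matlab routine used in the paper.

```python
def _hermite_fill(y, left, right):
    """Cubic Hermite values for the missing samples strictly between left and right."""
    h = right - left
    secant = (y[right] - y[left]) / h
    # End slopes from the neighbouring valid sample when there is one, else the secant
    m0 = y[left] - y[left - 1] if left > 0 and y[left - 1] is not None else secant
    m1 = (y[right + 1] - y[right]
          if right + 1 < len(y) and y[right + 1] is not None else secant)
    filled = []
    for i in range(left + 1, right):
        t = (i - left) / h
        h00 = 2 * t**3 - 3 * t**2 + 1
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        filled.append(h00 * y[left] + h10 * h * m0 + h01 * y[right] + h11 * h * m1)
    return filled

def clean_fhr(fhr, fs=4, low=50, high=200, max_gap_s=15):
    """Return a cleaned copy of fhr (bpm); fs is the assumed sampling rate in Hz."""
    # 1. Treat out-of-range values as missing
    y = [v if v is not None and low <= v <= high else None for v in fhr]
    max_gap = int(max_gap_s * fs)
    out, i, n = [], 0, len(y)
    while i < n:
        if y[i] is not None:
            out.append(float(y[i]))
            i += 1
            continue
        # 2. Find the extent of the gap y[i:j]
        j = i
        while j < n and y[j] is None:
            j += 1
        if i > 0 and j < n and (j - i) <= max_gap:
            # 3. Short interior gap: fill with a cubic Hermite segment
            out.extend(_hermite_fill(y, i - 1, j))
        # Long gaps, and gaps touching either end, are dropped
        i = j
    return out

raw = [120, 122, 124, 300, 0, 130, 132]   # 300 and 0 are artifacts
print(clean_fhr(raw))
```

Note that dropping long gaps shortens the record and breaks the time axis, so if timing matters later (for example when aligning FHR with UC), it may be better to keep an index of which samples were removed.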