Open philerooski opened 5 years ago
We discussed this on the call: in addition to the stretching and scaling done above, we also decided to rotate the data after that step, using rgl::rotate3d. The angle should be a random sample from -pi to pi, and the rotation axis should be a random unit vector. For example, generate c(random(-1,1), random(-1,1), random(-1,1)), where random(-1,1) is a random value from -1 to 1, then normalize the vector to unit norm.
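A minimal base-R sketch of the sampling described above. The helper name `random_axis_by_normalizing` is made up for illustration. One caveat: normalizing per-coordinate uniform draws gives a valid unit vector, but the resulting directions are not uniformly spread over the sphere (they lean toward the cube's diagonals). That may or may not matter for augmentation.

```r
# Hypothetical helper (name is illustrative): draw each coordinate from
# U(-1, 1), then scale the vector to unit length.
random_axis_by_normalizing <- function() {
  v <- runif(3, min = -1, max = 1)
  v / sqrt(sum(v^2))
}

rotation_axis <- random_axis_by_normalizing()
rotation_angle <- runif(1, min = -pi, max = pi)

# The axis should have unit norm (up to floating point error).
sqrt(sum(rotation_axis^2))
```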
We'll include a random rotation as well, like:
random_unit_vector <- function() {
  ru <- list()
  theta <- runif(1, 0, 2*pi)
  ru$z <- runif(1, -1, 1)
  ru$y <- sqrt(1 - ru$z ^ 2) * sin(theta)
  ru$x <- sqrt(1 - ru$z ^ 2) * cos(theta)
  return(ru)
}
add_noise_to_data <- function(sensor_data, stretch_factor = 0.5, noise_factor = 0.1) {
  sensor_data <- tibble::as_tibble(sensor_data) # for purrr::map / keep colnames
  # Per axis: randomly flip and stretch the signal, then add Gaussian noise
  noisy_sensor_data <- purrr::map(sensor_data, function(d) {
    sd_d <- sd(d)
    d * (sample(c(-1, 1), 1) + (sample(c(-1, 1), 1) * sd_d * stretch_factor)) +
      rnorm(length(d), sd = noise_factor * sd_d)
  })
  noisy_sensor_data <- simplify2array(noisy_sensor_data)
  # Rotate all three axes together by a random angle about a random axis
  ru <- random_unit_vector()
  theta <- runif(1, -pi, pi)
  rotated_noisy_sensor_data <- rgl::rotate3d(
    noisy_sensor_data, angle = theta, x = ru$x, y = ru$y, z = ru$z)
  return(rotated_noisy_sensor_data)
}
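To sanity-check the rotation step without needing rgl, here is a rough base-R sketch that builds an axis-angle rotation matrix with Rodrigues' formula and confirms that rotating changes orientation but not the magnitude of each sample. The function name `rotation_matrix` and the toy data are assumptions for illustration, and this makes no claim about matching the sign convention of rgl::rotate3d.

```r
# Rodrigues' formula: rotation matrix for a unit axis u and angle theta.
# K is the cross-product matrix of u (matrix() fills column by column).
rotation_matrix <- function(u, theta) {
  K <- matrix(c(0, u[3], -u[2],
                -u[3], 0, u[1],
                u[2], -u[1], 0), nrow = 3)
  diag(3) + sin(theta) * K + (1 - cos(theta)) * (K %*% K)
}

set.seed(1)
# Toy stand-in for sensor data: 10 samples with columns x, y, z.
xyz <- matrix(rnorm(30), ncol = 3, dimnames = list(NULL, c("x", "y", "z")))

R <- rotation_matrix(c(0, 0, 1), pi / 2)  # quarter turn about the z axis
rotated <- xyz %*% t(R)                   # rotate each row as a vector

# Per-sample magnitude before and after rotation should agree.
all.equal(sqrt(rowSums(xyz^2)), sqrt(rowSums(rotated^2)))
```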
This came up as I was working on #37. It may be useful to include a function for generating "new" samples by adding noise to existing samples, then running the newly generated sample through the feature extraction pipeline. This step should happen just after a detrend (or perhaps a bandpass?) in the usual accel/gyro feature extraction pipeline.
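As a rough illustration of where the step would sit, here is a base-R sketch of detrend, then augment, then feature extraction on a single axis. All three helpers (`detrend`, `add_noise`, `extract_features`) are simplified stand-ins invented for this example, not the package's actual functions, and the features are arbitrary placeholders.

```r
# Hypothetical stand-ins for the real pipeline steps.
detrend <- function(d) {
  t <- seq_along(d)
  as.numeric(residuals(lm(d ~ t)))  # remove a linear trend
}
add_noise <- function(d, noise_factor = 0.1) {
  d + rnorm(length(d), sd = noise_factor * sd(d))
}
extract_features <- function(d) {
  c(mean = mean(d), sd = sd(d), range = diff(range(d)))
}

set.seed(7)
# Toy signal: a sine wave riding on a slow linear drift.
raw <- sin(seq(0, 6 * pi, length.out = 300)) + 0.01 * seq_len(300)

detrended <- detrend(raw)
features_original <- extract_features(detrended)
features_augmented <- extract_features(add_noise(detrended))  # "new" sample
```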
Why generate new, fake samples? It's often the case that researchers are working with small amounts of data (from clinical studies) or that there is a huge class imbalance between certain demographics (old/young, Parkinson's/control, ...), as we saw in mPower. To achieve better model accuracy, it's useful to augment the data by generating new samples. If we simply duplicate the data, it's oversampling. If we add noise, rotations, and/or modify the magnitude of the signal, it's data augmentation.
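The distinction can be shown on a toy signal. This is only a sketch: the sine wave, the 1.1 scale factor, and the noise level are arbitrary choices made for illustration.

```r
set.seed(42)
signal <- sin(seq(0, 4 * pi, length.out = 200))

# Oversampling: the "new" sample is an exact copy of the original.
oversampled <- signal

# Augmentation: rescale the magnitude and add Gaussian noise, so the
# sample is new but still strongly resembles the original.
augmented <- signal * 1.1 + rnorm(length(signal), sd = 0.1 * sd(signal))

identical(oversampled, signal)  # TRUE
identical(augmented, signal)    # FALSE
```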
For example, here's the x axis of some accelerometer data:
And here is the same signal with some noise added and magnitude modifications:
I'm using this code to transform the signal (sensor_data has columns x, y, z):