Sage-Bionetworks / mhealthtools

A modular R package for extracting features from mobile sensor data
Apache License 2.0

Data augmentation feature request #133

Open philerooski opened 5 years ago

philerooski commented 5 years ago

This came up as I was working on #37. It may be useful to include a function for generating "new" samples by adding noise to existing samples, then running the newly generated sample through the feature extraction pipeline. This step should happen just after a detrend (or perhaps a bandpass?) in the usual accel/gyro feature extraction pipeline.

Why generate new, fake samples? Researchers often work with small amounts of data (from clinical studies), or there is a large class imbalance between groups (old/young, Parkinson's/control, ...), as we saw in mPower. Augmenting the data by generating new samples can help improve model accuracy. If we simply duplicate the data, it's oversampling. If we add noise, rotations, and/or modify the magnitude of the signal, it's data augmentation.
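
To make the distinction concrete, here is a minimal sketch in base R. The sine wave is a hypothetical stand-in for one axis of a real recording, and the noise level of 0.1 times the signal's standard deviation is an assumption chosen to match the default `noise_factor` used later in this thread.

```r
set.seed(1)

# Hypothetical single-axis signal standing in for a real accelerometer recording
x <- sin(seq(0, 4 * pi, length.out = 200))

# Oversampling: the "new" sample is an exact copy of the original
oversampled <- x

# Augmentation: the new sample is a perturbed version of the original
augmented <- x + rnorm(length(x), sd = 0.1 * sd(x))

identical(oversampled, x)  # TRUE
identical(augmented, x)    # FALSE
cor(augmented, x)          # still strongly correlated with the original
```

The augmented sample keeps the overall shape of the original while differing at every point, which is what gives a downstream model something new to learn from.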

For example, here's the x axis of some accelerometer data: [image]

And here is the same signal with some noise added and magnitude modifications: [image]

I'm using this code to transform the signal (sensor_data has columns x, y, z):

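# Flip, stretch, and add Gaussian noise to each axis of sensor_data independently.
# Returns a list with one numeric vector per column (x, y, z).
add_noise_to_data <- function(sensor_data, stretch_factor = 0.5, noise_factor = 0.1) {
  noisy_sensor_data <- purrr::map(sensor_data, function(s) {
    sd_s <- sd(s)
    # Multiplier: a random sign flip (+/-1) plus a random-signed stretch scaled by the axis SD
    s * (sample(c(-1, 1), 1) + (sample(c(-1, 1), 1) * sd_s * stretch_factor)) +
      rnorm(length(s), sd = noise_factor * sd_s) # additive noise, SD relative to the axis SD
  })
  return(noisy_sensor_data)
}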
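
To see what the multiplier in that function can do, the sketch below enumerates its four possible values for a single axis. The value of `sd_s` is a made-up number for illustration; `stretch_factor` uses the default from the function above.

```r
sd_s <- 0.2            # hypothetical standard deviation of one axis
stretch_factor <- 0.5  # default from add_noise_to_data

# Both random signs can be -1 or 1, giving four combinations
signs <- expand.grid(flip = c(-1, 1), stretch = c(-1, 1))
signs$multiplier <- signs$flip + signs$stretch * sd_s * stretch_factor
signs
```

With these numbers the multiplier is one of -1.1, -0.9, 0.9, or 1.1, so each axis is either kept or flipped and then stretched or shrunk by `sd_s * stretch_factor`. Because that term depends on the axis's own standard deviation, the amount of stretch changes with the scale and units of the signal.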

itismeghasyam commented 5 years ago

We discussed this on the call. In addition to the stretching and scaling done above, we decided to rotate the data after this step using rgl::rotate3d. The angle should be a random sample from -pi to pi, and the rotation axis should be a random unit vector: generate c(random(-1,1), random(-1,1), random(-1,1)), where random(-1,1) is a random value from -1 to 1, then normalize the vector to unit norm.
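
A minimal base-R sketch of that recipe follows. It builds the rotation matrix with Rodrigues' formula rather than calling rgl::rotate3d, so the sign convention of the rotation may not match rgl's; the variable names and the random test points are assumptions for illustration only.

```r
set.seed(42)

# Random axis as described above: three draws from [-1, 1], scaled to unit norm
u <- runif(3, -1, 1)
u <- u / sqrt(sum(u^2))

# Random angle between -pi and pi
angle <- runif(1, -pi, pi)

# Cross-product matrix of u (filled column by column)
K <- matrix(c(0, u[3], -u[2],
              -u[3], 0, u[1],
              u[2], -u[1], 0), nrow = 3)

# Rodrigues' formula: rotation by `angle` about the axis `u`
rot <- diag(3) + sin(angle) * K + (1 - cos(angle)) * (K %*% K)

# Hypothetical n x 3 matrix of samples (columns x, y, z), one row per time point
xyz <- matrix(rnorm(15), ncol = 3, dimnames = list(NULL, c("x", "y", "z")))
rotated <- xyz %*% t(rot)

# A rotation leaves the magnitude of every sample unchanged
all.equal(sqrt(rowSums(xyz^2)), sqrt(rowSums(rotated^2)))  # TRUE
```

Note that normalizing three draws from a cube slightly favors directions towards the cube's corners. The random_unit_vector() function posted later in this thread samples a height and an azimuth instead, which is uniform over the sphere.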

philerooski commented 5 years ago

We'll include a random rotation as well, like:

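# Draw a unit vector uniformly at random from the surface of the unit sphere
random_unit_vector <- function() {
  ru <- list()
  theta <- runif(1, 0, 2*pi) # azimuth around the z axis
  ru$z <- runif(1, -1, 1)    # height along the z axis
  ru$y <- sqrt(1 - ru$z ^ 2) * sin(theta)
  ru$x <- sqrt(1 - ru$z ^ 2) * cos(theta)
  return(ru)
}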

add_noise_to_data <- function(sensor_data, stretch_factor = 0.5, noise_factor = 0.1) {
  sensor_data <- tibble::as_tibble(sensor_data) # for purrr::map / keep colnames
  # Flip, stretch, and add Gaussian noise to each axis independently
  noisy_sensor_data <- purrr::map(sensor_data, function(d) {
    sd_d <- sd(d)
    d * (sample(c(-1, 1), 1) + (sample(c(-1, 1), 1) * sd_d * stretch_factor)) +
      rnorm(length(d), sd = noise_factor * sd_d)
  })
  # Stack the axes into an n x 3 matrix (columns x, y, z) for rgl::rotate3d
  noisy_sensor_data <- simplify2array(noisy_sensor_data)
  # Rotate every sample by a random angle about a random axis
  ru <- random_unit_vector()
  theta <- runif(1, -pi, pi)
  rotated_noisy_sensor_data <- rgl::rotate3d(
    noisy_sensor_data, angle = theta, x = ru$x, y = ru$y, z = ru$z)
  return(rotated_noisy_sensor_data)
}
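
As a quick sanity check on the axis sampling used by random_unit_vector(), the sketch below draws many axes at once. `random_unit_vectors()` is a hypothetical vectorized variant written for this check, not a function from mhealthtools or rgl.

```r
set.seed(7)

# Hypothetical vectorized variant of random_unit_vector(): returns an n x 3 matrix
random_unit_vectors <- function(n) {
  theta <- runif(n, 0, 2 * pi) # azimuth around the z axis
  z <- runif(n, -1, 1)         # height along the z axis
  r <- sqrt(1 - z^2)           # radius of the circle at that height
  cbind(x = r * cos(theta), y = r * sin(theta), z = z)
}

u <- random_unit_vectors(1000)
norms <- sqrt(rowSums(u^2))

range(norms) # both ends should equal 1 up to floating-point error
colMeans(u)  # each component should average near 0 if no direction is favored
```

Every row has unit length by construction, since x^2 + y^2 = 1 - z^2, and the component means hovering around zero are consistent with the directions being spread evenly over the sphere.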