carissalow / rapids

Reproducible Analysis Pipeline for Data Streams
http://www.rapids.science/
GNU Affero General Public License v3.0

Fix negative battery consumption rate bug #222

Closed jenniferfedor closed 1 year ago

jenniferfedor commented 1 year ago

This PR addresses an issue that resulted in negative values for battery consumption rate features in the phone battery RAPIDS provider.

For each battery episode, the battery difference is calculated as the battery level at the beginning of the episode minus the battery level at the end of the episode. Battery consumption rate is then calculated as the ratio of the episode’s battery difference to the duration of the episode.
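As a rough illustration of that calculation, here is a minimal sketch. The function name, the argument names, and the choice of minutes as the duration unit are assumptions made for this example and are not taken from the RAPIDS code.

```python
def battery_consumption_rate(start_level, end_level, duration_minutes):
    """Battery drop per unit of time for a single episode (hypothetical helper)."""
    if duration_minutes <= 0:
        raise ValueError("episode duration must be positive")
    # Positive when the battery drained, negative when it gained charge.
    battery_diff = start_level - end_level
    return battery_diff / duration_minutes


# A 30-minute episode that starts at 80% and ends at 74%.
rate = battery_consumption_rate(start_level=80, end_level=74, duration_minutes=30)
print(rate)  # 0.2
```

With this sign convention, an episode whose end level is higher than its start level produces a negative rate.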

Although we calculate a battery consumption rate for all episodes regardless of type, only discharge episodes contribute to the computation of the avgbatteryconsumptionrate and maxbatteryconsumptionrate features. We define discharge episodes as episodes with a battery status of 3 or 4. During a discharge episode, we would expect the battery level at the beginning of the episode to always be greater than or equal to the battery level at the end of the episode (i.e., a discharging battery does not gain charge, although it may not lose any either), so we would expect these features to always have a value $\ge$ 0.
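The filtering step described above could be sketched as follows. The dictionary layout of an episode (`status`, `rate`), the helper name, and returning `None` when there are no discharge episodes are assumptions for illustration only; the status value 2 simply stands in for any non-discharge status.

```python
DISCHARGE_STATUSES = {3, 4}  # statuses treated as discharging, per the definition above


def discharge_rate_features(episodes):
    """Average and maximum consumption rate over discharge episodes only."""
    rates = [e["rate"] for e in episodes if e["status"] in DISCHARGE_STATUSES]
    if not rates:
        # Assumption: no discharge episodes means the features are undefined.
        return {"avgbatteryconsumptionrate": None, "maxbatteryconsumptionrate": None}
    return {
        "avgbatteryconsumptionrate": sum(rates) / len(rates),
        "maxbatteryconsumptionrate": max(rates),
    }


episodes = [
    {"status": 2, "rate": -0.5},   # non-discharge episode, ignored by the features
    {"status": 3, "rate": 0.25},
    {"status": 4, "rate": 0.75},
]
features = discharge_rate_features(episodes)
print(features)
```

Here the non-discharge episode has a negative rate, but it never reaches the features because it is filtered out first.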

Presently we assign rows of battery data to episodes by incrementing an ID column by 1 when either of the following conditions is met:

  1. The current row’s battery status is not equal to the previous row’s battery status, OR
  2. The time difference between the current row’s start timestamp and the previous row’s end timestamp is >1 ms

However, there are sometimes cases in which the current row’s battery status is equal to the previous row’s battery status and that status is 3 or 4 (reflecting discharging), but the current row’s battery level is greater than the previous row’s battery level (i.e., the battery level increased but there was no corresponding change in battery status to reflect this). Presently such rows are assigned to the same discharge episode (assuming the gap between the two rows does not exceed 1 ms, so the time difference condition does not start a new episode either), and the resulting battery difference and consumption rate for that discharge episode will be <0.
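The scenario can be reproduced with a small sketch of the two-condition rule. The row layout (`status`, `level`, `start_ms`, `end_ms`), the function name, and the sample values are hypothetical and chosen only to trigger the problem; this is not the provider's actual code.

```python
def assign_episode_ids(rows, max_gap_ms=1):
    """Two-condition rule: new episode on a status change or a time gap > max_gap_ms."""
    ids, episode_id, prev = [], 0, None
    for row in rows:
        if prev is not None:
            status_changed = row["status"] != prev["status"]
            gap_too_large = row["start_ms"] - prev["end_ms"] > max_gap_ms
            if status_changed or gap_too_large:
                episode_id += 1
        ids.append(episode_id)
        prev = row
    return ids


# Status stays at 3 (discharging) while the level jumps from 48 to 60.
rows = [
    {"status": 3, "level": 50, "start_ms": 0, "end_ms": 1000},
    {"status": 3, "level": 48, "start_ms": 1001, "end_ms": 2000},
    {"status": 3, "level": 60, "start_ms": 2001, "end_ms": 3000},
]
ids = assign_episode_ids(rows)
print(ids)  # [0, 0, 0]

# All rows share one episode, so the difference is first level minus last level.
battery_diff = rows[0]["level"] - rows[-1]["level"]
print(battery_diff)  # -10
```

Because neither condition fires, the three rows collapse into a single discharge episode whose battery difference is negative.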

To account for this scenario, we include an additional condition when assigning rows of battery data to episodes:

  1. The current row’s battery status is not equal to the previous row’s battery status, OR
  2. The time difference between the current row’s start timestamp and the previous row’s end timestamp is >1 ms, OR
  3. The current row’s battery status is 3 or 4 AND the current row’s battery level is greater than the previous row’s battery level
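A minimal sketch of the three-condition rule, using the same kind of hypothetical row layout (`status`, `level`, `start_ms`, `end_ms`), might look like this. All names and sample values are assumptions for illustration rather than the code changed in this PR.

```python
DISCHARGE_STATUSES = {3, 4}


def assign_episode_ids_fixed(rows, max_gap_ms=1):
    """New episode on a status change, a time gap, or a level rise while discharging."""
    ids, episode_id, prev = [], 0, None
    for row in rows:
        if prev is not None:
            status_changed = row["status"] != prev["status"]
            gap_too_large = row["start_ms"] - prev["end_ms"] > max_gap_ms
            level_rose_while_discharging = (
                row["status"] in DISCHARGE_STATUSES and row["level"] > prev["level"]
            )
            if status_changed or gap_too_large or level_rose_while_discharging:
                episode_id += 1
        ids.append(episode_id)
        prev = row
    return ids


def episode_battery_diffs(rows, ids):
    """First level minus last level for each episode ID."""
    first, last = {}, {}
    for row, eid in zip(rows, ids):
        first.setdefault(eid, row["level"])
        last[eid] = row["level"]
    return {eid: first[eid] - last[eid] for eid in first}


rows = [
    {"status": 3, "level": 50, "start_ms": 0, "end_ms": 1000},
    {"status": 3, "level": 48, "start_ms": 1001, "end_ms": 2000},
    {"status": 3, "level": 60, "start_ms": 2001, "end_ms": 3000},  # level rises
]
ids = assign_episode_ids_fixed(rows)
diffs = episode_battery_diffs(rows, ids)
print(ids)    # [0, 0, 1]
print(diffs)  # {0: 2, 1: 0}
```

The row where the level rises now opens its own episode, so every discharge episode in this example has a battery difference of at least 0.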