allenmichael099 opened 2 years ago
Computing aggregate energy consumption means and variances on the fly: the user on the app or public dashboard selects a set of trips T somehow (e.g., a date range, or maybe a program on the dashboard).
```python
# Pseudocode: the fetch_* helpers and trip fields are placeholders for
# whatever lookups we end up implementing.
mean_EC_agg = 0  # aggregate energy consumption mean
var_EC_agg = 0   # aggregate energy consumption variance
for t in T:
    # Get trip length error: check the OS and configuration for trip t, then
    # fetch the unit length (u) mean and variance for that OS, configuration
    mean_u, var_u = fetch_unit_length_moments(t.os, t.configuration)
    mean_L = t.L_sensed * mean_u
    var_L = t.L_sensed**2 * var_u

    # Get mode error: check which mode inference model was used
    # (we could check the "algorithm_chosen" field for the trip)
    if t.model == "sensing":
        # fetch the EI mean and variance corresponding to the predicted mode, OS, configuration
        mean_EI, var_EI = fetch_EI_moments(t.predicted_mode, t.os, t.configuration)
    elif t.model == "label assist":
        # fetch the predicted mode column from the corresponding
        # user-specific label assist confusion matrix
        mean_EI, var_EI = fetch_label_assist_EI_moments(t.user_id, t.predicted_mode)
    elif t.model == "user labeled":
        mean_EI = EI(t.user_labeled_mode)
        var_EI = 0

    # Compute trip level EC mean and variance
    mean_EC = mean_L * mean_EI
    var_EC = var_EI * mean_L**2 + var_L * mean_EI**2

    # Add to totals
    mean_EC_agg += mean_EC
    var_EC_agg += var_EC
```
The above can probably be vectorized (a sketch follows below). Then we can present the mean value mean_EC_agg ± sqrt(var_EC_agg) as a dot on a number line with error bars.
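For instance, a minimal vectorized sketch with numpy, assuming the per-trip moments have already been fetched into arrays (the array values here are placeholders, not real data):

```python
import numpy as np

# Placeholder per-trip moments; in practice these would be fetched per trip
# as in the pseudocode above.
mean_L = np.array([5.2, 12.0, 3.1])      # trip length means
var_L = np.array([0.3, 1.1, 0.2])        # trip length variances
mean_EI = np.array([1.512, 0.0, 0.022])  # trip EI means
var_EI = np.array([0.05, 0.0, 0.001])    # trip EI variances

mean_EC_agg = np.sum(mean_L * mean_EI)
var_EC_agg = np.sum(var_EI * mean_L**2 + var_L * mean_EI**2)
print(f"{mean_EC_agg:.2f} +- {np.sqrt(var_EC_agg):.2f}")
```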
Early steps:
Evaluation (sensitivity analysis): consider only labeled trips.
Monte Carlo Simulation with mock trip lengths and modes: (I think) this will show what the distribution of aggregate energy consumption could look like if the following assumptions hold:
When I ran a simulation with 3 modes and trip lengths uniformly sampled from 3-50 units, the aggregate energy consumption appeared normally distributed. I'll need to explain the simulation carefully to make sure it checks what I want to check. I can make the simulation closer to what we are doing by sampling from the correct distance relative error distribution; for distance, right now I just use a normal distribution with the same mean and variance as one of the OS, configuration pairs. For mode, I use multinoulli distributions. A rough sketch of this kind of simulation is below.
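A minimal sketch of this kind of simulation, assuming made-up EIs and a made-up P(true mode | predicted mode) matrix rather than the real confusion matrices:

```python
import numpy as np

rng = np.random.default_rng(0)

EIs = np.array([0.0, 0.022, 1.512])  # walk, e-bike, car (illustrative values)
p_true_given_pred = np.array([       # rows: predicted mode, cols: true mode
    [0.8, 0.1, 0.1],
    [0.1, 0.7, 0.2],
    [0.1, 0.2, 0.7],
])

n_trips, n_sims = 1000, 2000
pred_modes = rng.integers(0, 3, size=n_trips)
lengths = rng.uniform(3, 50, size=n_trips)  # mock trip lengths in [3, 50) units

agg_EC = np.empty(n_sims)
for i in range(n_sims):
    # resample a "true" mode for every trip from its multinoulli, then sum EC
    true_modes = np.array([rng.choice(3, p=p_true_given_pred[m]) for m in pred_modes])
    agg_EC[i] = np.sum(lengths * EIs[true_modes])

# the histogram of agg_EC should look roughly normal (CLT)
print(agg_EC.mean(), agg_EC.std())
```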
Also: make unit tests
For the sensitivity analysis, I will start with sensing first. With label assist, I would need to retrain on smaller percentages of the data. However, we can look at the current proportion of label assist.
Question: at what threshold of confidence should we switch to label assist over sensing? Use current threshold of 0.25 for now.
~50k labeled trips:
- all 50k labeled
- 40k labeled, 10k sensing
- 30k labeled, 20k sensing
- ...
- all sensing
Right now we are using manually constructed confusion matrices from Gabe (kennykos)'s paper. Upon doing the sensitivity analysis, we found that sensing overestimated aggregate energy consumption for Vail by about 2000. We thought this might be because sensing cannot distinguish between drove alone and shared ride. So, I created a "sensed_car" category and assigned it an energy intensity of (1/1.5) × EI(drove_alone). This led us to underestimate energy consumption, even though the distance share of shared rides is 1.180941e+06 meters higher than drove alone's distance share for Vail.
Below, "sensing-confusion" refers to the use of the confusion matrix to get an expected energy intensity (mean_EI). "sensing-prediction/prediction/naive approach" refer to the use of the sensing model predicted EI as is.
I also took a look at the set of trips without shared rides. We still underestimate with the sensing-confusion approach, while the sensing-prediction approach overestimates. With no shared rides and a load factor of 1, the EC from (sensing with confusion matrix based EI, sensing with predicted EI, user labels) is (6191.57, 7592.53, 7140.17).
EC with shared rides and a load factor of 1.5:
- Naïve (predictions only): 9046.67
- User labeled: 10846.41
- Sensed with confusion: 9028.78
Here is the Vail sensitivity analysis when we use a load factor of 1.5 and include shared rides.
[Yesterday 12:11 PM] Shankari, K. removing shared rides, user label -> confusion matrix drops; but user label -> predicted label increases
so I'm wondering if the reason is that the majority of the EC is due to drove alone, and every time we use the CM, every car trip's EC goes down because of the other probabilities
but you would expect that sometimes you would get it wrong the other way (e.g. predict bike for car trip) so the other probabilities would have a higher energy intensity
if the CM rows are the ones in your example
and I have 5 car trips and 2 bike trips ground truth
if we count the 5 car trips as car, the EC will be $1.512 \times 5 + 0 \times 2$
but if we use the CM, the car EC will become $0.7 \times 1.512 \times 5 + 0.022 \times 0.2 \times 5$, which is lower than the original
but with the CM, the bike EC will become $0.1 \times 1.512 \times 2 + 0.022 \times 0.4 \times 2$, which is higher than the original
[Yesterday 12:27 PM] Shankari, K. For trips that you are counting as "sensed": choose the label for each predicted section. For trips that you are counting as "user label": set the user label as the mode for only the primary section; assign the others as walk/bike
[Yesterday 12:28 PM] Shankari, K. so you can: apply the user label to only the primary section and compute EC at the section level, or apply the primary sensed mode to the full trip and compute the EC at the trip level (a sketch of these two options is below)
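A sketch of the two options, with illustrative dicts standing in for the real trip/section data model:

```python
EI = {"car": 1.512, "walk": 0.0}

trip = {
    "distance": 10.0,
    "primary_mode": "car",  # mode of the primary (longest) section
    "sections": [
        {"mode": "car", "distance": 8.0},   # user label applied to the primary section
        {"mode": "walk", "distance": 2.0},  # non-primary sections counted as walk/bike
    ],
}

# Option 1: compute EC at the section level
ec_section_level = sum(EI[s["mode"]] * s["distance"] for s in trip["sections"])
# Option 2: apply the primary sensed mode to the full trip
ec_trip_level = EI[trip["primary_mode"]] * trip["distance"]
print(ec_section_level, ec_trip_level)  # 12.096 vs 15.12
```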
[Yesterday 12:32 PM] Shankari, K. you can compute the decrease due to car trips and the increase due to everything else (including the distance in each) and see if that matches up with our explanation above
Updates from meeting: define the primary section as the longest-distance section; define the primary mode as the mode of the primary section; tied distances are broken by the mode with the highest energy intensity.

Looking only at the primary section and computing EC: Prediction: 10582, User labels: 9483, Confusion based: 7194

Using the primary mode as the mode for the whole trip and computing EC: Prediction: 12273, User labels: 10841, Confusion based: 8260
So prediction is still closer to user labels than confusion based. Another thing of note is that P(userlabel = car| predict car) is about 0.83, not the 0.45 in MobilityNet.
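The primary-section rule from the meeting notes could look like this (a sketch; the section dicts are illustrative, not the real schema):

```python
def primary_section(sections, EI):
    # longest-distance section wins; ties are broken by the mode with the
    # higher energy intensity
    return max(sections, key=lambda s: (s["distance"], EI[s["mode"]]))
```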
[3:22 PM] Shankari, K. assuming that the difference because of using the CM is due to car trips, it seems like there can be two separate reasons for why we are off: (1) car precision is < 40% (40% of the time when we guess car, it is a car); (2) car is so much more energy intensive than the other modes (car is 1.512 and e-bike is 0.022)
[3:27 PM] Shankari, K. things to experiment with: FIRST: Try to change the car precision in increments of 10% and see how the plots change
[3:28 PM] Shankari, K. then see if we can influence the precision in a way that makes sense. Look at Figure 4 in Gabe's paper: "AUTOMOTIVE" prediction is actually pretty good, over 90%. Is there a way we can start with a much smaller confusion matrix (walk/bike/automotive)? We could use an "automotive" EI based on the distribution of motorized modes in the NHTS, for example. We could also experiment with combining confusion matrices from the program (Vail) and MobilityNet.
[3:30 PM] Shankari, K. if the US as a whole has 60% car, 20% bus, 20% train, for example, the "automotive" EI could be weighted by that distribution
Results from adjusting car precision: If we say car has a load factor of 1.5, the car EI is actually less than the bus EI. So when we increase car precision while accounting for shared rides, the expected EI decreases! This would lead to an even lower expected energy consumption (confusion based energy consumption is expected energy consumption here).
If:
Then the (floored) ECs are:

car precision | prediction | confusion based
---|---|---
0.55 | 6071 | 5859
0.65 | 6071 | 5939
0.75 | 6071 | 6018
0.85 | 6071 | 6097

User EC: 6391
A similar pattern holds if we apply the primary mode to the full trip - expected energy consumption does not pass predicted energy consumption until car precision is between 0.75 and 0.85. User label based energy consumption stays higher than both.
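One way to run this kind of experiment (a sketch, not the actual analysis code; the confusion column and EIs are illustrative placeholders):

```python
import numpy as np

EIs = np.array([0.0, 0.022, 1.512])  # walk, e-bike, car (illustrative)
car_idx = 2
p_true_given_pred_car = np.array([0.1, 0.2, 0.7])  # hypothetical P(true | predicted car)

def with_car_precision(col, p):
    # set P(car | predicted car) = p and rescale the other entries to sum to 1 - p
    others = np.delete(col, car_idx)
    others = others / others.sum() * (1 - p)
    return np.insert(others, car_idx, p)

for p in [0.55, 0.65, 0.75, 0.85]:
    expected_EI = with_car_precision(p_true_given_pred_car, p) @ EIs
    print(p, expected_EI)  # expected EI for trips predicted as car
```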
walk | mm | e-mm | bus | car | train | EC |
---|---|---|---|---|---|---|
0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 0.5 | 23232 |
0.8 | 0.7 | 0.6 | 0.5 | 0.8 | 0.7 | 23232 |
```python
import itertools
import numpy as np

precisions = np.arange(0.5, 1.0, 0.1)  # range() doesn't accept floats
for walk_p, mm_p, e_mm_p, bus_p, car_p, train_p in itertools.product(precisions, repeat=6):
    ...  # recompute expected EC for this precision combination
```
Predicted is doing so much better because the "ground truth" sample is skewed and most trips are car, so even if we just predicted car every time, we would probably do pretty well. So another experiment would be to actually construct a balanced sample through sampling the same number of trips from each mode.
The sensed should be able to work for a study in which nobody labels any trips.
> If we say car has a load factor of 1.5, the car EI is actually less than the bus EI. So when we increase car precision while accounting for shared rides, the expected EI decreases!

So if we set the load factor to 1.5 and increase car precision, the expected energy becomes lower than with lower precision. But why is it then lower than user label? Because user label also has a mix of shared ride (at LF = 2) and drove alone (at LF = 1).
Another validation thing to do, rather than black box sensitivity analysis, is to just compare the trips one by one: compare the user label EC, the predicted EC, and the predicted w/ CM EC, and see which are the 10 biggest, or plot their distribution by mode
[Histograms] Left: expected EC - user labeled EC. Right: predicted EC - user labeled EC. The x axis label is the user label.
Based on the current data:
- shared_ride
- drove_alone
- we should think about how we can automate this instead of having Michael look at it manually
Confusion based error 90th and 5th percentiles: (1.80, -5.39)

Summary of confusion based error:

statistic | value
---|---
mean | -0.907056
std | 10.649994
min | -405.592424
25% | -1.436356
50% | -0.446962
75% | 0.040015
max | 95.792431

Counts by mode where errors are below the 5th percentile:

mode | count
---|---
drove_alone | 74
bus | 15
shared_ride | 5
free_shuttle | 4
car | 2
taxi | 2
If you remove trips below the 5th percentile of confusion based error (in the negative direction): expected 7198.36, predicted 7267.31, user_labeled 7323.29

If we drop the trips below the 0.1 percentile, the total energy consumptions are: expected 8651.04, predicted 8764.61, user_labeled 9943.78. Percent error (expected - user labeled)/user labeled: -13%

If we do not drop trips: expected 8991.82, predicted 9005.56, user labeled 10838.58. Percent error (expected - user labeled)/user labeled: -17%
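The outlier filter above is just a quantile cut; a sketch with pandas, assuming a trips dataframe with an error_for_confusion column (names are illustrative):

```python
import pandas as pd

def drop_low_outliers(df: pd.DataFrame, col: str = "error_for_confusion",
                      q: float = 0.001) -> pd.DataFrame:
    # keep trips at or above the q quantile (q = 0.001 is the 0.1 percentile)
    return df[df[col] >= df[col].quantile(q)]
```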
Trips where the error is below the 0.1 percentile:

mode_confirm | distance | error_for_confusion
---|---|---
drove_alone | 1.445275e+05 | -59.959334
drove_alone | 2.136239e+05 | -88.479020
shared_ride | 1.039817e+06 | -405.592424
what was the sensed mode for those trips?
car, car, and no sensed mode for the primary sensed mode. There were also sensed walking sections.
Dropping those trips and looking only at the shared ride and drove alone error histograms, we get:
wrt https://github.com/e-mission/e-mission-docs/issues/798#issuecomment-1290890424
> car, car, and no sensed mode for the primary sensed mode. There were also sensed walking sections.
It looks like the sensed mode was actually car. So we actually got the mode right. So is the difference primarily due to the confusion with bus or due to the walking sections?
Also, is it possible to identify the +ve and -ve contribution by each mode? I see the histogram, but adding up the values is hard to do visually, especially since the histograms don't have too much detail.
something like: shared_ride: expected (xxx), user label (yyy), net (zzz)
Here are the contributions of shared ride and drove alone, after dropping the three outliers below the 0.1 percentile:
shared_ride:

Name | Value
---|---
expected | 3735.418826
predicted | 3976.395993
user_labeled | 3175.102523
error_for_prediction | 801.293470
error_for_confusion | 560.316303
drove_alone:

Name | Value
---|---
expected | 3530.997714
predicted | 3409.038991
user_labeled | 5719.979116
error_for_prediction | -2310.940124
error_for_confusion | -2188.981402
user_labeled = 5719 + 3175 = 8894; predicted = 3976.395993 + 3409.038991 = 7385.434984 (a difference of about -1508)
Why is `drove_alone` so different, and why is it always lower?
If we look at trips with both primary mode car and user label drove alone, and count every section as car, the EC estimate would increase by 48.86 kWH. The distance increase in sensed car is 49.32 miles.
Doing some math, I think we will always be too low when we have a load factor of 1.5. (Note that $4/3 = 2/1.5$.)

(perfect prediction drove alone EC) - (user labeled drove alone EC) + (perfect prediction shared ride EC) - (user labeled shared ride EC):

$5719.979116/1.5 - 5719.979116 + 3175.102523 \times 4/3 - 3175.102523 = -848.29$

So if we labeled all drove alone and shared rides as car, and ignored confusion, we'd still be off by 848.
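Checking that arithmetic (the two numbers are the user labeled ECs for drove alone and shared ride from the tables above):

```python
drove_alone_EC = 5719.979116
shared_ride_EC = 3175.102523
delta = (drove_alone_EC / 1.5 - drove_alone_EC) + (shared_ride_EC * 4 / 3 - shared_ride_EC)
print(delta)  # about -848.29
```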
> Doing some math, I think we will always be too low when we have a load factor of 1.5.
But I'm not sure I follow the calculation. Can you expand a bit? Concretely, where do you get the 4/3 from and why does it matter? And are you arguing that the difference is because the load factor is actually not 1.5?
If we look at trips that users labeled as shared ride, assume that we sensed car, and ignore confusion:
We want to replace the load factor of 2 for the user labeled shared ride trips with a load factor of 1.5, so we multiply by 2/1.5 = 4/3.
4/3 represents a conversion factor from the shared ride user labeled value we got to the value that we would calculate if we predicted those trips to be shared ride and used a load factor of 1.5.
My argument is that applying the average load factor of 1.5 to both drove alone and shared ride will be systematically off.
Let's think about PMT (Person Miles Traveled) versus VMT for a bit:
double check: overall VMT = 2000, load factor = 1.5, overall PMT = 2000 * 1.5 = 3000, which is correct
```python
x = 1  # drove_alone_load_factor
y = 2  # shared_ride_load_factor

# VMT
v_miles_in_drove_alone = 10000  # drove alone mean distance in miles
v_miles_in_shared_ride = 10000  # shared ride mean distance in miles (100 -> off by 20, 1000 -> off by 200)

avg = (x + y) / 2
drove_alone_EI = 1.51517707
shared_ride_EI = 0.757588535

# "Without average" means we use the correct per-mode load factor
# (equivalently, the correct energy intensity for each mode).
# For "with average", we use the average load factor for both.
no_average = 1/x * drove_alone_EI * v_miles_in_drove_alone + 1/y * drove_alone_EI * v_miles_in_shared_ride
with_average = 1/avg * drove_alone_EI * v_miles_in_drove_alone + 1/avg * drove_alone_EI * v_miles_in_shared_ride

print(f"drove_alone without average, with average: {1/x*drove_alone_EI*v_miles_in_drove_alone, 1/avg*drove_alone_EI*v_miles_in_drove_alone}")  # with avg underestimates
print(f"shared_ride without average, with average: {1/y*drove_alone_EI*v_miles_in_shared_ride, 1/avg*drove_alone_EI*v_miles_in_shared_ride}")  # with avg overestimates
print(no_average, with_average)
```
In units of $EI_v \times$ miles:

$no\_average = v_{da} + 0.5\, v_{sr}$
$with\_average = (v_{da} + v_{sr}) / 1.5$

If $v_{da} = v_{sr}$:

$no\_average = 1.5\, v_{sr}$
$with\_average = \frac{4}{3} v_{sr}$
Let $s$ = distance in shared ride, $a$ = distance in drove alone, and $r = a/s$ = the ratio of drove alone distance to shared ride distance in our dataset. Let $E_v$ = the drove alone (vehicle) EI, and let $z$ be the load factor.

The energy consumption for drove alone and shared ride trips is actually: $aE_v + s\frac{E_v}{2} = E_v(rs + \frac{s}{2})$

We want to find the value of $z$ such that applying it to both shared ride and drove alone trips is the same as if we had used the drove alone EI for all drove alone and the shared ride EI for all shared ride: $E_v(rs + \frac{s}{2}) = E_v(\frac{1}{z} rs + \frac{1}{z} s)$, i.e. $rs + \frac{s}{2} = \frac{1}{z}(rs + s)$

$z = \frac{rs + s}{rs + s/2} = \frac{2r+2}{2r+1} = \frac{r + 1}{r + 1/2}$

In Vail, $r$ is about 0.91, which gives $z = 1.3546$.
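A quick numeric check of the derivation (the EI value and the shared ride distance are placeholders):

```python
E_v, s = 1.512, 100.0  # vehicle (drove alone) EI and shared ride distance
r = 0.91               # drove alone distance / shared ride distance
a = r * s
z = (r + 1) / (r + 0.5)
exact_EC = E_v * (a + s / 2)     # correct EI applied per mode
with_z_EC = (E_v / z) * (a + s)  # single load factor z applied to all car distance
print(z, exact_EC, with_z_EC)    # z ~ 1.3546 and the two ECs match
```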
When we use z = 1.3546 and ignore the outliers below the 0.1 percentile, here are the errors for drove alone and shared ride:
Drove alone:

Name | Value
---|---
expected | 3682.806188
predicted | 3774.930611
user_labeled | 5719.979116
error_for_prediction | -1945.048505
error_for_confusion | -2037.172927
Shared ride:

Name | Value
---|---
expected | 3893.278983
predicted | 4402.827483
user_labeled | 3175.102523
error_for_prediction | 1227.724960
error_for_confusion | 718.176460
Sum of drove alone and shared ride:

Name | Value
---|---
expected | 7576.085172
predicted | 8177.758094
user_labeled | 8895.081638
error_for_prediction | -717.323545
error_for_confusion | -1318.996467
The car prediction is still off, but the error decreased by about 800. Earlier the car prediction error was 801.293470 - 2310.940124 = -1509.6.
If we look at all trips excluding the below 0.1 percentile outliers, the predicted is within 1 standard deviation of the user labeled value. Increased precision will shrink the standard deviation, but we'll see by how much. expected: 9018.06 predicted: 9702.64 user_labeled: 9943.78
Differences: predicted - expected: 684.587; user_labeled - predicted: 241.135. sd: 285.58
Our first pass at estimating the load factor averaged the load factors, weighted by the distance in each car submode (load factor for drove alone: 1, for shared ride: 2): $\frac{(1)a + (2)s}{a + s}$

But it seems that we should have been averaging the reciprocals of the load factors: $z = \frac{r+1}{r+1/2} = \frac{a/s + 1}{a/s + 1/2} = \frac{a+s}{a+s/2}$

$1/z = \frac{(1/1)a + (1/2)s}{a+s}$
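A quick check that this distance-weighted average of reciprocal load factors matches the $z$ from the derivation ($a$ and $s$ are placeholders chosen so that $r = a/s = 0.91$):

```python
a, s = 9100.0, 10000.0  # r = a/s = 0.91
z = (a + s) / (a + s / 2)
recip_avg = ((1 / 1) * a + (1 / 2) * s) / (a + s)
print(z, 1 / recip_avg)  # both ~1.3546
```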
I think that the main reason that this is not working is that we are using the wrong EI for the version with the load factor. Basically, let's say that $EI_v$ is the energy intensity for the vehicle and $EI_p$ is the energy intensity for a person. So for drove alone $EI_p = EI_v$ and for shared ride $EI_p = 0.5 EI_v \implies EI_v = 2EI_p$
Let's assume that the distance in shared ride and drove alone is equal, just to reduce the number of variables, and call it $d$.
If we have user labels, we will get $EC = EI_v d + EI_p d = 2 EI_p d + EI_p d = 3 EI_p d$
With user label, our $VMT=2d$ and our $PMT=d + 2d = 3d$ (three people distance of $d$ each)
If we don't have user labels, we have VMT, but we don't have PMT, and we use the load factor to convert from VMT to PMT So without user labels, our $VMT = 2d$ and our $PMT = \frac{3}{2} \times 2d = 3d$
So with the load factor calculation, we get the correct PMT distance in both cases. So if there is any difference in the overall EI, it must be because of a difference in the EI for the mode that we are using.
Since we are multiplying it with the PMT, we should use the $EI_p$ and not the $EI_v$
Without user labels, $EC = 3d \times EI_p = 3 EI_p d$, which is the same as the EC for the user label method of calculation.
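A one-screen version of that argument (the values are placeholders):

```python
d = 1.0     # distance in each of drove alone and shared ride
EI_p = 0.5  # person energy intensity; EI_v = 2 * EI_p
EI_v = 2 * EI_p

# with user labels: drove alone at EI_v, shared ride at EI_p
ec_user_labels = EI_v * d + EI_p * d  # = 3 * EI_p * d

# without user labels: convert VMT to PMT with the load factor, then use EI_p
VMT = 2 * d
PMT = 1.5 * VMT            # = 3d
ec_load_factor = PMT * EI_p  # the same 3 * EI_p * d
print(ec_user_labels, ec_load_factor)
```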
This is the original attempt, which doesn't work, showing that putting the load factor into the EI instead of the distance doesn't work:

$EC = EI_v d + EI_p d = EI_v d + 0.5 EI_v d = 1.5 EI_v d$

Without user labels, $EI_{vlf} = EI_v / 1.5 = \frac{EI_v}{\frac{3}{2}} = \frac{2}{3} EI_v$

$EC = VMT \times EI_{vlf} = 2d \times \frac{2}{3} EI_v = \frac{4}{3} EI_v d$, which does not match the $1.5\, EI_v d$ from user labels.
Shankari's definition of load factor: the factor by which you multiply VMT to get PMT (if $d_a = d_s = d$, then lf = 1.5).
Michael's definition of load factor: the factor by which you divide $EI_v$ to get $EI_p$, not $EI_s$ (note $EI_v = 2 EI_s$ but $EI_p \neq EI_s$); if $d_a = d_s$, then lf = $\frac{4}{3}$.
Use $r = 1$, because otherwise the $r$ for the different methods will be different, and we will get different values; also, for arbitrary new programs, we will not know $r$. We could later experiment with a program-specific $r$ based on the people who have labeled, but let's leave that for later, until we see what proportion of people will label if they do not need to do so. We will need to decide how we will include this in OpenPATH, but that is not a key component of the paper because it is too OpenPATH specific. We could add a small section in the discussion about how we used it, but it should be very minor.
The reconciliation between Shankari's and Michael's approaches: In brief, Shankari's distance $d$ is half the total distance, so the total sensed car VMT is effectively being multiplied by 3/4 rather than 3/2.
Let D = total distance sensed as car. Let drove alone distance (a) be the same as shared ride distance (s). Then a = s = d = D/2, r = 1. Let $EI_v$ be the vehicle energy intensity aka the drove alone EI, and $EI_p$ be the energy intensity for a person, aka the shared ride EI. Let $EI_v = 2EI_p$.
Since d = a = s, the total energy consumption for sensed car trips over a distance D is:
$aEI_v + s\frac{EI_v}{2} = dEI_v + d\frac{EI_v}{2} = \frac{D}{2}EI_v + \frac{D}{2}\frac{EI_v}{2} = \frac{3}{2}\frac{D}{2}EI_v = \frac{3}{4}D \times EI_v$, which is also the same as $\frac{3}{2}d\,EI_v = 3d\,EI_p$

Now set Shankari's version equal to the case where we apply $1/z$ to both shared ride and drove alone trips, and solve for $z$: $3d\,EI_p = EI_v(\frac{1}{z} rs + \frac{1}{z} s) = EI_v(\frac{1}{z}(1)d + \frac{1}{z} d)$, so $z \times 3d\,EI_p = 2d\,EI_v$ and $z = \frac{2d\,EI_v}{3d\,EI_p} = \frac{2}{3} \times 2 = \frac{4}{3}$

Thus, multiplying distance $\times\, EI_v$ by $1/z = 3/4$ is the same as multiplying $EI_v$ times half the distance by $3/2$. Example:
```python
sensed_car_distance = 10
# half goes to shared ride, half goes to drove alone
d = sensed_car_distance / 2
r = 1  # a = s, so r = 1
EI_drove_alone = 1.5
EI_shared_ride = EI_drove_alone / 2

# Shankari's, written in terms of shared ride, which is the same as
# multiplying EI_drove_alone * d * 3/2
shankari_EC = 3 * d * EI_shared_ride
# Michael's, in terms of drove alone:
michael_EC = 1 / (4/3) * sensed_car_distance * EI_drove_alone
print(shankari_EC, michael_EC)
```
Both give 11.25.
Good news: the sensitivity analysis doesn't look too bad for Vail when we drop air, drop outliers below the 0.1 percentile, use r = 0.91, and use a car precision of 0.83.

Bad news: our variance is smaller than it should be if we are treating user labeled as ground truth. Other bad news: we systematically overestimate by a lot with the Pueblo County (pc) data when we drop air, drop outliers below the 0.1 percentile, use r = 0.71 (as it is in pc), and use a car precision of 0.739 (as it is in pc).
$VMT = 2d = D$; $PMT = 3d = \frac{3}{2} D$; $EC_{shankari} = 3d \times EI_s = \frac{3}{2} D \times EI_s = \frac{3}{4} D \times EI_d$
I'll update this with more info but here are some ideas on what analyses I will run:
Parameters to change:
- with vs without outliers
- r = 1 vs r = dataset_specific_r
- with vs without dataset specific car precision
Results I will save for each analysis:
Here are sensitivity analyses for each dataset excluding stage. Vail and 4c are the only programs where the sensed energy consumption was lower than the user labeled energy consumption. Tomorrow I'll post the r value for each dataset and the percent error for the non-processed case.

"p" stands for processed, meaning I dropped outliers above the 99.9 percentile and below the 0.1 percentile, used the dataset specific r value, and used the car precision found in the dataset.

The other case is where I used r = 1, car precision from MobilityNet, and did not drop outliers.
[Sensitivity plots for the processed case: vail p, sc p, pc p, fc p, cc p, all_no_stage p, 4c p]
[Sensitivity plots for the default case: vail, sc, pc, fc, cc, 4c, all no stage]
The percent errors shown below are for the case where we use r = 1, car precision from MobilityNet, and keep outliers. Note that these values were found after dropping both user labeled air trips and sensed air_or_hsr trips.
dataset | percent error for expected | percent error for predicted | r value | car precision seen in dataset |
---|---|---|---|---|
vail | -12.9 | -6.6 | 0.833 | 0.835 |
pc | 21.7 | 27.4 | 0.730 | 0.747 |
fc | 11.77 | 13.08 | 0.713 | 0.801 |
cc | 5.25 | 14.13 | 0.591 | 0.693 |
4c | -6.92 | -0.62 | 0.513 | 0.512 |
sc | 13.21 | 9.95 | 0.566 | 0.853 |
all_no_stage | 8.84 | 14.57 | 0.637 | 0.730 |
For the plots below, I dropped trips that users labeled as not a trip or for which mode_confirm is nan. In the last two comments I made and in this one, the sensitivity analyses also drop air trips. For the case where we use r = 1, car precision from MobilityNet, and keep outliers:
dataset | percent error for expected | percent error for predicted | r value | car precision | sensitivity plot |
---|---|---|---|---|---|
all programs | 3.37 | 9.18 | 0.642 | 0.779 | (image) |
stage | -1.84 | 3.59 | 0.667 | 0.909 | (image) |
We will later implement similar functionality for carbon footprint, energy/carbon impacts, mode counts, and mode share by distance.
Saving relevant info to the database (Pre-compute as much as possible):