e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

Estimate mean and variance of energy consumption #798

Open allenmichael099 opened 2 years ago

allenmichael099 commented 2 years ago

We will later implement similar functionality for carbon footprint, energy/carbon impacts, mode counts, and mode share by distance.

Saving relevant info to the database (Pre-compute as much as possible):

allenmichael099 commented 2 years ago

Computing aggregate energy consumption means and variances on the fly: the user on the app or public dashboard selects a set of trips (T) somehow, e.g. by date range, or maybe by program on the dashboard.

mean_EC_agg = 0    # aggregate energy consumption
var_EC_agg = 0

for t in T:
    # Get trip length error
    # check OS, configuration for trip t, then fetch the
    # unit length (u) mean and variance for that OS, configuration
    mean_L = t.L_sensed * mean_u
    var_L = t.L_sensed**2 * var_u

    # Get mode error
    # check the mode inference model used; we could check the "algorithm_chosen" field for the trip
    if model == "sensing":
        # fetch the EI mean and variance corresponding to t.predicted_mode, OS, configuration
    elif model == "label assist":
        # fetch the predicted mode column from the corresponding user-specific label assist confusion matrix
    elif model == "user labeled":
        mean_EI = EI(user_labeled_mode)
        var_EI = 0

    # Compute trip level EC mean and variance
    mean_EC = mean_L * mean_EI
    var_EC = var_EI * mean_L**2 + var_L * mean_EI**2

    # Add to total
    mean_EC_agg += mean_EC
    var_EC_agg += var_EC

The above can probably be vectorized. Then we can present the mean value (mean_EC_agg) +- sqrt(var_EC_agg) as a dot on a number line with error bars.
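A minimal vectorized sketch of the loop above, using NumPy. It assumes the per-trip lookups (unit-length mean/variance for each trip's OS and configuration, and EI mean/variance from whichever mode inference path the trip took) have already been gathered into arrays; the function and argument names are hypothetical. Like the pseudocode, it treats trip length and EI as independent and omits the small `var_L * var_EI` term, so the variance is a first-order approximation.

```python
import numpy as np

def aggregate_energy(l_sensed, mean_u, var_u, mean_ei, var_ei):
    """Aggregate energy consumption mean and variance over a set of trips.

    All arguments hold one entry per trip (hypothetical inputs, looked up
    beforehand from the OS/configuration and mode-inference tables).
    Trips are assumed independent, so trip-level variances add.
    """
    l_sensed = np.asarray(l_sensed, dtype=float)
    mean_l = l_sensed * mean_u          # expected true trip length
    var_l = l_sensed ** 2 * var_u       # variance of true trip length
    mean_ec = mean_l * mean_ei          # trip-level expected energy
    # Same approximation as the pseudocode: the var_l * var_ei term is omitted.
    var_ec = var_ei * mean_l ** 2 + var_l * mean_ei ** 2
    return mean_ec.sum(), var_ec.sum()

# Two toy trips; the second is user labeled, so its EI variance is 0.
mean_agg, var_agg = aggregate_energy(
    l_sensed=[10.0, 4.0],
    mean_u=np.array([1.0, 1.0]),
    var_u=np.array([0.01, 0.01]),
    mean_ei=np.array([1.5, 0.0]),
    var_ei=np.array([0.25, 0.0]),
)
print(f"{mean_agg:.2f} +/- {np.sqrt(var_agg):.2f}")
```

The returned pair maps directly onto the dot-with-error-bars display: plot `mean_agg` with a bar of `sqrt(var_agg)`.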

allenmichael099 commented 2 years ago

Early steps:

allenmichael099 commented 2 years ago

Evaluation: Sensitivity Analysis: Consider only labeled trips.

Monte Carlo Simulation with mock trip lengths and modes: (I think) this will show what the distribution of aggregate energy consumption could look like if the following assumptions hold:

When I ran a simulation with 3 modes and trip lengths uniformly sampled from 3-50 units, aggregate energy consumption appears normally distributed. I'll need to explain the simulation carefully to make sure it checks what I want to check. I can make the simulation closer to what we are doing by sampling from the correct distance relative error distribution. For distance, now I just use a normal distribution with the same mean and variance as one of the OS,configuration pairs. For mode, I use multinoulli distributions.
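A rough sketch of that kind of simulation, assuming 3 modes, sensed trip lengths uniform on [3, 50], a normal relative distance error applied per trip, and one multinoulli (categorical) draw per trip for the mode. The EIs, mode probabilities, and error parameters are placeholders, not values from the OpenPATH data. The skewness printed at the end is only a crude normality check.

```python
import numpy as np

rng = np.random.default_rng(0)

n_trips, n_sims = 200, 5000
ei = np.array([1.5, 0.2, 0.0])          # placeholder EI for each of 3 modes
mode_probs = np.array([0.6, 0.1, 0.3])  # placeholder multinoulli probabilities
err_mean, err_sd = 1.0, 0.1             # placeholder relative distance error

sensed_len = rng.uniform(3, 50, size=n_trips)  # mock sensed trip lengths

totals = np.empty(n_sims)
for k in range(n_sims):
    # Draw a "true" length for every trip from the relative error distribution
    true_len = sensed_len * rng.normal(err_mean, err_sd, size=n_trips)
    # Draw a "true" mode for every trip from the multinoulli distribution
    modes = rng.choice(len(ei), size=n_trips, p=mode_probs)
    totals[k] = np.sum(true_len * ei[modes])

# Crude normality check: a normal distribution has skewness 0
skew = np.mean((totals - totals.mean()) ** 3) / totals.std() ** 3
print(totals.mean(), totals.std(), skew)
```

A histogram of `totals` is the picture to compare against a normal curve; swapping the normal error for the empirical OS/configuration error distribution would bring the simulation closer to the real setup.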

Also: make unit tests

allenmichael099 commented 2 years ago

Current tasks:

Upcoming tasks:

allenmichael099 commented 2 years ago

For the sensitivity analysis, I will start with sensing first. With label assist, I would need to retrain on smaller percentages of the data. However, we can look at the current proportion of label assist.

Question: at what threshold of confidence should we switch to label assist over sensing? Use current threshold of 0.25 for now.

shankari commented 2 years ago

~50k labeled trips:
all 50k labeled
40k labeled, 10k sensing
30k labeled, 20k sensing
...
all sensing
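One way to lay out those splits, as a sketch: for each proportion sensed, randomly choose which labeled trips are treated as sensed, use the sensing-based estimate for those and the user-labeled EC for the rest, and repeat over several random splits. `user_ec` and `sensed_ec` are hypothetical per-trip values computed beforehand.

```python
import random

def split_totals(user_ec, sensed_ec, proportions, n_splits=10, seed=0):
    """Aggregate EC when a fraction of labeled trips is treated as sensed.

    user_ec[i] and sensed_ec[i] are the user-label-based and sensing-based
    energy estimates for trip i (hypothetical, precomputed).
    Returns {proportion: [total EC for each random split]}.
    """
    rng = random.Random(seed)
    n = len(user_ec)
    results = {}
    for p in proportions:
        totals = []
        for _ in range(n_splits):
            # Pick which trips pretend to have no user label
            sensed_idx = set(rng.sample(range(n), round(p * n)))
            totals.append(sum(sensed_ec[i] if i in sensed_idx else user_ec[i]
                              for i in range(n)))
        results[p] = totals
    return results

res = split_totals([2.0, 1.0, 3.0, 0.0], [1.5, 1.0, 2.5, 0.5],
                   proportions=[0.0, 0.5, 1.0], n_splits=3)
print(res)
```

The spread across splits at each proportion is what the sensitivity plot would show alongside the mean.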

allenmichael099 commented 2 years ago

Right now we are using manually constructed confusion matrices from Gabe (kennykos)'s paper. Upon doing the sensitivity analysis, we found that sensing overestimated aggregate energy consumption for vail by about 2000. We thought this might be because sensing cannot distinguish between drove alone and shared ride. So, I created a "sensed_car" category and assigned it an energy intensity of EI(drove_alone)/1.5. This led us to underestimate energy consumption, even though the distance share of shared rides is 1.180941e+06 meters higher than drove alone's distance share for vail.

Below, "sensing-confusion" refers to the use of the confusion matrix to get an expected energy intensity (mean_EI). "sensing-prediction/prediction/naive approach" refer to the use of the sensing model predicted EI as is.

I also took a look at the set of trips without shared rides. We still underestimate with a sensing-confusion approach, while the sensing-prediction approach overestimates. With no shared rides and a load factor of 1, the EC from sensing with confusion-matrix-based EI, sensing with predicted EI, and user labels is: (6191.57, 7592.53, 7140.17)

EC with shared rides and a load factor of 1.5: Naïve (predictions only): 9046.67, User labeled: 10846.41, Sensed with confusion: 9028.78

allenmichael099 commented 2 years ago

Here is the vail sensitivity analysis when we use a load factor of 1.5 and include shared rides:

[Figure: Sensing_sensitivity_analysis_vail]

allenmichael099 commented 2 years ago

[Yesterday 12:11 PM] Shankari, K. removing shared rides, user label -> confusion matrix drops; but user label -> predicted label increases

so I'm wondering if the reason is that the majority of the EC is due to drove alone, and every time we use the CM, every car trip's EC goes down because of the other probabilities

but you would expect that sometimes you would get it wrong the other way (e.g. predict bike for car trip) so the other probabilities would have a higher energy intensity

if the CM rows are the ones in your example

and I have 5 car trips and 2 bike trips ground truth

if we count the 5 car trips as car, the EC will be $1.512 \times 5 + 0 \times 2$

but if we use the CM, the car EC will become $0.7 \times 1.512 \times 5 + 0.022 \times 0.2 \times 5$

which is lower than the original

but with the CM, the bike EC will become $0.1 \times 1.512 \times 2 + 0.022 \times 0.4 \times 2$

which is higher than the original
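A small sketch that reproduces the arithmetic above. Since the confusion-matrix rows from the referenced example aren't in this thread, the rows below are reconstructed from the numbers in the chat, under these assumptions: one unit of distance per trip, every trip predicted as its true mode, car EI 1.512, bike EI 0, an `e_bike` EI of 0.022 for the other column, and any leftover probability on bike (EI 0).

```python
EI = {"car": 1.512, "e_bike": 0.022, "bike": 0.0}

# Confusion-matrix rows implied by the chat arithmetic (reconstructed, not the originals).
cm_row = {
    "car":  {"car": 0.7, "e_bike": 0.2, "bike": 0.1},
    "bike": {"car": 0.1, "e_bike": 0.4, "bike": 0.5},
}

def expected_ei(predicted_mode):
    """Expected EI for a trip given its predicted mode's confusion-matrix row."""
    return sum(p * EI[m] for m, p in cm_row[predicted_mode].items())

trips = {"car": 5, "bike": 2}  # ground-truth trip counts, one distance unit each

direct = sum(n * EI[mode] for mode, n in trips.items())
with_cm = {mode: n * expected_ei(mode) for mode, n in trips.items()}
print(direct, with_cm, sum(with_cm.values()))
```

With these rows the car EC drops from 7.56 to about 5.31 while the bike EC only rises from 0 to 0.32, so the net effect is a lower total, which is the mechanism suggested above.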

[Yesterday 12:27 PM] Shankari, K.
For trips that you are counting as "sensed": choose the label for each predicted section.
For trips that you are counting as "user label": set the user label as the mode for only the primary section, and assign the others as walk/bike.

[Yesterday 12:28 PM] Shankari, K. so you can:
apply the user label to only the primary section and compute EC at the section level
apply the primary sensed mode to the full trip and compute the EC at the trip level

[Yesterday 12:32 PM] Shankari, K. you can compute the decrease due to car trips and the increase due to everything else (including the distance in each) and see if that matches up with our explanation above

allenmichael099 commented 2 years ago

Updates from meeting:
Define the primary section as the longest-distance section. Define the primary mode as the mode of the primary section. Tied distances are broken by choosing the mode with the highest energy intensity.

Looking only at the primary section and computing EC: Prediction: 10582, User labels: 9483, Confusion based: 7194

Applying the primary section's mode to the whole trip and computing EC: Prediction: 12273, User labels: 10841, Confusion based: 8260

So prediction is still closer to user labels than confusion based. Another thing of note is that P(userlabel = car| predict car) is about 0.83, not the 0.45 in MobilityNet.
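A sketch of the primary-section rule from the meeting notes: the longest section wins, and distance ties go to the mode with the higher EI. The `(mode, distance)` tuples and the EI values are hypothetical inputs for illustration.

```python
def primary_mode(sections, ei):
    """Return the mode of the longest section; break distance ties by higher EI.

    sections: list of (mode, distance) pairs for one trip.
    ei: dict mapping mode -> energy intensity (unknown modes count as 0).
    """
    mode, _ = max(sections, key=lambda s: (s[1], ei.get(s[0], 0.0)))
    return mode

ei = {"car": 1.5, "bus": 1.7, "walk": 0.0}  # placeholder EIs
print(primary_mode([("walk", 300.0), ("car", 5000.0), ("walk", 200.0)], ei))  # car
print(primary_mode([("car", 800.0), ("bus", 800.0)], ei))                     # bus
```

Sorting on a `(distance, EI)` key keeps the tie-break in a single comparison instead of a separate pass.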

[3:22 PM] Shankari, K. assuming that the difference because of using the CM is due to car trips, it seems like there can be two separate reasons for why we are off:
1. car precision is < 40% (40% of the time when we guess car, it is a car)
2. car is so much more energy intensive than the other modes (car is 1.512 and e-bike is 0.022)

[3:27 PM] Shankari, K. things to experiment with: FIRST: Try to change the car precision in increments of 10% and see how the plots change

[3:28 PM] Shankari, K. then see if we can influence the precision in a way that makes sense:
Look at Figure 4 in Gabe's paper: the "AUTOMOTIVE" prediction is actually pretty good, over 90%. Is there a way we can start with a much smaller confusion matrix (walk/bike/automotive)? We could use an "automotive" EI based on the distribution of motorized modes in the NHTS, for example. We could also experiment with combining confusion matrices from the program (Vail) and MobilityNet.

[3:30 PM] Shankari, K. if the US as a whole has 60% car, 20% bus, 20% train

allenmichael099 commented 2 years ago

Results from adjusting car precision: If we say car has a load factor of 1.5, the car EI is actually less than the bus EI. So when we increase car precision while accounting for shared rides, the expected EI decreases! This would lead to an even lower expected energy consumption (confusion based energy consumption is expected energy consumption here).
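A toy sketch of the effect described above: the expected EI for a trip predicted as car is a precision-weighted mix of the car EI and the EI of the modes car gets confused with. Here those confused modes are lumped into a single placeholder EI. Once the car EI is divided by a load factor of 1.5, a confused mode with a higher EI (bus, per the comment above) makes the expected EI fall as precision rises; with a lower confused EI the trend reverses.

```python
def expected_car_ei(precision, car_ei, confused_ei):
    """Expected EI of a trip predicted as car.

    precision: P(true mode = car | predicted car)
    confused_ei: EI of the mode(s) car is confused with, lumped together
    """
    return precision * car_ei + (1 - precision) * confused_ei

car_ei = 1.512 / 1.5   # car EI spread over a load factor of 1.5
bus_ei = 1.7           # placeholder: a confused mode whose EI exceeds car_ei

eis = [expected_car_ei(p, car_ei, bus_ei) for p in (0.55, 0.65, 0.75, 0.85)]
print([round(x, 3) for x in eis])
```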

If:

Then the (floored) ECs are:
car precision | prediction | confusion based
0.55 | 6071 | 5859
0.65 | 6071 | 5939
0.75 | 6071 | 6018
0.85 | 6071 | 6097
User EC: 6391

A similar pattern holds if we apply the primary mode to the full trip - expected energy consumption does not pass predicted energy consumption until car precision is between 0.75 and 0.85. User label based energy consumption stays higher than both.

shankari commented 2 years ago
walk | mm | e-mm | bus | car | train | EC
0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 0.5 | 23232
0.8 | 0.7 | 0.6 | 0.5 | 0.8 | 0.7 | 23232

precisions = [0.5, 0.6, 0.7, 0.8, 0.9]  # range() only accepts integers
for walk_precision in precisions:
    for mm_precision in precisions:
        for e_mm_precision in precisions:
            ...
shankari commented 2 years ago

Predicted is doing so much better because the "ground truth" sample is skewed and most trips are car, so even if we just predicted car every time, we would probably do pretty well. So another experiment would be to actually construct a balanced sample through sampling the same number of trips from each mode.
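A sketch of building such a balanced sample with the standard library: group trips by user-labeled mode (`mode_confirm`) and draw the same number from each group. Trips are assumed to be dicts. Modes with fewer trips than requested contribute all of their trips, so pick `n_per_mode` no larger than the smallest group if an exactly balanced sample is needed.

```python
import random
from collections import Counter, defaultdict

def balanced_sample(trips, n_per_mode, seed=0):
    """Draw up to n_per_mode trips from each user-labeled mode, without replacement."""
    rng = random.Random(seed)
    by_mode = defaultdict(list)
    for t in trips:
        by_mode[t["mode_confirm"]].append(t)
    sample = []
    for group in by_mode.values():
        sample.extend(rng.sample(group, min(n_per_mode, len(group))))
    return sample

trips = ([{"mode_confirm": "drove_alone"} for _ in range(8)]
         + [{"mode_confirm": "bike"} for _ in range(3)]
         + [{"mode_confirm": "bus"}])
counts = Counter(t["mode_confirm"] for t in balanced_sample(trips, 2))
print(counts)
```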

shankari commented 2 years ago

The sensed approach should be able to work for a study in which nobody labels any trips.

shankari commented 2 years ago

> If we say car has a load factor of 1.5, the car EI is actually less than the bus EI. So when we increase car precision while accounting for shared rides, the expected EI decreases! This would lead to an even lower expected energy consumption (confusion based energy consumption is expected energy consumption here).

So if we set the load factor to 1.5 and increase car precision, the expected energy becomes lower than with lower precision. But why is it then lower than user label? Because user label also has a mix of shared ride (at LF = 2) and drove alone (at LF = 1).

Another validation thing to do rather than black box sensitivity analysis is to just compare the trips one by one and compare the user label EC and the predicted EC and the predicted w/ CM EC and see which are the 10 biggest or plot their distribution by mode
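A sketch of that trip-by-trip comparison: compute both errors per trip (estimate minus user-labeled EC), rank trips by the magnitude of the confusion-based error, and group errors by user-labeled mode for plotting. The field names `user_ec`, `predicted_ec`, and `expected_ec` are hypothetical; `mode_confirm` follows the column name used elsewhere in this thread.

```python
from collections import defaultdict

def trip_errors(trips, k=10):
    """Compare estimated and user-labeled EC trip by trip.

    Returns the k trips with the largest |confusion-based error| and the
    confusion-based errors grouped by user-labeled mode (for histograms).
    """
    rows = [dict(t,
                 error_for_prediction=t["predicted_ec"] - t["user_ec"],
                 error_for_confusion=t["expected_ec"] - t["user_ec"])
            for t in trips]
    worst = sorted(rows, key=lambda r: abs(r["error_for_confusion"]), reverse=True)[:k]
    by_mode = defaultdict(list)
    for r in rows:
        by_mode[r["mode_confirm"]].append(r["error_for_confusion"])
    return worst, dict(by_mode)

trips = [
    {"mode_confirm": "drove_alone", "user_ec": 10.0, "predicted_ec": 9.0, "expected_ec": 7.0},
    {"mode_confirm": "shared_ride", "user_ec": 4.0, "predicted_ec": 6.0, "expected_ec": 5.0},
    {"mode_confirm": "bike", "user_ec": 0.0, "predicted_ec": 0.0, "expected_ec": 0.5},
]
worst, by_mode = trip_errors(trips, k=2)
print([r["mode_confirm"] for r in worst], by_mode)
```

Building new dicts with `dict(t, ...)` leaves the input trips unmodified.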

allenmichael099 commented 2 years ago

Left: expected EC - user labeled EC. Right: predicted EC - user labeled EC. The x-axis label is the user label.

[Figure: trip_level_errors]

shankari commented 2 years ago

Based on the current data:

We should think about how we can automate this instead of having Michael look at it manually.

allenmichael099 commented 2 years ago

confusion based error 90th and 5th percentiles: (1.80, -5.39)

Summary of confusion based error:
mean | -0.907056
std | 10.649994
min | -405.592424
25% | -1.436356
50% | -0.446962
75% | 0.040015
max | 95.792431

Counts by mode where errors are below the 5th percentile: drove_alone 74, bus 15, shared_ride 5, free_shuttle 4, car 2, taxi 2
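A sketch of the percentile summary above, assuming per-trip errors and user-labeled modes are in parallel sequences. `np.percentile` takes percentiles on a 0-100 scale and interpolates linearly by default; the same function with `q=0.1` gives the 0.1 percentile cutoff used for outlier removal.

```python
from collections import Counter
import numpy as np

def below_percentile(errors, modes, q):
    """Split trips at the q-th percentile of their error.

    Returns the threshold, the user-labeled mode counts for trips strictly
    below it, and a boolean mask of the trips to keep (error >= threshold).
    """
    errors = np.asarray(errors, dtype=float)
    threshold = np.percentile(errors, q)   # linear interpolation by default
    below = errors < threshold
    counts = Counter(m for m, b in zip(modes, below) if b)
    return threshold, counts, ~below

errors = np.arange(100.0)  # toy errors
modes = ["drove_alone"] * 3 + ["bus"] * 2 + ["walk"] * 95
threshold, counts, keep = below_percentile(errors, modes, q=5)
print(threshold, counts, int(keep.sum()))
```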

allenmichael099 commented 2 years ago

If you remove trips below the 5th percentile of confusion based error (in the negative direction): expected 7198.36, predicted 7267.31, user_labeled 7323.29

allenmichael099 commented 2 years ago

If we drop the trips below the 0.1 percentile, the total energy consumptions are: expected 8651.04, predicted 8764.61, user_labeled 9943.78

Percent error: -13%

allenmichael099 commented 2 years ago

If you do not drop trips: expected 8991.82, predicted 9005.56, user labeled 10838.58

Percent error (expected - user labeled)/user labeled: -17%
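The percent errors quoted above follow from the totals; a quick check using the formula (expected - user labeled) / user labeled:

```python
def percent_error(estimate, user_labeled):
    """Signed percent error of an estimate relative to the user-labeled total."""
    return 100 * (estimate - user_labeled) / user_labeled

all_trips = percent_error(8991.82, 10838.58)    # no trips dropped
no_outliers = percent_error(8651.04, 9943.78)   # 0.1 percentile outliers dropped
print(round(all_trips), round(no_outliers))
```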

allenmichael099 commented 2 years ago

Trips where the error is below the 0.1 percentile:
mode_confirm | distance | error_for_confusion
drove_alone | 1.445275e+05 | -59.959334
drove_alone | 2.136239e+05 | -88.479020
shared_ride | 1.039817e+06 | -405.592424

shankari commented 2 years ago

what was the sensed mode for those trips?

allenmichael099 commented 2 years ago

The primary sensed modes were car, car, and "no sensed". There were also sensed walking sections.

[Figure]

allenmichael099 commented 2 years ago

Dropping those trips and looking only at the shared ride and drove alone error histograms, we get:

[Figure: shared ride and drove alone error histograms (screenshot, 2022-10-25)]

shankari commented 2 years ago

wrt https://github.com/e-mission/e-mission-docs/issues/798#issuecomment-1290890424

> car,car, and no sensed for the primary sensed mode. There were also sensed walking sections.

It looks like the sensed mode was actually car. So we actually got the mode right. So is the difference primarily due to the confusion with bus or due to the walking sections?

shankari commented 2 years ago

Also, is it possible to identify the +ve and -ve contribution by each mode? I see the histogram, but adding up the values is hard to do visually, especially since the histograms don't have too much detail.

something like: shared_ride: expected (xxx), user label (yyy), net (zzz)

allenmichael099 commented 2 years ago

Here are the contributions of shared ride and drove alone, after dropping the three outliers below the 0.1 percentile:

shared_ride:
Name | Value
expected | 3735.418826
predicted | 3976.395993
user_labeled | 3175.102523
error_for_prediction | 801.293470
error_for_confusion | 560.316303

drove_alone:
Name | Value
expected | 3530.997714
predicted | 3409.038991
user_labeled | 5719.979116
error_for_prediction | -2310.940124
error_for_confusion | -2188.981402
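A sketch of how those per-mode contributions can be tabulated: sum each estimate over trips with a given user label, then take the errors as estimate minus user-labeled total (which matches the signs in the tables above). Trips are assumed to be dicts with the column names shown; the toy values are made up.

```python
from collections import defaultdict

COLUMNS = ("expected", "predicted", "user_labeled")

def contributions_by_mode(trips, modes=("shared_ride", "drove_alone")):
    """Per-mode totals of each EC estimate plus the two errors.

    trips: list of dicts with 'mode_confirm' and the COLUMNS keys.
    """
    totals = {m: defaultdict(float) for m in modes}
    for t in trips:
        if t["mode_confirm"] in totals:
            for c in COLUMNS:
                totals[t["mode_confirm"]][c] += t[c]
    result = {}
    for m, tot in totals.items():
        row = {c: tot[c] for c in COLUMNS}
        row["error_for_prediction"] = row["predicted"] - row["user_labeled"]
        row["error_for_confusion"] = row["expected"] - row["user_labeled"]
        result[m] = row
    return result

trips = [
    {"mode_confirm": "shared_ride", "expected": 3.0, "predicted": 4.0, "user_labeled": 2.0},
    {"mode_confirm": "shared_ride", "expected": 1.0, "predicted": 1.0, "user_labeled": 1.5},
    {"mode_confirm": "drove_alone", "expected": 2.0, "predicted": 2.5, "user_labeled": 5.0},
    {"mode_confirm": "bike", "expected": 0.1, "predicted": 0.0, "user_labeled": 0.0},
]
out = contributions_by_mode(trips)
print(out)
```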
shankari commented 2 years ago

user_labeled = 5719 + 3175 = 8894
predicted = 3976.395993 + 3409.038991 = 7385.434984 (-1508)

Why is drove_alone so different and why is it always lower?

allenmichael099 commented 2 years ago

If we look at trips with both primary mode car and user label drove alone, and count every section as car, the EC estimate would increase by 48.86 kWh. The distance increase in sensed car is 49.32 miles.

Doing some math, I think we will always be too low when we have a load factor of 1.5. 4/3 = 2/1.5

(perfect prediction drove alone EC) - (user labeled drove alone EC) + (perfect prediction shared ride EC) - (user labeled shared ride EC)
= 5719.979116/1.5 - 5719.979116 + 3175.102523 * 4/3 - 3175.102523 = -848.29

So if we labeled all drove alone and shared rides as car, and ignored confusion, we'd still be off by 848.
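Checking the arithmetic for the -848 figure: moving drove alone from a load factor of 1 to 1.5 divides its EC by 1.5, and moving shared ride from 2 to 1.5 multiplies its EC by 2/1.5 = 4/3.

```python
da_user = 5719.979116   # user-labeled drove_alone EC
sr_user = 3175.102523   # user-labeled shared_ride EC

# Drove alone: LF 1 -> 1.5 divides EC by 1.5.
# Shared ride: LF 2 -> 1.5 multiplies EC by 2 / 1.5 = 4/3.
gap = (da_user / 1.5 - da_user) + (sr_user * 4 / 3 - sr_user)
print(round(gap, 2))  # -848.29
```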

shankari commented 2 years ago

> Doing some math, I think we will always be too low when we have a load factor of 1.5.

But I'm not sure I follow the calculation. Can you expand a bit? Concretely, where do you get the 4/3 from and why does it matter? And are you arguing that the difference is because the load factor is actually not 1.5?

allenmichael099 commented 2 years ago

If we look at trips that users labeled as shared ride, assume that we sensed car, and ignore confusion:

We want to replace the load factor of 2 for the user labeled shared ride trips with a load factor of 1.5, so we multiply by 2/1.5 = 4/3.

4/3 represents a conversion factor from the shared ride user labeled value we got to the value that we would calculate if we predicted those trips to be shared ride and used a load factor of 1.5.

My argument is that applying the average load factor of 1.5 to both drove alone and shared ride will be systematically off.

shankari commented 2 years ago

Let's think about PMT (Person Miles Traveled) versus VMT for a bit:

double check: overall VMT = 2000, load factor = 1.5, overall PMT = 2000 * 1.5 = 3000, which is correct

allenmichael099 commented 2 years ago

x = 1 # drove_alone_load_factor
y = 2 # shared_ride_load_factor

# VMT
v_miles_in_drove_alone = 10000   # total drove alone distance (VMT) in miles
v_miles_in_shared_ride = 10000   # total shared ride distance (VMT) in miles   # 100 -> off by 20, 1000 -> off by 200
avg = (x+y)/2

drove_alone_EI = 1.51517707
shared_ride_EI = 0.757588535

# without average means that we use the correct energy intensity of each mode. 
# For "with average", we use the average value for both.
no_average = 1/x*drove_alone_EI*v_miles_in_drove_alone + 1/y*drove_alone_EI*v_miles_in_shared_ride
with_average = 1/avg*drove_alone_EI*v_miles_in_drove_alone + 1/avg*drove_alone_EI*v_miles_in_shared_ride

print(f"drove_alone without average, with average: {1/x*drove_alone_EI*v_miles_in_drove_alone,1/avg*drove_alone_EI*v_miles_in_drove_alone}")   # with avg underestimates
print(f"shared_ride without average, with average: {1/y*drove_alone_EI*v_miles_in_shared_ride,1/avg*drove_alone_EI*v_miles_in_shared_ride}")   # with avg overestimates
print(no_average,with_average)
shankari commented 2 years ago
no_average = vda + 0.5 vsr
average = (vda + vsr) / 1.5

if vda = vsr

no_average = 1.5 vsr
average = 4/3 vsr
allenmichael099 commented 2 years ago

Let s = distance in shared ride, a = distance in drove alone, and r = a/s = the ratio of drove alone distance to shared ride distance in our dataset. Let $E_v$ = drove alone EI, and let z be the load factor.

The energy consumption for drove alone and shared ride trips is actually: $aE_v + s\frac{E_v}{2} = E_v(rs + \frac{s}{2})$

We want to find the value of z such that applying it to both shared ride and drove alone trips is the same as if we had used the drove alone EI for all drove alone trips and the shared ride EI for all shared ride trips: $E_v(rs + \frac{s}{2}) = E_v(\frac{1}{z} rs + \frac{1}{z} s)$, so $rs + \frac{s}{2} = \frac{1}{z} rs + \frac{1}{z} s$

$z = 2\frac{rs + s}{2rs + s} = \frac{2r+2}{2r+1} = \frac{r + 1}{r + 1/2}$

In vail, r is about 0.91, which gives z = 1.3546.
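A quick numerical check of the formula for z, using the vail ratio r = 0.91 and placeholder values for the drove-alone EI and shared-ride distance: using the single factor z for all car distance should reproduce the energy from drove alone at load factor 1 plus shared ride at load factor 2.

```python
def effective_load_factor(r):
    """z = (r + 1) / (r + 1/2), with r = drove-alone distance / shared-ride distance."""
    return (r + 1) / (r + 0.5)

E_v, s, r = 1.5, 100.0, 0.91   # placeholder EI and shared-ride distance; r from vail
z = effective_load_factor(r)
separate = E_v * r * s + (E_v / 2) * s   # drove alone at LF 1, shared ride at LF 2
combined = (E_v / z) * (r * s + s)       # one load factor z for all car distance
print(round(z, 4), separate, combined)
```

With r = 1 the same function gives 4/3, the value that comes up later in the thread.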

allenmichael099 commented 2 years ago

When we use z = 1.3546 and ignore the outliers below the 0.1 percentile, here are the errors for drove alone and shared ride:

Drove alone:
Name | Value
expected | 3682.806188
predicted | 3774.930611
user_labeled | 5719.979116
error_for_prediction | -1945.048505
error_for_confusion | -2037.172927

Shared ride:
Name | Value
expected | 3893.278983
predicted | 4402.827483
user_labeled | 3175.102523
error_for_prediction | 1227.724960
error_for_confusion | 718.176460

Sum of drove alone and shared ride:
Name | Value
expected | 7576.085172
predicted | 8177.758094
user_labeled | 8895.081638
error_for_prediction | -717.323545
error_for_confusion | -1318.996467

The car prediction is still off, but the error decreased by about 800. Earlier the car prediction error was 801.293470 + (-2310.940124) = -1509.6.

If we look at all trips excluding the below 0.1 percentile outliers, the predicted is within 1 standard deviation of the user labeled value. Increased precision will shrink the standard deviation, but we'll see by how much. expected: 9018.06 predicted: 9702.64 user_labeled: 9943.78

Differences: predicted - expected: 684.587 user_labeled - predicted: 241.135

sd: 285.58

allenmichael099 commented 2 years ago

Our first pass at estimating load factor averaged the load factors, weighted by distance in each car submode: (Load factor for drove alone: 1, for shared ride: 2) $\frac{(1)a + (2)s}{a + s}$

But it seems that we should have been averaging the reciprocals of the load factors: z = $\frac{r+1}{r+1/2}$ = $\frac{a/s +1}{a/s+1/2}$ = $\frac{a+s}{a+s/2}$

1/z = $\frac{(1/1)a+(1/2)s}{a+s}$
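A sketch contrasting the two averages, with a placeholder EI and distances chosen so that r = 0.91: only the load factor whose reciprocal is the distance-weighted average of 1/LF reproduces the energy computed separately for each car submode.

```python
def naive_load_factor(a, s):
    """Distance-weighted average of load factors (drove alone 1, shared ride 2)."""
    return (1 * a + 2 * s) / (a + s)

def reciprocal_load_factor(a, s):
    """Load factor whose reciprocal is the distance-weighted average of 1/LF."""
    return (a + s) / (a + s / 2)

E = 1.5                 # drove-alone EI (placeholder)
a, s = 910.0, 1000.0    # drove-alone and shared-ride distances (r = 0.91)

true_ec = E * a + (E / 2) * s   # each submode with its own EI
for lf in (naive_load_factor(a, s), reciprocal_load_factor(a, s)):
    print(round(lf, 4), round(E * (a + s) / lf, 2), round(true_ec, 2))
```

The naive average (about 1.52 here) is too large a divisor, which is consistent with the systematic underestimate discussed above.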

shankari commented 2 years ago

I think that the main reason that this is not working is that we are using the wrong EI for the version with the load factor. Basically, let's say that $EI_v$ is the energy intensity for the vehicle and $EI_p$ is the energy intensity for a person. So for drove alone $EI_p = EI_v$ and for shared ride $EI_p = 0.5 EI_v \implies EI_v = 2EI_p$

Let's assume that the distance in shared ride and drove alone is equal just to reduce the number of variables and is $d$

If we have user labels, we will get

$EC = EI_v d + EI_p d = 2 EI_p d + EI_p d = 3 EI_p d$

With user label, our $VMT=2d$ and our $PMT=d + 2d = 3d$ (three people distance of $d$ each)

If we don't have user labels, we have VMT, but we don't have PMT, and we use the load factor to convert from VMT to PMT So without user labels, our $VMT = 2d$ and our $PMT = \frac{3}{2} \times 2d = 3d$

So with the load factor calculation, we get the correct PMT distance in both cases. So if there is any difference in the overall EC, it must be because of a difference in the EI for the mode that we are using.

Since we are multiplying it with the PMT, we should use the $EI_p$ and not the $EI_v$

without user labels,

$EC = 3d \times EI_p = 3 EI_p d$, which is the same as the EC for the user label method of calculation.

shankari commented 2 years ago

This is the original attempt, which doesn't work, so putting the load factor into the EI instead of the distance doesn't work.

$EC = EI_v d + EI_p d = EI_v d + 0.5 EI_v d = 1.5 EI_v d$

without user labels, $EI_{vlf} = EI_v / 1.5 = \frac{EI_v}{\frac{3}{2}} = \frac{2}{3} \times EI_v$

$EC = VMT \times EI_{vlf} = 2d \times \frac{EI_v}{1.5} = \frac{4}{3}d \times EI_v$, which is lower than the $1.5 EI_v d$ obtained with user labels.

shankari commented 2 years ago

Shankari's definition of load factor: the factor by which you multiply VMT to get PMT (if $d_a = d_s = d$, then lf = 1.5).
Michael's definition of load factor: the factor by which you divide $EI_v$ to get $EI_p$, not $EI_s$. $EI_v = 2EI_s$, but $EI_p \neq EI_s$ (if $d_a = d_s$, then lf = $\frac{4}{3}$).

shankari commented 2 years ago

We will need to decide how we will include this in OpenPATH, but that is not a key component of the paper because it is too OpenPATH specific. We could add a small section in the discussion about how we used it but it should be very minor.

allenmichael099 commented 2 years ago

The reconciliation between Shankari's and Michael's approaches: In brief, Shankari's distance $d$ is half the total distance, so the total sensed car VMT is effectively being multiplied by 3/4 rather than 3/2.

Let D = total distance sensed as car. Let drove alone distance (a) be the same as shared ride distance (s). Then a = s = d = D/2, r = 1. Let $EI_v$ be the vehicle energy intensity aka the drove alone EI, and $EI_p$ be the energy intensity for a person, aka the shared ride EI. Let $EI_v = 2EI_p$.

Since d = a = s, the total energy consumption for sensed car trips over a distance D is:

$aEI_v + s\frac{EI_v}{2} = dEI_v + d\frac{EI_v}{2} = \frac{D}{2}EI_v + \frac{D}{2}\frac{EI_v}{2}$ $=\frac{3}{2}\frac{D}{2}EI_v = \frac{3}{4}D\times EI_v$ which is also the same as: $\frac{3}{2}dE_v =3dE_p$

Now set Shankari's version equal to the case where we apply 1/z to both shared rides and drove alone trips and solve for z: $3dE_p = E_v(\frac{1}{z} rs + \frac{1}{z} s)$ $3dE_p = E_v(\frac{1}{z} (1)d + \frac{1}{z} d)$ $z\times 3dE_p = E_v\times 2d$ $z = \frac{E_v\times 2d}{3dE_p} = 2/3*2 = 4/3$

Thus, multiplying distance $\times EI_v$ by 1/z = 3/4 is the same as multiplying $EI_v \times$ (half the distance) by 3/2. Example:


sensed_car_distance = 10
# half goes to shared ride, half goes to drove alone.
d = sensed_car_distance/2
r = 1  # ratio of drove alone distance to shared ride distance
EI_drove_alone = 1.5
EI_shared_ride = EI_drove_alone/2

# write Shankari's in terms of shared ride, which is the same as
# multiplying EI_drove_alone * d * 3/2
shankari_EC = 3*d*EI_shared_ride

# Michael's in terms of drove alone, with z = 4/3:
michael_EC = 1/(4/3)*sensed_car_distance*EI_drove_alone
print(shankari_EC, michael_EC)
Both give 11.25.
allenmichael099 commented 2 years ago

Good news: The sensitivity analysis doesn't look too bad for vail when we drop air, drop outliers below 0.1 percentile, use r = 0.91, and use a car precision of 0.83.

Bad news: our variance is smaller than it should be if we are treating user labeled as ground truth. Other bad news: We systematically overestimate by a lot with pueblo county (pc) data when we drop air, drop outliers below 0.1 percentile, use r = 0.71 (as it is in pc), and use car precision of 0.739 (as it is in pc).

[Figure: vail_no_point1_outliers_no_air_sensing_sensitivity_analysis]

[Figure]

shankari commented 2 years ago

$VMT = 2d = D$
$PMT = 3d = \frac{3}{2} D$
$EC_{shankari} = 3d \times EI_s = \frac{3}{2} D \times EI_s = \frac{3}{4} D \times E_d$

allenmichael099 commented 2 years ago

I'll update this with more info but here are some ideas on what analyses I will run:

Parameters to change:

  1. With vs without outliers
  2. r = 1 vs r = dataset_specific_r
  3. With vs without dataset-specific car precision

Results I will save for each analysis:

  1. Sensitivity analysis plot
  2. Mean, variance, sd, and error for each random split at each proportion sensed
  3. Errors for drove alone and shared ride
  4. Error histograms by mode.
  5. I'll print the r value and the car precision.
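A sketch of enumerating the analysis configurations from the three parameters above with `itertools.product`. The option names and values are hypothetical labels; each resulting dict would drive one run that saves the outputs listed above.

```python
from itertools import product

# Hypothetical labels for the three switches.
OPTIONS = {
    "remove_outliers": (False, True),
    "r": ("1", "dataset"),
    "car_precision": ("MobilityNet", "dataset"),
}

def analysis_configs(options=OPTIONS):
    """One dict per combination of option values (2 x 2 x 2 = 8 runs)."""
    keys = list(options)
    return [dict(zip(keys, values)) for values in product(*options.values())]

configs = analysis_configs()
for cfg in configs:
    print(cfg)
```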
allenmichael099 commented 2 years ago

Here are sensitivity analyses for each dataset excluding stage. Vail and 4c are the only programs where the sensed energy consumption was lower than the user labeled energy consumption. Tomorrow I'll post the r value for each dataset and the percent error for the unprocessed case.

p stands for processed, as in I dropped outliers above the 99.9 percentile and below the 0.1 percentile, used the dataset specific r value, and used the car precision found in the dataset.

The other case is where I used r = 1, car precision from mobilitynet, and did not drop outliers.

[Figures, processed (p) case: EC sensitivity plots for vail, sc, pc, fc, cc, all_no_stage, and 4c, e.g. vail_EC_sensitivity_with_dataset_for_car_precision_info_r_from_dataset_yes_remove_outliers]

[Figures, other case: EC sensitivity plots for vail, sc, pc, fc, cc, 4c, and all no stage, e.g. vail_EC_sensitivity_with_MobilityNet_for_car_precision_info_r_from_TEDB_no_remove_outliers]
allenmichael099 commented 2 years ago

The percent errors shown below are for the case where we use r = 1, car precision from Mobilitynet, and keep outliers. Note that these values were found after dropping both user labeled air trips and sensed air_or_hsr_trips.

dataset | percent error for expected | percent error for predicted | r value | car precision seen in dataset
vail | -12.9 | -6.6 | 0.833 | 0.835
pc | 21.7 | 27.4 | 0.730 | 0.747
fc | 11.77 | 13.08 | 0.713 | 0.801
cc | 5.25 | 14.13 | 0.591 | 0.693
4c | -6.92 | -0.62 | 0.513 | 0.512
sc | 13.21 | 9.95 | 0.566 | 0.853
all_no_stage | 8.84 | 14.57 | 0.637 | 0.730
shankari commented 2 years ago
allenmichael099 commented 2 years ago

For the plots below, I dropped trips that users labeled as not a trip or for which mode_confirm is nan. In the last two comments I made and in this one, the sensitivity analyses also drop air trips. For the case where we use r = 1, car precision from Mobilitynet, and keep outliers:

dataset | percent error for expected | percent error for predicted | r value | car precision | sensitivity plot
all programs | 3.37 | 9.18 | 0.642 | 0.779 | [Figure: all_EC_sensitivity_with_MobilityNet_for_car_precision_info_r_from_TEDB_no_remove_outliers]
stage | -1.84 | 3.59 | 0.667 | 0.909 | [Figure: stage_EC_sensitivity_with_MobilityNet_for_car_precision_info_r_from_TEDB_no_remove_outliers]