BLE Summary with specific vehicles

JGreenlee commented 6 months ago

The current implementation of ble_sensed_summary on e-mission-server mimics the format of cleaned_section_summary and inferred_section_summary; it looks like this:

{
  "count": {
    "CAR": 1
  },
  "distance": {
    "CAR": 20184.92261045545
  },
  "duration": {
    "CAR": 1772.7775580883026
  }
}

We'd like to know specifically which vehicle it was: instead of just "CAR", we want to know it was "car_jacks_mazda3". So we talked about having 2 versions of the summary, the second keyed by the specific vehicle:

{
  "count": {
    "car_jacks_mazda3": 1
  },
  "distance": {
    "car_jacks_mazda3": 20184.92261045545
  },
  "duration": {
    "car_jacks_mazda3": 1772.7775580883026
  }
}

But if, for example, we wanted to calculate the carbon footprint based on the car's MPG, we'd still have to cross-reference the dynamic config to find the vehicle that matches car_jacks_mazda3.
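To make the cross-referencing step concrete, here is a minimal sketch. The `vehicle_identities` key and the layout of `dynamic_config` are assumptions for illustration, and it uses a `kgCo2PerKm` field rather than converting from MPG. It also assumes that `distance` is in meters, which the summary does not state.

```python
# Hypothetical excerpt of a dynamic config; the "vehicle_identities" key and
# its layout are assumptions for illustration, not the confirmed schema.
dynamic_config = {
    "vehicle_identities": [
        {"value": "car_jacks_mazda3", "baseMode": "CAR", "kgCo2PerKm": 0.16777},
    ]
}

ble_sensed_summary = {
    "count": {"car_jacks_mazda3": 1},
    "distance": {"car_jacks_mazda3": 20184.92261045545},
    "duration": {"car_jacks_mazda3": 1772.7775580883026},
}


def footprint_kg(summary, config):
    """Sum kg CO2 over vehicles, looking each one up in the config by value."""
    vehicles_by_value = {v["value"]: v for v in config["vehicle_identities"]}
    total = 0.0
    for vehicle_key, meters in summary["distance"].items():
        vehicle = vehicles_by_value[vehicle_key]  # the cross-reference step
        total += (meters / 1000) * vehicle["kgCo2PerKm"]  # assumes meters
    return total


kg = footprint_kg(ble_sensed_summary, dynamic_config)
```

Every consumer of the summary would need access to the config to do this lookup, which is the cost being pointed out here.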

As an alternative, what if we use a different structure that allows us to have 1 unified summary (an array of modes, or "mode summary")? Then we can include vehicle information in the summary.


[
  {
    "vehicle": {
      "value": "car_jacks_mazda3",
      "bluetooth_major_minor": ["dfc0:fff0"],
      "text": "Jack's Mazda 3",
      "baseMode":"CAR",
      "met_equivalent":"IN_VEHICLE",
      "kgCo2PerKm": 0.16777,
      "vehicle_info": {
        "type": "car",
        "license": "JHK ****",
        "make": "Mazda",
        "model": "3",
        "year": 2014,
        "color": "red",
        "engine": "ICE",
        "mpg": 33
      }
    },
    "count": 1,
    "distance": 20184.92261045545,
    "duration": 1772.7775580883026
  }
]

shankari commented 6 months ago

@JGreenlee interesting. The reason we had that type of structure comes from the "count every trip" project, which adds uncertainty to the metrics. And the reason the "count every trip" project had that structure, IIRC, was so that we could take a feature (like distance) and see its distribution across modes without having to iterate over sections. So if you wanted to get the primary mode, for example, you could do something like (trip['count'].idxmax()).
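As a rough illustration of that access pattern, assuming the per-feature dicts are loaded into pandas first (a plain dict has no `idxmax`), the sketch below uses made-up values for a two-mode trip:

```python
import pandas as pd

# Hypothetical summary with two modes; the values are made up.
section_summary = {
    "count": {"CAR": 1, "WALKING": 2},
    "distance": {"CAR": 20184.9, "WALKING": 350.0},
    "duration": {"CAR": 1772.8, "WALKING": 300.0},
}

# Outer keys become columns (features); inner keys become the index (modes).
summary_df = pd.DataFrame(section_summary)

# idxmax returns the index label (the mode) of the largest value in a column.
primary_by_distance = summary_df["distance"].idxmax()
primary_by_count = summary_df["count"].idxmax()
```

Here the primary mode depends on the feature: "CAR" by distance, "WALKING" by count.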

Having said that, transforming between the structures is not that hard (I think). I would suggest:

- writing out what the code for that use case would look like (to verify that it is not too bad)
- explaining how this fits within a trip, since an object such as {'vehicle': ...} cannot be a key
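A minimal sketch of what transforming between the two structures might look like. The function names are hypothetical, and the vehicle's `value` string is used as the key because the vehicle object itself cannot be one.

```python
FEATURES = ("count", "distance", "duration")


def modes_to_feature_dicts(modes_summary):
    """Array-of-modes form -> {'count': {key: n}, 'distance': {...}, ...}."""
    by_feature = {feature: {} for feature in FEATURES}
    for mode in modes_summary:
        key = mode["vehicle"]["value"]  # a dict cannot be a key; use its id
        for feature in FEATURES:
            if feature in mode:
                by_feature[feature][key] = mode[feature]
    return by_feature


def feature_dicts_to_modes(by_feature, vehicles_by_value):
    """Inverse direction; vehicle details must come from a separate lookup."""
    # Collect keys in first-seen order across all features.
    keys = list(dict.fromkeys(k for per_mode in by_feature.values() for k in per_mode))
    modes = []
    for key in keys:
        mode = {"vehicle": vehicles_by_value.get(key, {"value": key})}
        for feature, per_mode in by_feature.items():
            if key in per_mode:
                mode[feature] = per_mode[key]
        modes.append(mode)
    return modes


modes = [
    {"vehicle": {"value": "vehicle1"}, "count": 1, "distance": 800},
    {"vehicle": {"value": "vehicle2"}, "count": 2, "distance": 1300},
]
flat = modes_to_feature_dicts(modes)
vehicles = {m["vehicle"]["value"]: m["vehicle"] for m in modes}
round_trip = feature_dicts_to_modes(flat, vehicles)
```

The array-to-dict direction drops the vehicle details, so the reverse direction needs them supplied from somewhere else, such as the dynamic config.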

JGreenlee commented 6 months ago

If the confirmed trip is a dict, it would have a property ble_modes_summary whose value is an array of objects, each object representing a mode. The object contains 'vehicle' with vehicle info, alongside 'count', 'distance', and 'duration'.

To get the primary mode, we could use the max function on ble_modes_summary with 'distance' (or 'count') as the key.

confirmed_trip = {
  "ble_modes_summary": [
    {
      "vehicle": {
        "value": "vehicle1",
        # ... (remaining vehicle fields elided)
      },
      "count": 1,
      "distance": 800,
    },
    {
      "vehicle": {
        "value": "vehicle2",
      },
      "count": 2,
      "distance": 1300,
    },
  ]
}

primary_mode = max(confirmed_trip['ble_modes_summary'], key=lambda x: x['distance'])
print('primary vehicle is ' + primary_mode['vehicle']['value'])

primary vehicle is vehicle2

shankari commented 6 months ago

ok, I think that there are only a couple more questions before we go ahead with this:

- we need to have a backwards compatibility plan, since we will need to rewrite all existing trips to the new format
  - we will need a script to do the rewrite (which we should test on both the data in emission/tests and on a couple of real dataset snapshots)
  - the script is likely to take a long time to run, at least for those of us who have been using the app for a really long time
  - in the meantime, existing code (primarily in the public dashboard) needs to handle both formats
- we may need not just the max but also the actual distribution, so that we can get the probabilities (e.g. the probability that the trip was CAR or BIKE or WALK, which feeds into the uncertainty, which in turn feeds into the error bars).

I think another challenge here is that we can get the primary mode for one trip at a time fairly easily, but for any of the dashboards, we will need to work on an aggregate basis, and the previous format might work better for that. Let's think through it:

import pandas as pd

df = pd.json_normalize(confirmed_trips)
df.columns

This will have columns like ble_section_summary.E_CAR in the old format and ble_modes_summary.vehicle.baseMode.CAR in the new one, so it may not be a huge deal with respect to grouping or other post-processing.
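One caveat worth checking before relying on this: `pd.json_normalize` flattens nested dicts into dotted column names but leaves a list value as a single object column, so the array form likely needs an extra step using `record_path`. A small sketch with made-up trips (`trip_id` is a hypothetical field added only to tie rows back to their trip):

```python
import pandas as pd

# Hypothetical trips, reusing field names from this thread.
old_trips = [
    {"ble_sensed_summary": {"distance": {"CAR": 800.0}, "count": {"CAR": 1}}},
]
new_trips = [
    {"trip_id": "t1", "ble_modes_summary": [
        {"vehicle": {"value": "vehicle1", "baseMode": "CAR"}, "count": 1, "distance": 800.0},
        {"vehicle": {"value": "vehicle2", "baseMode": "CAR"}, "count": 2, "distance": 1300.0},
    ]},
]

# Nested dicts are flattened into dotted column names.
old_columns = set(pd.json_normalize(old_trips).columns)

# The list is not flattened; it stays in one 'ble_modes_summary' column.
new_columns = set(pd.json_normalize(new_trips).columns)

# record_path yields one row per (trip, mode), with the vehicle dict flattened.
modes_df = pd.json_normalize(new_trips, record_path="ble_modes_summary", meta=["trip_id"])
distance_by_vehicle = modes_df.groupby("vehicle.value")["distance"].sum()
```

With the array form, the aggregation becomes a group-by over rows rather than a sum over columns.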

Is your proposal to only change this for the ble modes, or for the cleaned and inferred modes as well? I would prefer to have the same structure for all the *summary entries, although of course that will make the migration take longer. And it would also take more effort to generate the probability distributions above.
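For a sense of the extra effort, here is a minimal sketch of deriving a per-mode distribution from the array form. It uses a naive share of distance as a stand-in; the actual probability model from the "count every trip" work is not specified here, and `mode_distribution` is a hypothetical helper name.

```python
def mode_distribution(modes_summary, feature="distance"):
    """Return each mode's share of the total for one feature (naive stand-in)."""
    total = sum(mode[feature] for mode in modes_summary)
    if total == 0:
        return {}
    return {mode["vehicle"]["value"]: mode[feature] / total for mode in modes_summary}


modes = [
    {"vehicle": {"value": "vehicle1"}, "count": 1, "distance": 800},
    {"vehicle": {"value": "vehicle2"}, "count": 2, "distance": 1300},
]
shares = mode_distribution(modes)
```

With the per-feature dicts, the same result comes from dividing one dict's values by their sum; with the array form, each feature has to be gathered across the objects first.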

@JGreenlee do you have thoughts on what the same structure would look like for the cleaned and inferred section summaries?

JGreenlee commented 5 months ago

{
  "count": {
    "car_jacks_mazda3": 1,
    ...,
  },
  "distance": {
    "car_jacks_mazda3": 20184.92261045545,
    ...,
  },
  "duration": {
    "car_jacks_mazda3": 1772.7775580883026,
    ...,
  },
  "vehicles": {
    "car_jacks_mazda3": {
      "value": "car_jacks_mazda3",
      "bluetooth_major_minor": ["dfc0:fff0"],
      "text": "Jack's Mazda 3",
      "baseMode":"CAR",
      "met_equivalent":"IN_VEHICLE",
      "kgCo2PerKm": 0.16777,
      "vehicle_info": {
        "type": "car",
        "license": "JHK ****",
        "make": "Mazda",
        "model": "3",
        "year": 2014,
        "color": "red",
        "engine": "ICE",
        "mpg": 33
      }
    },
    ...,
  }
}
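A short sketch of how this combined structure might be consumed, assuming the per-feature dicts and the `vehicles` map always share the same keys. The second vehicle (`car_example_ev`) is made up so that the example has more than one entry.

```python
from collections import defaultdict

summary = {
    "count": {"car_jacks_mazda3": 1, "car_example_ev": 2},
    "distance": {"car_jacks_mazda3": 20184.92261045545, "car_example_ev": 3000.0},
    "duration": {"car_jacks_mazda3": 1772.7775580883026, "car_example_ev": 900.0},
    "vehicles": {
        "car_jacks_mazda3": {"value": "car_jacks_mazda3", "text": "Jack's Mazda 3", "baseMode": "CAR"},
        # Hypothetical second vehicle, for illustration only.
        "car_example_ev": {"value": "car_example_ev", "text": "Example EV", "baseMode": "E_CAR"},
    },
}

# Primary mode: same key-based lookup as with the original per-feature dicts.
primary_key = max(summary["distance"], key=summary["distance"].get)
primary_vehicle = summary["vehicles"][primary_key]  # no config lookup needed

# Roll vehicle-level distances back up to base modes.
distance_by_base_mode = defaultdict(float)
for key, meters in summary["distance"].items():
    distance_by_base_mode[summary["vehicles"][key]["baseMode"]] += meters
```

This keeps the per-feature access pattern while carrying the vehicle details alongside it, so the base-mode view can be recomputed when needed.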

e-mission / e-mission-docs

BLE Summary with specific vehicles #1073