CarlosBergillos / LocationHistoryFormat

Collaborative format definition and documentation for Google Location History files.
https://locationhistoryformat.com
MIT License

ActivityType Hierarchy #2

Closed jonathanschad closed 1 year ago

jonathanschad commented 1 year ago

Hey, I'm currently working on my master's thesis, in which I process the Google Location History. I stumbled across this project (sadly a little bit late) and I love it. Your work helped me quite a bit to understand the data.

The ActivityType section says that some types are sub-activities of IN_VEHICLE. From my findings I can confirm this. However, I would add that IN_BUS and IN_CAR are also sub-activities of IN_FOUR_WHEELER_VEHICLE. I analyzed my data, and for every activity where IN_FOUR_WHEELER_VEHICLE was included, IN_BUS and IN_CAR added up to the exact confidence of IN_FOUR_WHEELER_VEHICLE.

I also found that IN_FOUR_WHEELER_VEHICLE and IN_TWO_WHEELER_VEHICLE add up to IN_ROAD_VEHICLE (plus or minus 1, which is probably a rounding error).

Also, in a lot of cases (though there are also a lot of cases where this does not hold), IN_FOUR_WHEELER_VEHICLE, IN_TWO_WHEELER_VEHICLE and IN_RAIL_VEHICLE add up to IN_VEHICLE. In the cases where these values don't add up, their sum is always bigger than IN_VEHICLE.

The hierarchy probably looks like this:

                    IN_VEHICLE
                       │   │
                       │   │
                       │   │
                       │   │
IN_RAIL_VEHICLE ◄──────┘   └───────► IN_ROAD_VEHICLE
                                          │   │
                                          │   │
                                          │   │
                                          │   │
                IN_FOUR_WHEELER_VEHICLE ◄─┘   └─► IN_TWO_WHEELER_VEHICLE
                         │   │
                         │   │
                         │   │
                         │   │
                IN_CAR ◄─┘   └──► IN_BUS
[
  {
    "type": "IN_VEHICLE",
    "confidence": 97
  },
  {
    "type": "IN_ROAD_VEHICLE",
    "confidence": 97
  },
  {
    "type": "IN_FOUR_WHEELER_VEHICLE",
    "confidence": 59
  },
  {
    "type": "IN_CAR",
    "confidence": 42
  },
  {
    "type": "IN_TWO_WHEELER_VEHICLE",
    "confidence": 37
  },
  {
    "type": "IN_BUS",
    "confidence": 17
  },
  {
    "type": "STILL",
    "confidence": 1
  },
  {
    "type": "UNKNOWN",
    "confidence": 1
  },
  {
    "type": "IN_RAIL_VEHICLE",
    "confidence": 1
  }
]

So there is definitely more to the story than just the subtypes of IN_VEHICLE. I hope this helps; please let me know if I should do some further testing.
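
The sums described above can be checked mechanically. The following is a minimal Python sketch, not part of the original analysis: `HIERARCHY` and `sum_gaps` are hypothetical names, the mapping is transcribed from the diagram above, and the input is the example list from this comment.

```python
# Hypothetical parent -> children mapping, transcribed from the diagram in this issue.
HIERARCHY = {
    "IN_VEHICLE": ["IN_RAIL_VEHICLE", "IN_ROAD_VEHICLE"],
    "IN_ROAD_VEHICLE": ["IN_FOUR_WHEELER_VEHICLE", "IN_TWO_WHEELER_VEHICLE"],
    "IN_FOUR_WHEELER_VEHICLE": ["IN_CAR", "IN_BUS"],
}

# The example activity list from this comment.
activities = [
    {"type": "IN_VEHICLE", "confidence": 97},
    {"type": "IN_ROAD_VEHICLE", "confidence": 97},
    {"type": "IN_FOUR_WHEELER_VEHICLE", "confidence": 59},
    {"type": "IN_CAR", "confidence": 42},
    {"type": "IN_TWO_WHEELER_VEHICLE", "confidence": 37},
    {"type": "IN_BUS", "confidence": 17},
    {"type": "STILL", "confidence": 1},
    {"type": "UNKNOWN", "confidence": 1},
    {"type": "IN_RAIL_VEHICLE", "confidence": 1},
]


def sum_gaps(activities, hierarchy=HIERARCHY):
    """Return {parent: sum(children confidences) - parent confidence}.

    Parents missing from the list are skipped; missing children count as 0
    (an assumption made for this sketch).
    """
    conf = {a["type"]: a["confidence"] for a in activities}
    gaps = {}
    for parent, children in hierarchy.items():
        if parent not in conf:
            continue
        child_sum = sum(conf.get(child, 0) for child in children)
        gaps[parent] = child_sum - conf[parent]
    return gaps


gaps = sum_gaps(activities)
print(gaps)
```

For the example list, the gap is 0 for IN_FOUR_WHEELER_VEHICLE, -1 for IN_ROAD_VEHICLE and +1 for IN_VEHICLE, so every parent matches the sum of its children to within 1.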

CarlosBergillos commented 1 year ago

Hi Jonathan!

I'm happy you are finding this project useful :)

And hey, this is very interesting, thanks for the analysis and thanks for sharing! I hadn't considered that a more complex hierarchy could exist.

I checked with my data and could replicate your observations. I was curious whether there was more, so I also built a quick script to automatically find these kinds of activity relationships in my data (using very simple logic: basically, find all pairs of activity types A and B where A.confidence is always >= B.confidence). The output was exactly the same hierarchy as the one you suggested :D (plus the already known ON_FOOT -> (RUNNING | WALKING)).
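
The pairwise check described above could look roughly like the sketch below. This is not the actual script: `dominance_pairs` is a hypothetical name, the toy records use made-up confidence values, and it assumes that only records containing both types are compared and that a pair must co-occur at least once.

```python
from itertools import permutations


def dominance_pairs(records):
    """Return the set of (A, B) such that A.confidence >= B.confidence
    in every record where both types appear (and they co-occur at least once).

    `records` is a list of activity lists, each a list of
    {"type": ..., "confidence": ...} dicts.
    """
    # Pre-index each record as {type: confidence} for fast lookups.
    indexed = [{a["type"]: a["confidence"] for a in rec} for rec in records]
    types = sorted({t for conf in indexed for t in conf})

    pairs = set()
    for a, b in permutations(types, 2):
        together = [conf for conf in indexed if a in conf and b in conf]
        if together and all(conf[a] >= conf[b] for conf in together):
            pairs.add((a, b))
    return pairs


# Toy records with made-up values, for illustration only.
records = [
    [
        {"type": "ON_FOOT", "confidence": 80},
        {"type": "WALKING", "confidence": 70},
        {"type": "RUNNING", "confidence": 10},
        {"type": "STILL", "confidence": 5},
    ],
    [
        {"type": "ON_FOOT", "confidence": 30},
        {"type": "WALKING", "confidence": 5},
        {"type": "RUNNING", "confidence": 25},
        {"type": "STILL", "confidence": 70},
    ],
]

pairs = dominance_pairs(records)
print(sorted(pairs))
```

On real data this returns the transitive closure of the hierarchy (for instance, IN_VEHICLE would also dominate IN_CAR), and with few records it can report coincidental pairs, so the direct parent-child edges still have to be picked out by hand or with an extra reduction step.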

I think we can assume that this hierarchy is correct (and as far as we know complete), so I'll update the documentation with these new findings in the coming days.

Thanks again, and do share if you find anything else!

CarlosBergillos commented 1 year ago

Hi again! I've now added this hierarchy information in the ActivityType section.

This is the best way I've found to put it. Unfortunately, everything has to go in a single-line string in the JSON schema (here), so it is a bit difficult to work with. I'm open to suggestions on how to handle and represent this information better.

For now I'll close the issue :)