ESPresense / ESPresense-companion

HA Add-on / Docker container that solves indoor positions from MQTT data received from multiple ESPresense stations
Apache License 2.0

Feature Request: Use machine learning models to estimate device position #465

Closed ndbroadbent closed 4 months ago

ndbroadbent commented 9 months ago

I've noticed that the current algorithm has a few problems. Devices will bounce around the room randomly and sometimes appear in other rooms for a few seconds. The floor detection is also quite unreliable for me.

I think a machine learning model could probably do a much better job at estimating device positions and which floor they are on. It would be great if I could go around my house and take multiple measurements at various points. Some where I am holding my phone, some where it's in my pocket, etc. Then use this data to train a model to estimate my position in the house. I think I'm collecting more than enough data to do this accurately and reliably (16 ESPresense nodes for 4 devices), but I just need to train a model to filter out some of the noise.
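(For reference, the simplest version of this idea is classic RSSI fingerprinting: walk around taking calibration measurements, then answer queries with k-nearest neighbours in RSSI space. A minimal sketch, with made-up data and nothing ESPresense-specific:)

```python
# Hypothetical sketch of RSSI fingerprinting with k-nearest neighbours:
# calibration points map an RSSI vector (one entry per node) to a known
# x, y, floor position; a query is answered by averaging the k closest
# calibration points in RSSI space. All data below is made up.
import numpy as np

def knn_position(fingerprints, positions, query, k=3):
    """Mean position of the k calibration points whose RSSI vectors
    are closest (Euclidean) to the query vector."""
    dists = np.linalg.norm(fingerprints - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return positions[nearest].mean(axis=0)

# Toy calibration set: 4 nodes, 3 reference points (x, y, floor).
fingerprints = np.array([
    [-40.0, -70.0, -80.0, -75.0],   # taken near node 0
    [-75.0, -45.0, -78.0, -72.0],   # taken near node 1
    [-80.0, -72.0, -42.0, -70.0],   # taken near node 2
])
positions = np.array([
    [1.0, 1.0, 0.0],
    [5.0, 1.0, 0.0],
    [5.0, 4.0, 1.0],
])
estimate = knn_position(fingerprints, positions,
                        np.array([-42.0, -68.0, -79.0, -74.0]), k=1)
print(estimate)   # closest reference point wins with k=1
```

With enough reference points per room, the same neighbour lookup also gives room/floor labels for free (majority vote over the neighbours).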

Has anyone worked on something like this before? Is there a plugin or fork that I could try out? (I remember reading a blog post a while ago from someone who did something like this, but I don't think it was related to ESPresense.)

DTTerastar commented 9 months ago

Yes, this is absolutely a thing to try. I'm often reading papers and making new ILocators to try it out... The bigger issue is you need a training set. And it would need to be per floorplan. How do you think we could do that?

gunnarbeutner commented 9 months ago

I've actually built something that does that... as a byproduct while debugging my room classification model. It's an iOS app (https://github.com/gunnarbeutner/OpenUWBDemo - currently requires an Estimote UWB beacon, but I guess it could be made to not require that, at some loss of positioning accuracy) that collects X, Y, Z -> RSSI pairs, using ARKit/Nearby Interaction to track the device's position and an MQTT client to fetch the RSSI data.

The problems an app like that would have to solve (which mine sort of does... just not in a user-friendly way):

- Aligning ARKit's coordinate system with the floor plan.
- Collecting/outputting the data. Right now it just spams location/RSSI pairs to the debug console, which I then manually turn into a CSV file.

Here's an example map for one of my nodes:

[screenshot: signal map for one node]

The dataset is here, if you want to play around with it yourself (I found CloudCompare quite useful to visualize the point cloud):

nodes.csv points.csv

As expected there's a sudden drop in signal quality between walls and other metal/concrete obstacles.

Personally I'd love to see a machine learning model or some kind of radio map - though calibrating that definitely requires more effort than just placing a few nodes here and there.

However, as a somewhat easier-to-attain step towards that, maybe walls and other objects could have a tunable absorption factor or fixed path loss ("+5 dB for passing through this wall")? That would require ray-casting for the error function though, which I'm sure is great for performance. Then again, I guess you could pre-compute the per-node error map.
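The fixed-per-wall idea boils down to a 2D segment intersection test. A sketch (geometry, wall layout, and loss values all illustrative, not ESPresense code):

```python
# Hypothetical sketch of the "+5 dB per wall" idea: cast a 2D ray from a
# node to a candidate point, count wall-segment crossings, and add a fixed
# attenuation per crossing. Geometry and values are illustrative.

def ccw(a, b, c):
    """True if the points a, b, c are in counter-clockwise order."""
    return (c[1] - a[1]) * (b[0] - a[0]) > (b[1] - a[1]) * (c[0] - a[0])

def segments_intersect(p1, p2, p3, p4):
    """True if segment p1-p2 properly crosses segment p3-p4."""
    return (ccw(p1, p3, p4) != ccw(p2, p3, p4)
            and ccw(p1, p2, p3) != ccw(p1, p2, p4))

def extra_path_loss(node, point, walls, loss_per_wall_db=5.0):
    """Total fixed attenuation along the node->point ray."""
    crossings = sum(segments_intersect(node, point, w0, w1)
                    for w0, w1 in walls)
    return crossings * loss_per_wall_db

walls = [((3.0, 0.0), (3.0, 5.0))]                       # one wall at x = 3
print(extra_path_loss((1.0, 2.0), (5.0, 2.0), walls))    # crosses once -> 5.0
print(extra_path_loss((1.0, 2.0), (2.0, 2.0), walls))    # no crossing -> 0.0
```

As noted above, the per-node loss could be pre-computed on a grid so the error function never has to ray-cast at solve time.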

DTTerastar commented 9 months ago

Dude, you rock. Thank you for being part of the project :)

Maybe we could make an ESPresense base station that includes a UWB radio in each node; then we could use an iOS app like what you've built to train the system.

gunnarbeutner commented 9 months ago

Thanks, I appreciate it!

UWB nodes would be kind of neat. A somewhat more accessible solution (for now anyway) would be using a second iPhone or an Apple Watch as a stationary UWB anchor. I'll probably add that to the app in a bit.

It's also now tracking the camera normal for each reference point. I figured that could be useful to exclude points where the camera is facing away from the node (i.e. when the user is between the phone and the node).

Oh, and there's a column for the Bluetooth channel.

points.csv

As for the model itself, well I don't know yet. I'd prefer something that's specific to the environment rather than to each node or worse... a collection of nodes.

gunnarbeutner commented 9 months ago

So, just a quick update on some of the things I've tried - and oh boy, there are a lot, most of which just didn't pan out:

  1. I'm pre-processing the samples by creating separate point clouds for each node and then combining those point clouds based on a simple nearest neighbor search. Each point in the new cloud has scalar features for each of the nodes' RSSI measurements:

https://gist.github.com/gunnarbeutner/5b57a5e7f1cf31388669b689fbb24019

It would make a lot more sense to combine the points based on location and timestamp; that's on my TODO list.
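Roughly, the nearest-neighbour merge looks like this (an illustrative sketch, not the actual gist code), assuming each node's cloud is an array with columns x, y, z, rssi:

```python
# Rough sketch of the nearest-neighbour merge (illustrative, not the gist
# code): for each point in a reference node's cloud, take the RSSI of the
# closest point in every other node's cloud as an extra scalar feature.
import numpy as np

def merge_clouds(clouds):
    """clouds: list of (N_i, 4) arrays with columns x, y, z, rssi.
    Returns an (N_0, 3 + len(clouds)) array: xyz plus one RSSI per node."""
    ref_xyz = clouds[0][:, :3]
    features = [clouds[0][:, 3]]
    for cloud in clouds[1:]:
        # Brute-force nearest neighbour; fine for calibration-sized clouds.
        d = np.linalg.norm(ref_xyz[:, None, :] - cloud[None, :, :3], axis=2)
        features.append(cloud[d.argmin(axis=1), 3])
    return np.column_stack([ref_xyz] + features)

a = np.array([[0.0, 0.0, 0.0, -50.0], [1.0, 0.0, 0.0, -60.0]])
b = np.array([[0.1, 0.0, 0.0, -70.0], [0.9, 0.0, 0.0, -65.0]])
merged = merge_clouds([a, b])
print(merged)   # rows: x, y, z, rssi_node_a, rssi_node_b
```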

  2. As a visual aid I'm applying bilateral filtering to the combined point cloud (basically: RSSI -> distance, apply filter, distance -> RSSI). The resulting cloud shows the signal propagation characteristics for our apartment quite nicely and intuitively seems to make sense (i.e. walls and air ducts attenuate the signal where you'd expect them to).

https://github.com/ESPresense/ESPresense-companion/assets/388571/c9c5cfd0-e1bb-430c-a3ac-3bb9d5170102

  3. I've tried to train a single model which predicts x, y, z for each measurement based on the RSSI measurements:

- RSSI values -> x, y, z
- RSSI values converted to distances -> x, y, z
- Differences in RSSI values (i.e. a - b for all combinations of nodes) -> x, y, z
- Ratios of distances (i.e. a / b for all combinations of nodes) -> x, y, z

The idea behind using differences/ratios was to make the model somewhat device-independent, i.e. due to differences in 1m RSSI. Ideally this would also filter out constant attenuation (e.g. when my phone is in my backpack vs. when it isn't).
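The offset-cancelling property of the difference features can be shown in a couple of lines (an illustrative helper, not part of the model code):

```python
# Tiny illustration (not model code) of why pairwise differences cancel a
# constant per-device offset: (a + c) - (b + c) = a - b, so a phone whose
# RSSI runs 5 dB hot everywhere produces the same difference features.
from itertools import combinations

def pairwise_diffs(rssi):
    """RSSI differences for every pair of nodes."""
    return [rssi[i] - rssi[j] for i, j in combinations(range(len(rssi)), 2)]

print(pairwise_diffs([-50, -60, -70]))   # [10, 20, 10]
print(pairwise_diffs([-55, -65, -75]))   # identical: the -5 offset cancels
```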

I've also tried adding the room classification as an additional output in an attempt to get the model to put more emphasis on room boundaries (2nd attempt: custom loss function that penalizes incorrect room predictions). Predicting x=1.0 vs x=1.1 is a more significant error compared to x=2.0 vs. x=2.1 when there's a wall at x=1.05.

That hasn't really worked well so far. The model has quite a bit of trouble generalizing to all the rooms. Where I'm currently at is two sets of models:

- One classifier that predicts which room a measurement was taken in.
- One regression model per room that tries to predict the location within that room.

So, does it work? Well, it's definitely better at differentiating rooms, especially at some specific "trouble spots", i.e. where placement of nodes is limited due to the apartment layout. You wouldn't really need precise reference point measurements to train a room classification model. Also, the per-room location predictions seem to be better. The predicted locations are more uniformly distributed, whereas the optimization-based algorithm tends to favor locations closer to the nodes or closer to the rooms' center.
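Schematically, the two-stage setup looks like this (a toy stand-in using a nearest-centroid classifier and a 1-NN regressor instead of the real trained models; all data illustrative):

```python
# Toy stand-in for the two-stage setup described above: a nearest-centroid
# room classifier followed by a per-room 1-NN position regressor. The real
# setup uses trained models; everything here is illustrative.
import numpy as np

class TwoStageLocator:
    def fit(self, X, rooms, xyz):
        """X: (N, nodes) RSSI features; rooms: (N,) labels; xyz: (N, 3)."""
        self.labels = sorted(set(rooms))
        self.centroids = {r: X[rooms == r].mean(axis=0) for r in self.labels}
        self.per_room = {r: (X[rooms == r], xyz[rooms == r]) for r in self.labels}
        return self

    def predict(self, x):
        # Stage 1: pick the room whose centroid is closest in RSSI space.
        room = min(self.labels,
                   key=lambda r: np.linalg.norm(x - self.centroids[r]))
        # Stage 2: 1-NN regression within that room only.
        Xr, Pr = self.per_room[room]
        return room, Pr[np.linalg.norm(Xr - x, axis=1).argmin()]

X = np.array([[-45.0, -80.0], [-48.0, -78.0], [-82.0, -44.0], [-79.0, -46.0]])
rooms = np.array(["kitchen", "kitchen", "office", "office"])
xyz = np.array([[1.0, 1.0, 0.0], [1.5, 1.0, 0.0], [6.0, 3.0, 0.0], [6.5, 3.0, 0.0]])
room, pos = TwoStageLocator().fit(X, rooms, xyz).predict(np.array([-46.0, -79.0]))
print(room, pos)
```

Constraining stage 2 to one room's data is what keeps the regressor from smearing predictions across room boundaries.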

I'll have to do a lot more tests to see if this is better overall though. For now I haven't really run the model continuously for more than a few minutes at a time.

This could probably be expanded by having "areas of interest" rather than just rooms: Most of the time I don't really care about the X, Y, Z position, just knowing whether a device is in a specific zone would be enough. I'm currently using Home Assistant to do sensor fusion with ESPresense + motion/occupancy/door sensors to track who's occupying which rooms:

a) Room A is occupied and according to ESPresense my phone/watch is there: if HA doesn't already know my current room, it initializes the state by picking room A.

b) If an adjacent room becomes occupied (motion/occupancy sensor, door is opened, etc.; there's both a door contact and a motion sensor for each room "barrier", so I know when someone's entering/leaving a room), HA assumes I moved to that room. Each room also has a list of occupants. HA removes me from room A's list of occupants after a timeout.

That's the easy part and ESPresense is only really needed to figure out where I'm at the start. Tracking room changes is instantaneous.
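The hand-off rule in steps a/b could be sketched like this (a heavily simplified illustration, nothing like the actual HA config; the removal-after-timeout is collapsed into an immediate removal):

```python
# Heavily simplified sketch of the hand-off rule in steps a/b (nothing
# like the actual HA config; the removal-after-timeout is collapsed into
# an immediate removal, and every adjacent occupant is assumed to move).
class OccupancyTracker:
    def __init__(self, adjacency):
        self.adjacency = adjacency                  # room -> adjacent rooms
        self.occupants = {r: set() for r in adjacency}

    def seed(self, person, room):
        """Step a: initialize from an ESPresense room prediction."""
        self.occupants[room].add(person)

    def room_became_occupied(self, room):
        """Step b: assume occupants of adjacent rooms moved here."""
        for other in self.adjacency[room]:
            self.occupants[room] |= self.occupants[other]
            self.occupants[other] = set()           # real rule: after a timeout

tracker = OccupancyTracker({"hall": {"kitchen"}, "kitchen": {"hall"}})
tracker.seed("me", "hall")
tracker.room_became_occupied("kitchen")
print(tracker.occupants)   # "me" has been handed off to the kitchen
```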

It gets more complicated when I enter a room that was already otherwise occupied: Did I just enter the room or did the other person leave the room (i.e. enter the room I was in)?

Again, this is easier to differentiate when none of my (or the other person's) devices are moving. That would make a lot of sense, but isn't actually something I'm doing atm, because my HA config for tracking occupants is already a convoluted mess: 800+ lines, not including general room occupancy and motion detection stuff. Once this room classification/location regression model works reasonably well I'll probably get rid of the HA config and move the functionality into a separate daemon.

Otherwise HA will wait for ESPresense's room predictions to settle down (i.e. we're back to step a). HA also somewhat filters out incorrect predictions because it knows which rooms are occupied, thanks to mmWave occupancy sensors and some simple heuristics: a room must still be occupied when it was previously occupied and the door hasn't been opened, or, if the door is open, the door's motion sensor hasn't been triggered.

Devices can only move between rooms when both rooms are occupied and the barrier is traversable. There is a fallback with a much larger timeout to allow such moves so devices don't incorrectly get stuck in the wrong rooms.

Which... all kind of raises the question: do I even really want X, Y, Z coords at all (at least for my use case)? Hmm.

gunnarbeutner commented 9 months ago

Two more ideas:

The model allows for directional antennas; in fact, supporting them would probably make predictions better in general.

Reinforcement learning might be an idea for either predicting the location or learning to predict when a device is about to transition between areas of interest.

DTTerastar commented 8 months ago

My idea (that I haven't tried) is to just use a model for the rssi->distance calculation. We could use the data that currently feeds the optimization as training data for the model, and then keep the current method for x, y, z.
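For what it's worth, the parametric baseline such a model would be competing with is the classic log-distance path loss fit; a sketch with made-up training pairs (real ones could come from the data the optimization already produces):

```python
# Illustrative baseline for an rssi->distance model: least-squares fit of
# the log-distance path loss model rssi = rssi_1m - 10*n*log10(d). The
# (distance, rssi) training pairs below are made up.
import math

def fit_path_loss(samples):
    """Fit rssi = a + b*log10(d); returns (rssi at 1 m, path loss exponent n)."""
    xs = [math.log10(d) for d, _ in samples]
    ys = [r for _, r in samples]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, -b / 10.0

def rssi_to_distance(rssi, rssi_1m, n):
    """Invert the fitted model to get a distance estimate in metres."""
    return 10 ** ((rssi_1m - rssi) / (10.0 * n))

samples = [(1.0, -50.0), (2.0, -59.0), (4.0, -68.0), (8.0, -77.0)]
rssi_1m, n = fit_path_loss(samples)
print(round(rssi_1m, 1), round(n, 2))   # roughly -50.0 and 2.99 for this data
```

A learned model (e.g. one regressor per node) would replace the linear fit but slot into the same rssi->distance interface, so the existing x, y, z solver wouldn't need to change.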

stale[bot] commented 5 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.