
Robot Detection #44

Closed: knoellle closed 10 months ago

knoellle commented 1 year ago

We need a reliable way of detecting other robots via camera. There already exists a classical robot detection that is based on finding clusters in image segments.
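
To make "finding clusters in image segments" concrete, here is a toy sketch of the idea. The `Segment` type, thresholds, and helper names are invented for illustration; the actual implementation in the vision code differs.

```python
# Toy illustration of segment clustering; not the actual HULKs implementation.
from dataclasses import dataclass

@dataclass
class Segment:
    x: int         # column of the vertical scanline
    y_top: int     # start row of the segment
    y_bottom: int  # end row of the segment

def cluster_segments(segments, max_gap_x=12):
    """Greedily group segments whose scanlines are horizontally close."""
    clusters = []
    for segment in sorted(segments, key=lambda s: s.x):
        if clusters and segment.x - clusters[-1][-1].x <= max_gap_x:
            clusters[-1].append(segment)
        else:
            clusters.append([segment])
    return clusters

def robot_candidates(segments, min_segments=3):
    """A cluster with enough segments vaguely looks like a robot."""
    candidates = []
    for cluster in cluster_segments(segments):
        if len(cluster) >= min_segments:
            candidates.append((
                min(s.x for s in cluster),
                min(s.y_top for s in cluster),
                max(s.x for s in cluster),
                max(s.y_bottom for s in cluster),
            ))
    return candidates

segments = [Segment(10, 5, 40), Segment(14, 8, 44), Segment(20, 6, 42)]
print(robot_candidates(segments))  # [(10, 5, 20, 44)]
```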

This approach works to detect vaguely robot-looking objects, but it isn't reliable enough to base behavior decisions on. It also doesn't provide other useful information, such as the jersey color (or at least whether it roughly matches our own team color), whether the robot is fallen or standing, or the direction it is facing.

While extracting the jersey color and improving reliability are perhaps achievable by extending/tuning the current approach, we should also look into machine-learning-based approaches. A combined detection network for balls, robots, lines, etc. is also possible.

IIRC @tuxbotix already looked at some options regarding this. Datasets of raw images recorded during previous events, some with labels, and pre-built training datasets for ball detection can be found at tools/machine-learning/data. The data can be retrieved from the bighulk in the lab via DVC. During the team trip, @knoellle also has a copy of the DVC data on his laptop.
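
As a side note, here is a minimal sketch of reading one of those files through DVC's Python API. The file path below is a placeholder, and in practice the data lives on the bighulk remote, which is only reachable from the lab network.

```python
# Sketch only: the path is a placeholder, not an actual file in the repo.
import dvc.api

with dvc.api.open(
    "tools/machine-learning/data/some_labels.json",  # hypothetical path
    repo="https://github.com/HULKs/hulk",
) as file:
    labels = file.read()
```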

schluis commented 1 year ago

Telegram group

https://t.me/+eXH48n2Oo34zNTAy

Helpful tools

View NN structure: Netron
Draw network structures: diagrams.net (draw.io)

Overview

https://writemd.rz.tuhh.de/Hka2oYKATWym374c8fxcpQ?both

tuxbotix commented 1 year ago

@schluis could we have the architecture diagram or the presentation here? It would probably make it easier to keep an eye on things :)

A few ideas/themes I gathered after talking to other HULKs:

  1. We may not want to modify the existing ball network - any damage to detection quality is unacceptable.
  2. Alternatively, we can have a parallel network (like the ball net) for robot parts etc., without touching the ball net.
  3. It might be easier to have a slow cycler run a heavier network for now to detect other objects -> balls move fast, the rest not so fast!
  4. It might benefit us at a later stage to consider reusing the backbone of ballnet for other parts like the positioner (see the sketch after this list). This isn't urgent, but it would probably save some runtime, as the backbone seems to be the heaviest part of the network in terms of runtime.
  5. Investigate whether we can do some autoencoder-style training for developing new backbones -> if we play this right, we can get away with far fewer labels, or even go self-supervised if it works.
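
To make idea 4 a bit more concrete, here is a rough Keras sketch of a shared backbone with separate ball and robot heads. Layer sizes, head outputs, and all names are invented for illustration and are not the actual ballnet architecture.

```python
# Illustrative sketch only: not the real ballnet. Shapes and heads are made up.
import tensorflow as tf
from tensorflow.keras import layers

def backbone(inputs):
    """Shared feature extractor: the expensive part we would like to reuse."""
    x = layers.Conv2D(8, 3, strides=2, activation="relu")(inputs)
    x = layers.Conv2D(16, 3, strides=2, activation="relu")(x)
    return layers.Conv2D(32, 3, strides=2, activation="relu")(x)

inputs = tf.keras.Input(shape=(60, 80, 1))
features = backbone(inputs)
pooled = layers.GlobalAveragePooling2D()(features)

# Head 1: ball detection (e.g. presence + position); kept untouched.
ball = layers.Dense(3, name="ball")(pooled)
# Head 2: robot detection, trained on top of the (possibly frozen) backbone.
robot = layers.Dense(5, name="robot")(pooled)

model = tf.keras.Model(inputs, [ball, robot])
model.summary()
```

Freezing the backbone while training only the robot head would also respect idea 1, since the ball head and its inputs stay untouched.
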
schluis commented 1 year ago

@tuxbotix I added the image in the linked PR. Regarding 1, I completely agree, but I would like to try to achieve this in one network. If it is measurably worse, I agree that we need to split the networks.

tuxbotix commented 1 year ago

@schluis thanks, I didn't notice it.

I would also like to get it done in one network; the idea of isolation came up after talking with @PasGl, who reminded me of the speed difference between the objects.

Since we have a decent amount of data and a good starting point, I think we can test this out once we sort out the data situation. If someone else can look into the HULKs datasets, that would be great, as I won't be able to visit the lab soon to copy one.

jonathan-hellwig commented 1 year ago

@schmidma and I tried a different approach yesterday. B-Human uses a neural network that outputs bounding boxes for all robots given an 80 x 60 grayscale image. We ran the network on the NAO yesterday, and it seems to work pretty well. We tested it with two robots on the field, and it was able to detect robots in almost all poses. We also did some rough benchmarking today: the complete robot detection takes roughly 5 ms to 10 ms to execute. You can find the preliminary branch at @schmidma's remote.
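
For intuition, the general shape of such a detector (a tiny fully convolutional net on an 80 x 60 grayscale input that predicts a confidence plus one box per grid cell) might look roughly like the Keras sketch below. This is not B-Human's actual architecture; every layer choice here is an assumption.

```python
# Rough sketch only: not the real B-Human network.
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(60, 80, 1))  # 80 x 60 grayscale image
x = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(inputs)
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(x)
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(x)
# 8 x 10 grid; 5 channels per cell: confidence + (x, y, w, h) of one box
outputs = layers.Conv2D(5, 1)(x)

model = tf.keras.Model(inputs, outputs)
model.summary()
```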

jonathan-hellwig commented 1 year ago

I think we need to make some adjustments to our current architecture if we want to integrate the B-Human network into our code base. I suggest we discuss this in a meeting in the coming days.

tuxbotix commented 1 year ago

Adjustments due to the runtime, or something else? I think in that case a new cycler, as I mentioned above, could work.

And if a decision is made to use B-Human's network and not to go forward with further development on this topic, please let me know so I can look into another topic.

jonathan-hellwig commented 1 year ago

I suggest we integrate the B-Human net into our code base such that other teams can proceed with their work. However, I would like to continue to work on robot detection.

I have checked out the data we have available on the BigHulk. We have two categories of data: classification and object detection. The classification data consists of cropped robot and ball images. The object detection data consists of unlabeled images from past games and images from the GermanOpen 2018 with bounding boxes for balls and robots. I have also checked out some of the datasets available online:

jonathan-hellwig commented 1 year ago

@tuxbotix I was referring to the architecture @schluis posted above. I think the runtime of the B-Human network is fine.

schluis commented 1 year ago

@jonathan-hellwig @tuxbotix @alexschirr do you have time on Saturday?

tuxbotix commented 1 year ago

@jonathan-hellwig ah, now I get it. I also think that network in general is fine. I was using those two datasets; right now I'm converting the first one into TFRecords to run a few experiments. The NaoDevils one can also be converted to bounding boxes with a bit of work, as far as I can see.

@schluis I can be there around 11:00 or some other time.
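
Since the TFRecord conversion came up, here is a minimal sketch of writing bounding-box labels into a TFRecord file. The feature keys and the placeholder `examples` list are assumptions, not the actual format of our datasets.

```python
# Hedged sketch: feature keys and input format are assumptions.
import tensorflow as tf

def to_example(image_bytes: bytes, boxes: list) -> tf.train.Example:
    """boxes: list of (x_min, y_min, x_max, y_max), normalized to [0, 1]."""
    flat = [coordinate for box in boxes for coordinate in box]
    return tf.train.Example(features=tf.train.Features(feature={
        "image/encoded": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[image_bytes])),
        "image/object/bbox": tf.train.Feature(
            float_list=tf.train.FloatList(value=flat)),
    }))

# Placeholder data; in practice this would be read from the labeled dataset.
examples = [(b"<encoded image bytes>", [(0.1, 0.2, 0.3, 0.9)])]

with tf.io.TFRecordWriter("robots.tfrecord") as writer:
    for image_bytes, boxes in examples:
        writer.write(to_example(image_bytes, boxes).SerializeToString())
```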