djdd87 / SynoAI

A Synology Surveillance Station notification system utilising DeepStack AI
GNU General Public License v3.0
208 stars · 24 forks

Feature-Request: "Perfect Shot" #35

Open AlexanderSch90 opened 3 years ago

AlexanderSch90 commented 3 years ago

The program is running perfectly. Thanks for your work. If you are still open to a feature request: it would be cool if, for example, you requested 3 images 2 seconds apart during movement / trigger, had them analyzed by DeepStack, and then only forwarded the image with the highest confidence (above the threshold). That way you get the image with the best result.

djdd87 commented 3 years ago

Yeah, I'll try and add to it when I can. Unfortunately time is very limited at the moment. Good idea though. I'd be curious to see what strain this puts on the SSS API and DeepStack though.

JohannCR commented 3 years ago

That would probably improve result quality quite a lot, since most detection failures are due to an empty snapshot taken at the wrong time. What do you think?

djdd87 commented 3 years ago

@JohannCR the snapshot should be taken at the moment that Surveillance Station sends the motion alert to SynoAI, so this feature request won't massively help with that, I don't think. Imagine a person running left to right. If they're halfway across the camera view when SSS decides "yes, that's movement", then by the time SynoAI takes the snapshot they're probably more like 75% across the screen, and by the time the next 2 snapshots are fetched, the person is probably long gone.

JohannCR commented 3 years ago

@djdd87 ok, that makes sense. I guess I'll have to look elsewhere for why I get lots of empty snapshots... There's a delay somewhere; I'll have to time it.

JohannCR commented 3 years ago

@djdd87 Damn, I was so wrong... Turns out SSS is too fast to trigger SynoAI 🤣 Is there a way to slow it down by a second or two?

AlexanderSch90 commented 3 years ago

Too fast is good for this feature request ;-)

euquiq commented 3 years ago

Problem I see with "perfect shot" is that it will take time, which may be crucial for the "real-time" factor.

Suppose:

I set the motion trigger spacing to 30 seconds, so SSS will send me a motion trigger as soon as it detects motion, and then wait 30 seconds before sending another (if the same or a new motion occurs).

During those 30 seconds, SynoAI takes on the job of requesting a new snapshot each second in order to detect an object of interest.

So when it finds objects, it takes note of the overall percentage (or some other mechanism to determine the "best" snapshot) and saves it to disk.

If a new snapshot shows a better number, it deletes the old one and saves this better snapshot instead.

At the end of the 30-second cycle, it simply notifies the user with the saved snapshot (which should be the best).
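The keep-the-best loop described above could be sketched like this (a minimal Python sketch with hypothetical names; SynoAI itself is a C# project, and `get_snapshot` / `analyze` stand in for the SSS snapshot API and the DeepStack detection call):

```python
import time

def best_shot_cycle(get_snapshot, analyze, window_seconds=30, poll_interval=1.0):
    """Poll snapshots for one trigger window and retain only the best one.

    get_snapshot() and analyze() are hypothetical stand-ins; analyze()
    is assumed to return a confidence score in the range 0-100.
    """
    best_score, best_image = 0, None
    deadline = time.monotonic() + window_seconds
    while time.monotonic() < deadline:
        image = get_snapshot()
        score = analyze(image)
        if score > best_score:
            # A better shot arrived: replace the previously kept one.
            best_score, best_image = score, image
        time.sleep(poll_interval)
    return best_image, best_score
```

At the end of the window, only the highest-scoring snapshot is left to attach to the notification.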

PROBLEMS I see:

  1. A "collateral" issue: it will need permission to delete images on the NAS (maybe that's already possible)

  2. I want to be alerted in almost real time (right now it is 2 to 4 seconds after the object has been detected), but with this mechanism I would be alerted about 35 seconds later or more (considering that each snapshot takes 150-200 ms to save)

  3. Depending on the scene, those 30 seconds can contain lots of objects, not only one of interest, and I might start missing, say, a certain "person" object because a newer one just appeared with a better percentage.

  4. Similar to 3, the camera may be facing a place where a bunch of people start moving. Some will have very good certainties, others not. This may prove a difficult scenario to evaluate.

JohannCR commented 3 years ago

@euquiq good points. Thing is, we may all have different needs... For me, I only need certainty of person detection within a maximum delay of 10 seconds or so. That's doable. I don't care about distinguishing different people. I'd also use it for better cat/dog detection, but delay wouldn't be important there.

Can't wait to monkey with it 😁

AlexanderSch90 commented 3 years ago

@euquiq Considerations for the problems you have described:

  1. The images could also be saved / overwritten within the Docker container, with a URL such as /api/events/objects/snapshot.jpg

  2. That would also be solvable if you built in a variable that indicates the stage of recognition, for example: "new", "update", "end". If you prefer to receive a notification as quickly as possible, you trigger on "new". If you want the best possible picture, you trigger on "end".

  3. I agree

  4. I agree
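Item 2 above could be expressed as a hypothetical configuration flag (an invented key, not an existing SynoAI setting): "New" would notify on the first detection, "End" only once the best shot is known.

```json
{
  "BestShot": {
    "NotifyOn": "End"
  }
}
```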

djdd87 commented 2 years ago

I think the only solution to points 3/4 is to sum all the percentages together and just assume the result with the biggest total summed percentage is the best image. e.g. if I have 3 images containing a person and a car (and the camera is set up to look for person & car) then I could end up with:
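(The example table did not survive in this copy of the thread; the numbers below are invented, chosen only so that the totals come out 80 / 85 / 90 as in the comparison that follows, with Image 2 holding the single best Person prediction.)

```python
# Hypothetical per-image detections as (type, confidence %) pairs.
# Invented numbers: totals are 80 / 85 / 90, and Image 2 has the
# strongest individual "Person" hit.
detections = {
    "Image 1": [("Person", 40), ("Car", 40)],   # sum 80
    "Image 2": [("Person", 70), ("Car", 15)],   # sum 85, best Person
    "Image 3": [("Person", 45), ("Car", 45)],   # sum 90
}

def summed_score(preds):
    """Sum every prediction's confidence, regardless of type."""
    return sum(conf for _type, conf in preds)

best = max(detections, key=lambda name: summed_score(detections[name]))
print(best)  # -> Image 3 (90 > 85 > 80)
```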

In this example, the image I would return would be Image 3, because 90 > 85 > 80. However, it seems a bit odd, because Image 2 actually had the best prediction of a person. Still, if the camera is set up to look for Car and Person, then Image 3 is probably the right balanced choice?

JohannCR commented 2 years ago

Wouldn't it be possible to choose a type of object to optimize?

If it's person -> Image 2
If it's car -> Image 3
If it's both person and car -> Image 3

Average would be better than sum in case there are different numbers of the same object in one image...

djdd87 commented 2 years ago

If I can get away with a single "preferred" type, then a simple "Prefer" config would solve it.

{
  "Name": "Driveway",
  "Types": [ "Person", "Car" ],
  "BestShot": {
      "Mode": "Sequential",
      "Count": 10,
      "Wait": 50,
      "Prefer": "Person"
  }, 
  "Threshold": 10,
  "MinSizeX": 10,
  "MinSizeY": 10
},

So in the above config, we've got a camera looking for "Person" and "Car". It'll fetch a snapshot 10 times sequentially, waiting 50ms after each snapshot is received before getting the next (this could probably be 0/optional, and is more useful for a "Simultaneous" mode, described later in this comment).

I'd need to group the summed percentages by type and then order by the preference. So if we had 10 images, only 1 of which contains a person, 1 with a person and a car, and the other 8 containing only cars, then we'd end up with something in code ordered like this:
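The ordered result itself is not shown in the thread; as a sketch only (a hypothetical helper, not SynoAI's actual code), grouping summed percentages by type and ordering by the preferred type could look like:

```python
from collections import defaultdict

def rank_images(images, prefer="Person"):
    """Order candidate snapshots: images containing the preferred type
    first (by that type's summed confidence, descending), then the rest
    by total summed confidence. `images` maps an image name to a list
    of (type, confidence) predictions. Hypothetical sketch only."""
    def sort_key(name):
        by_type = defaultdict(float)
        for obj_type, conf in images[name]:
            by_type[obj_type] += conf
        if prefer in by_type:
            return (0, -by_type[prefer])        # preferred type wins outright
        return (1, -sum(by_type.values()))      # fall back to the total
    return sorted(images, key=sort_key)

# A few of the 10 images from the example: one person-only, one
# person+car, two car-only (invented confidences).
example = {
    "img1": [("Car", 80)],
    "img2": [("Person", 55)],
    "img3": [("Person", 60), ("Car", 70)],
    "img4": [("Car", 90)],
}
print(rank_images(example))  # person images first, then cars by total
```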

Additionally, aside from this percentage-based issue, we could have a mode that fires simultaneous requests with a delay (the delay would need a minimum to avoid spamming for 10 images at exactly the same time):

{
  "Name": "Driveway",
  "Types": [ "Person", "Car" ],
  "BestShot": {
      "Mode": "Simultaneous",
      "Count": 5,
      "Wait": 100,
      "Prefer": "Person"
  }, 
},

So instead of waiting for the previous request to finish, this mode would request a snapshot, wait 100ms, then request another, wait 100ms, and so on. So if it takes 1000ms for SSS to return the first snapshot, several requests would already be in flight.
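The timing of that "Simultaneous" mode can be illustrated like this (hypothetical names; SynoAI is C#, and `get_snapshot` stands in for the real SSS call):

```python
import asyncio
import time

async def fetch_simultaneous(get_snapshot, count=5, wait_ms=100):
    """Fire off `count` snapshot requests spaced `wait_ms` apart without
    waiting for earlier requests to complete, then gather all results.
    get_snapshot is a hypothetical async stand-in for the SSS call."""
    tasks = []
    for _ in range(count):
        tasks.append(asyncio.create_task(get_snapshot()))
        await asyncio.sleep(wait_ms / 1000)
    return await asyncio.gather(*tasks)

async def demo():
    async def slow_snapshot():
        await asyncio.sleep(0.05)   # pretend SSS takes 50ms per snapshot
        return "jpeg-bytes"
    start = time.monotonic()
    shots = await fetch_simultaneous(slow_snapshot, count=5, wait_ms=10)
    elapsed = time.monotonic() - start
    # Because requests overlap, total time is far less than 5 x 50ms.
    return shots, elapsed
```

Run with `asyncio.run(demo())`; the overlap is what lets a slow SSS response stack up multiple in-flight requests.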

JohannCR commented 2 years ago

That could work, will be awesome to test it out^^

What if an image contains two persons at 30% each plus a car at 80%? 😁 It would be first in line despite the bad percentages for the person detections... Or maybe I misunderstood.

djdd87 commented 2 years ago

Well yeah that's my point. Person would trump Car. Not sure how else to handle it, unless we multiply person percentages by 2 or something arbitrary.

JohannCR commented 2 years ago

Hmm, not sure we understood each other ^^ I get that you're using a preferred type to handle multi-type responses, which is a good solution. I'm just wondering about the summing method, because it would favour multiple bad detections over one good detection (even within the preferred type).
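The objection can be shown numerically (invented confidences):

```python
multiple_weak = [30, 30, 30]   # three Person detections at 30% each
single_strong = [85]           # one Person detection at 85%

def score_sum(confs):
    return sum(confs)

def score_avg(confs):
    return sum(confs) / len(confs)

# Summing ranks the three weak hits above the one confident hit...
print(score_sum(multiple_weak), score_sum(single_strong))   # 90 85
# ...while averaging prefers the single strong detection.
print(score_avg(multiple_weak), score_avg(single_strong))   # 30.0 85.0
```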

ghost commented 2 years ago

> Yeah, I'll try and add to it when I can. Unfortunately time is very limited at the moment. Good idea though. I'd be curious to see what strain this puts on the SSS API and DeepStack though.

I've been having a play lately and moved my DeepStack container to a 6th-gen i5 tiny PC that I use to run the Surveillance Station client on a screen. Currently it processes 2560x1920 snapshots, taking on average 300-400ms to analyze, so the whole movement-to-notification flow takes less than a second.

Because of this, I have the delay and wait values set to 0 for SynoAI, and have SSS trigger the webhook 3 times, once every second. So far it seems to be able to cope with this with 13 cameras and return under 1 second on average.

This is way quicker than when DeepStack was running on my NAS (a 918+ with 8GB); it was averaging around 3-4 seconds total and was getting flogged constantly. I think it would get totally congested and just give up, as I started to notice some breaks in what should be constant recordings.

Just thought it might be interesting to know.

ichbinrodolf commented 11 months ago

> Because of this, I have the delay and wait values set to 0 for SynoAI, and have SSS trigger the webhook 3 times, once every second. So far it seems to be able to cope with this with 13 cameras and return under 1 second on average.

Thanks for the information, I just discovered that we could retrigger a webhook notification :)

I'm trying to improve detection by having "better" images sent to DeepStack. I assume that in your configuration you don't rely on the SynoAI "MaxSnapshots" setting, right?