galliot-us / neuralet

Neuralet is an open-source platform for edge deep learning models on edge TPU, Jetson Nano, and more.
https://neuralet.com
Apache License 2.0
238 stars 71 forks

Refactor and improve Distance Calculation #42

Closed emmaWdev closed 1 year ago

emmaWdev commented 4 years ago

Have a class that builds the appropriate distance calculation method based on the config file, and create a separate class for each distance method, all with a similar interface.
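That design could be sketched like this in Python (a hypothetical sketch only: the class names, config keys, and box fields are illustrative, not the project's actual API):

```python
# Hypothetical sketch of the proposed design: a factory that picks a
# distance-calculation class based on a config value. Each method class
# exposes the same calculate(box_a, box_b) interface.

class CenterDistance:
    """Euclidean distance (pixels) between bounding-box centers."""
    def calculate(self, box_a, box_b):
        ax = box_a["left"] + box_a["width"] / 2
        ay = box_a["top"] + box_a["height"] / 2
        bx = box_b["left"] + box_b["width"] / 2
        by = box_b["top"] + box_b["height"] / 2
        return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

class BottomEdgeDistance:
    """Distance between the midpoints of the boxes' bottom edges."""
    def calculate(self, box_a, box_b):
        ax = box_a["left"] + box_a["width"] / 2
        ay = box_a["top"] + box_a["height"]
        bx = box_b["left"] + box_b["width"] / 2
        by = box_b["top"] + box_b["height"]
        return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

# registry of available methods; new methods register here
_METHODS = {"center": CenterDistance, "bottom_edge": BottomEdgeDistance}

def build_distance_method(config):
    """Instantiate the calculator named in the config dict."""
    return _METHODS[config["distance_method"]]()
```

Adding a new method would then just mean writing one class and adding it to the registry, with no changes to calling code.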

mhejrati commented 4 years ago

@emmaWdev It would be great to do some basic R&D to figure out if we can do any monocular perspective calculation using scene & object priors: https://www.researchgate.net/publication/312486995_Using_the_Scene_to_Calibrate_the_Camera https://www.csd.uwo.ca/~oveksler/Courses/Fall2007/840/StudentPapers/hoiem_cvpr06.pdf

Keeping in mind that we can afford to do expensive computations since the camera will be stationary and we only need to do calibration once.

alpha-carinae29 commented 4 years ago

@mhejrati need any help? It would be my pleasure to contribute.

mdegans commented 4 years ago

So, one thing I noticed while trying to solve this last night in C is that distance calculations may be unnecessary. I haven't closely looked at your distance calculation code, but I assume you do some sort of camera-space to world-space transform. You need to know how many units is 6 feet for a given pair of boxes at a given location in the image, right?

But it may be that we don't need to do this calculation because we already have the result. Why not use the average bounding box height of the two objects when you compare distances? Think about it. People are about 6 feet tall on average, and the bounding box is just larger than that, accounting for variation in height. It's not exact, no, but neither is a perspective transform. It also means no calibration is necessary. Thoughts?

Edit: I also thought it might be better to use the midpoint of the lower edge of the bounding box as the center, since it's closer to the ground plane.
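The heuristic might look like this in Python (a sketch only; the box fields are illustrative, not the project's actual data format): flag a pair as too close when the distance between the bottom-edge midpoints is less than the average of the two box heights, i.e. roughly one person-height, roughly 6 feet.

```python
import math

def too_close(box_a, box_b):
    """Sketch of the heuristic above: use the average bounding-box
    height (~ one person-height, ~6 ft) as the danger threshold, and
    measure between the bottom-edge midpoints (closest to the ground
    plane). No camera calibration needed; everything is in pixels."""
    ax = box_a["left"] + box_a["width"] / 2
    ay = box_a["top"] + box_a["height"]
    bx = box_b["left"] + box_b["width"] / 2
    by = box_b["top"] + box_b["height"]
    dist = math.hypot(ax - bx, ay - by)
    threshold = (box_a["height"] + box_b["height"]) / 2
    return dist < threshold
```

Because the box height shrinks with distance at the same rate as everything else in the image, the threshold scales itself to the local perspective for free.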

alpha-carinae29 commented 4 years ago

@mdegans thanks for your comment. Assuming a fixed height for a person is something we considered in our current algorithm. However, we wanted to try algorithms that take the depth of the scene into account for better distance estimation.

mdegans commented 4 years ago

@alpha-carinae29 Aha. But the bounding box height changes with distance too, doesn't it? Whether a person is far away or close, the box height is ~six feet in real-world units. Only children would be smaller, and in that case we don't want to alert if they're too close to their parents, so it works out well anyway. Anyway, it's an idea. I may experiment with it on my own. I'll update here if it turns out well.

mdegans commented 4 years ago

Hey, so it works better than I thought, actually. Many thanks to Nvidia and the CUDA course I took recently, which gave me the idea. If I hadn't done that n-body simulation, I wouldn't have thought of this.

Feel free to use it anywhere so long as you preserve the license. I'll update my NValhalla project with this later. I can probably bind this for Python as well so it can be used directly as a pad probe callback in this project.

You can probably port this to anything, but if you do an all-pairs calculation like this in Python, it might be slow, even with only a few elements. I will post a video tomorrow. Signing off for the night.

/* cb_distancing.c
 *
 * Copyright 2020 Michael de Gans
 *
 * 4019dc5f7144321927bab2a4a3a3860a442bc239885797174c4da291d1479784
 * 5a4a83a5f111f5dbd37187008ad889002bce85c8be381491f8157ba337d9cde7
 *
 * Permission is hereby granted, free of charge, to any person obtaining
 * a copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sublicense, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE X CONSORTIUM BE LIABLE FOR ANY
 * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
 * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * Except as contained in this notice, the name(s) of the above copyright
 * holders shall not be used in advertising or otherwise to promote the sale,
 * use or other dealings in this Software without prior written
 * authorization.
 */

#include "cb_distancing.h"

#include <math.h>

static float
calculate_how_dangerous(NvDsMetaList* l_obj, float danger_distance);

GstPadProbeReturn
on_buffer_osd_distance(GstPad * pad, GstPadProbeInfo * info, gpointer user_data)
{
  float how_dangerous=0.0f;
  float color_val=0.0f;

  GstBuffer* buf = (GstBuffer*) info->data;
  NvDsObjectMeta* obj_meta = NULL;
  NvDsMetaList*  l_frame = NULL;
  NvDsMetaList* l_obj = NULL;
  NvDsBatchMeta* batch_meta = gst_buffer_get_nvds_batch_meta (buf);

  NvOSD_RectParams* rect_params;
  // NvOSD_TextParams* text_params;

  for (l_frame = batch_meta->frame_meta_list; l_frame != NULL;
      l_frame = l_frame->next) {

    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) (l_frame->data);

    if (frame_meta == NULL) {
      GST_WARNING("NvDS Meta contained NULL meta");
      return GST_PAD_PROBE_OK;
    }

    // for obj_meta in obj_meta_list
    for (l_obj = frame_meta->obj_meta_list; l_obj != NULL;
         l_obj = l_obj->next) {
      obj_meta = (NvDsObjectMeta *) (l_obj->data);
      // skip the object, if it's not a person
      if (obj_meta->class_id != PERSON_CLASS_ID) {
        continue;
      }

      rect_params = &(obj_meta->rect_params);
      // text_params = &(obj_meta->text_params);

      // get how dangerous the object is as a float
      how_dangerous = calculate_how_dangerous(l_obj, rect_params->height);

      // make the box opaque and red depending on the danger

      color_val = (how_dangerous * 0.6f);
      color_val = color_val < 0.6f ? color_val : 0.6f;

      rect_params->border_width = 0;
      rect_params->has_bg_color = 1;
      rect_params->bg_color.red = color_val + 0.2f;
      rect_params->bg_color.green = 0.2f;
      rect_params->bg_color.blue = 0.2f;
      rect_params->bg_color.alpha = color_val + 0.2f;
    }
  }
  return GST_PAD_PROBE_OK;
}

/**
 * Calculate distance between the center of the bottom edge of two rectangles
 */
static float
distance_between(NvOSD_RectParams* a, NvOSD_RectParams* b) {
  // use the middle of the feet as a center point.
  int ax = a->left + a->width / 2;
  int ay = a->top + a->height;
  int bx = b->left + b->width / 2;
  int by = b->top + b->height;

  int dx = ax - bx;
  int dy = ay - by;

  return sqrtf((float)(dx * dx + dy * dy));
}

static float
calculate_how_dangerous(NvDsMetaList* l_obj, float danger_distance) {
  NvDsObjectMeta* current = (NvDsObjectMeta *) (l_obj->data);
  NvDsObjectMeta* other;

  // sum of all normalized violation distances
  float how_dangerous = 0.0;  

  float d; // distance temp (in pixels)

  // iterate forwards from current element
  for (NvDsMetaList* f_iter = l_obj->next; f_iter != NULL; f_iter = f_iter->next) {
    other = (NvDsObjectMeta *) (f_iter->data);
    if (other->class_id != PERSON_CLASS_ID) {
        continue;
    }
    d = danger_distance - distance_between(&(current->rect_params), &(other->rect_params));
    if (d > 0.0) {
      how_dangerous += d / danger_distance;
    }
  }

  // iterate in reverse from current element
  for (NvDsMetaList* r_iter = l_obj->prev; r_iter != NULL; r_iter = r_iter->prev) {
    other = (NvDsObjectMeta *) (r_iter->data);
    if (other->class_id != PERSON_CLASS_ID) {
        continue;
    }
    d = danger_distance - distance_between(&(current->rect_params), &(other->rect_params));
    if (d > 0.0) {
      how_dangerous += d / danger_distance;
    }
  }

  return how_dangerous;
}

Edit: fixed the black-box bug. Code now works as expected.

mdegans commented 4 years ago

So here's a video of the above code on Xavier. For some reason the bboxes are green, while on x86 they are red. This appears to be a bug in NvOSD_RectParams, but it can be hacked around with some preprocessor defines.

https://youtu.be/1Ui7A061fpY

I will post a video from x86 Nvidia shortly.

mdegans commented 4 years ago

Here it is on x86, with the proper color bounding boxes ~~except for a small bug with the channel value clipping fixed. When some values go above 1.0, the box becomes black, but that's easy enough to fix. You get the idea of how it can work, and~~ edit: fixed. It's very, very fast.

https://youtu.be/cjP_3oOnUtQ

mhejrati commented 4 years ago

Nice. @mdegans I am a bit confused now with this thread and the other on #50, is there any problem with using this code for post-processing?

mdegans commented 4 years ago

> Nice. @mdegans I am a bit confused now with this thread and the other on #50, is there any problem with using this code for post-processing?

So, the above code is from another project, written in C and Vala (GObject C shorthand, basically). If you'd like, I can port this algorithm to Python or Cython so we can use it in this project, but I mostly provided it as an example to give @alpha-carinae29 some ideas. I don't want to interfere with WIP.

If @alpha-carinae29 decides to use this method, the one thing it may not be compatible with is the "bird's eye view", since it doesn't do any sort of screen-space to world-space transform. The calculations are entirely in screen space, based on the height of the bounding boxes, but it seems to work pretty well and it's fast. That being said, speed is less important now for GstEngine since we do these calculations in a separate process. Even the existing code will work fine if modified a bit.

mdegans commented 4 years ago

Hey, so I noticed non_max_suppression_fast is copypasta from this blog with the comments removed. I think we'll need a license to use it unless pyimagesearch has some page somewhere with a notice about all blog code that I can't find. I can reimplement it in another language (like Cython).

Also, I think it might be faster to do the centroid calculation afterwards. Likewise, the code might not even be necessary. My understanding of the networks themselves is rudimentary, but don't most of them do the suppression internally? LMK what you want to do @mhejrati , or if you've updated this code any @alpha-carinae29 or @mrn-mln .
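For reference, a greedy NMS is simple enough to reimplement from scratch, which would sidestep the licensing question entirely. Here is a plain-Python sketch (boxes as `(x1, y1, x2, y2)` tuples are an assumed representation, not the project's actual format, and this is not the pyimagesearch vectorized version):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring remaining box
    and drop every box that overlaps it above the threshold.
    Returns the kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

That said, if the detector already does suppression internally, dropping the module entirely is the cleaner fix.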

alpha-carinae29 commented 4 years ago

@mdegans yeah, we don't need these modules anymore since we now have a high-accuracy object detector. Also, the filtering and tracking algorithms are under development right now, and the tracking algorithm will change soon.

alpha-carinae29 commented 4 years ago

> If @alpha-carinae29 decides to use this method, the one thing that it may not be compatible with is the "bird's eye view", since it doesn't do any sort of screen space to world space transform. The calculations are entirely screen space based on the height of bounding boxes, but it seems to work pretty well and it's fast. That being said, speed is less important now for GstEngine since we do these calculations in a separate process. Even the existing code will work fine if modified a bit.

Thanks for all your activity on this issue. I think, first of all, we should create a metric and some ground truth so we can benchmark different methods. Deciding by just watching the results might not be a good idea. Do you have any idea how we can evaluate different methods?

mdegans commented 4 years ago

Ok. Thanks. I'll know not to work on it now @alpha-carinae29 . Please @ me when you commit so I can integrate the changes. For the moment I'm just going to avoid this part. I have my code spitting out the boxes, and tracking is already done internally in the GStreamer pipeline. The only thing I'd ask is, when you rework the code, to provide a way to account for objects that already have a uid assigned. I'd propose working with a dict of dicts with an object uid as the integer key, since a list of dicts with string keys and uid values might mean slower lookups when I get back the scored results. I can work with anything, so long as each object has a uid somewhere, however.
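The proposed structure might look like this (a hypothetical sketch; the field names and `attach_score` helper are illustrative, not the project's actual code):

```python
# Detections keyed by integer uid: attaching a score when results come
# back is a single O(1) dict access, instead of an O(n) scan over a
# list of dicts to find the matching object.

detections = {
    17: {"left": 100, "top": 40, "width": 60, "height": 180},
    23: {"left": 400, "top": 52, "width": 55, "height": 170},
}

def attach_score(detections, uid, score):
    """Attach a danger score to the object with the given uid."""
    detections[uid]["danger_score"] = score
```

With a list of dicts, the same update would require searching every element for a matching uid on every scored result.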

alpha-carinae29 commented 4 years ago

Thanks for the suggestions, I think it will be ready in a week. I will mention you in the PR so we can talk about the implementations, data structures, etc.

mdegans commented 4 years ago

> Do you have any idea how we can evaluate different methods?

Well, we have a test video, but we could use more from different angles. I'm not sure there's an objective, automated way of verifying accuracy unless you measure distances with a ruler IRL and annotate the video manually to get ground truth.

We could possibly generate some ground-truth synthetic footage of simple objects (e.g. spheroids) moving around randomly in a 3D scene, with the camera in a bunch of different perspectives. I know enough Blender to do that (Arts and Visual Tech was actually my major in college, not software engineering).

Or we could just be subjective about it, i.e., how close are these results, how fast is the technique, do we really need this feature, etc., and play it by ear.
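Once any ground truth exists (measured, annotated, or synthetic), one simple metric would be the mean absolute error between predicted and true pairwise distances. A minimal sketch (assuming matched lists of distances in the same units; not existing project code):

```python
def mean_abs_distance_error(predicted, ground_truth):
    """Mean absolute error between predicted and ground-truth pairwise
    distances for matched person pairs; lower is better."""
    assert len(predicted) == len(ground_truth) and predicted
    return sum(abs(p - g) for p, g in zip(predicted, ground_truth)) / len(predicted)
```

This would make the comparison between methods a single number per test video instead of a judgment call.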

mdegans commented 4 years ago

> Thanks for the suggestions, I think it will be ready in a week. I will mention you in the PR so we can talk about the implementations, data structures, etc.

Sounds good @alpha-carinae29

mhejrati commented 4 years ago

@alpha-carinae29 thanks for taking this one. Can you please keep us updated with your progress here?

With respect to evaluation, I agree that we need a more objective and methodical approach. One possibility is to try to calibrate the Oxford Town Centre cameras manually (if possible) and use that as ground truth (it is not 100% accurate, but probably a lot better than what we have now). Also, we can build a synthetic dataset, as @mdegans suggested, to isolate the detector performance from the distance calculation performance. My vote is to do something quick and dirty with Oxford Town Centre and then get fancier :)

mhejrati commented 4 years ago

> Hey, so I noticed non_max_suppression_fast is copypasta from this blog with the comments removed. I think we'll need a license to use it unless pyimagesearch has some page somewhere with a notice about all blog code that I can't find. I can reimplement it in another language (like Cython).

@alpha-carinae29 can you please make a PR to put a copyright notice at the top of the NMS code? Or, if we are not using it anymore, just remove it from the code?

mdegans commented 4 years ago

> Hey, so I noticed non_max_suppression_fast is copypasta from this blog with the comments removed. I think we'll need a license to use it unless pyimagesearch has some page somewhere with a notice about all blog code that I can't find. I can reimplement it in another language (like Cython).

> @alpha-carinae29 can you please make a PR to put a copyright text at the top of the NMS code? or if we are not using it anymore just removing it from the code?

I've done this in my branch here, so you can copy and paste (or whatever) @alpha-carinae29 . I added the comments back as well since they're useful to understand what's going on.