Noblis / INVSC-janice

An open API for computer vision algorithms
https://noblis.github.io/janice/

janice_detect harness holds JaniceDetection objects #21

Closed taa01776 closed 6 years ago

taa01776 commented 6 years ago

Although using janice_detect_batch may improve efficiency, it can also consume a great deal of memory. JaniceDetectionType objects need to hold on to some form of the media behind them. Because the janice_detect harness calls janice_detect_batch on the entire input protocol file, all of the detections from the entire protocol, along with their media, are in memory at the same time. This could become problematic for large input sets.

JordanCheney commented 6 years ago

@taa01776 I can expose batch size as a parameter to detect. In practice, users will need to determine the optimal batch size based on the algorithm, available resources, etc. I'll add documentation for this in the harness as well.
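
To illustrate one way the harness could use such a parameter, here is a rough sketch that processes the protocol in fixed-size chunks so only one chunk's detections (and their media) are resident at a time. The detect_batch, write_detections_to_disk, and free_detections helpers are simplified stand-ins with assumed signatures, not the actual janice API.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct JaniceMediaIterator;  // opaque type from janice_io
struct JaniceDetection;      // opaque type from the SDK

// Hypothetical simplified wrappers around the real batch calls.
std::vector<JaniceDetection*> detect_batch(const std::vector<JaniceMediaIterator*>& media);
void write_detections_to_disk(const std::vector<JaniceDetection*>& detections);
void free_detections(std::vector<JaniceDetection*>& detections);

// Process the protocol in fixed-size chunks so at most `batch_size` detections
// (and whatever media they cache) are alive at any one time.
void run_protocol(const std::vector<JaniceMediaIterator*>& protocol, std::size_t batch_size)
{
    for (std::size_t begin = 0; begin < protocol.size(); begin += batch_size) {
        const std::size_t end = std::min(begin + batch_size, protocol.size());
        std::vector<JaniceMediaIterator*> chunk(protocol.begin() + begin,
                                                protocol.begin() + end);

        std::vector<JaniceDetection*> detections = detect_batch(chunk);
        write_detections_to_disk(detections);  // persist this chunk's results
        free_detections(detections);           // release the media held by these detections
    }
}
```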

carlosdcastillo commented 6 years ago

I think @taa01776 is touching on a fundamental issue here, and @JordanCheney, as a user of the library, is exercising it. Are detections supposed to be fast (and large) or succinct (and slow)? If the latter, we could hold on to the file name instead of the file contents. See my comments on janice_io.
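
To make the trade-off concrete, here are two illustrative internal layouts a detection object could use. Neither struct is part of the API; they are assumptions for the sake of discussion.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct Rect { int x, y, width, height; };

// "Fast" layout: cache the decoded pixels so enrollment never re-reads the media.
// Memory cost is roughly frames * width * height * channels bytes.
struct FastDetection {
    std::vector<Rect>                 track;       // per-frame bounding boxes
    std::vector<std::vector<uint8_t>> frame_data;  // decoded pixels for each frame
};

// "Succinct" layout: remember only where the media lives; enrollment must
// reopen and decode it, trading CPU time for a tiny footprint.
struct SuccinctDetection {
    std::vector<Rect>     track;          // per-frame bounding boxes
    std::string           media_path;     // file name instead of file contents
    std::vector<uint32_t> frame_indices;  // frames the track covers
};
```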

taa01776 commented 6 years ago

There are a few issues here:

JordanCheney commented 6 years ago

@carlosdcastillo to your point on JaniceDetection being fast or succinct: because the memory footprint of JaniceMediaIterator is not defined and cannot be guaranteed to be small (see my first point), I would suggest being fast, with the important side effect that you control the memory usage from that point forward.

carlosdcastillo commented 6 years ago

@JordanCheney Let’s think this through. Are you sure you want to serialize those detections? For CS5, for example, a directory of serialized detections would be many tens of terabytes of uncompressed bytes representing every frame the detector was run on. To avoid this issue, people use JPG, and with JPG we’re talking about 88 pieces of 50 GB (the total distribution size of CS5).
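
As a rough back-of-envelope check of those numbers: only the 88 x 50 GB figure comes from the comment above; the 10:1 JPEG compression ratio below is an assumption for illustration.

```cpp
#include <cstdio>

int main()
{
    const double compressed_tb   = 88 * 50.0 / 1000.0;  // ~4.4 TB of JPG across 88 pieces
    const double jpeg_ratio      = 10.0;                 // assumed average compression ratio
    const double uncompressed_tb = compressed_tb * jpeg_ratio;

    // Prints roughly: compressed 4.4 TB, uncompressed estimate 44 TB,
    // i.e. "many tens of terabytes" if every frame is stored uncompressed.
    std::printf("compressed: %.1f TB, uncompressed estimate: %.0f TB\n",
                compressed_tb, uncompressed_tb);
    return 0;
}
```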

So long as we’re all on the same page, I’ll write the code, compile it, test serializing a couple of detections, and wish everybody good luck using it.

JordanCheney commented 6 years ago

@carlosdcastillo - Your comment has led to a long internal discussion on our end. The requirement that detections store image information was a request from a commercial provider who had a computationally expensive preprocessing step before detection and enrollment. The idea was that they could do the preprocessing before detection and then cache the result for enrollment. However, with updates to the API like janice_enroll_from_media, there are mechanisms for doing this caching internally in the detection+enrollment case. For janice_enroll_from_detections, we feel there is a strong assumption of a human in the loop, adjudicating multiple sightings of the same person to build a stronger template. Any added overhead from redoing operations on the media will be far smaller than the time the human takes to do the adjudication.

Based on this, we've concluded that the requirement that detections store image information is overly constraining. I propose the following changes (a sketch of the resulting enrollment flow follows below):

- JaniceDetection will still exist as an opaque type but will only be required to hold a JaniceTrack.
- Implementations can use a detection as an intermediate value cache if they would like.
- A detection can also hold optional metadata, such as gender and age, if the implementation supports it. Those values can be queried with janice_detection_get_attribute.
- janice_enroll_from_detections and janice_enroll_from_detections_batch will be amended to take the relevant JaniceMediaIterators as an input parameter.
- Functions that previously returned JaniceTracks (janice_enroll_from_media, janice_cluster_media) will now return JaniceDetections.

If we are all amenable to these changes, I will push them onto the v6.0 branch for review.
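
To make the proposed flow concrete, here is a rough sketch of the human-in-the-loop enrollment path under these changes. The detect, pick_one, and enroll_from_detections helpers are simplified stand-ins with assumed signatures, not the actual C API.

```cpp
#include <vector>

struct JaniceMediaIterator;  // opaque
struct JaniceDetection;      // opaque; under the proposal it only has to hold a JaniceTrack
struct JaniceTemplate;       // opaque

// Hypothetical simplified wrappers with assumed signatures.
std::vector<JaniceDetection*> detect(JaniceMediaIterator* media);
JaniceDetection* pick_one(const std::vector<JaniceDetection*>& candidates);  // human adjudication
JaniceTemplate* enroll_from_detections(const std::vector<JaniceDetection*>& detections,
                                       const std::vector<JaniceMediaIterator*>& media);

// Build a template from several sightings of the same person.
JaniceTemplate* build_template(const std::vector<JaniceMediaIterator*>& sightings)
{
    std::vector<JaniceDetection*> selected;
    for (JaniceMediaIterator* media : sightings) {
        std::vector<JaniceDetection*> candidates = detect(media);
        // A human adjudicates which detection in this medium is the person of interest.
        selected.push_back(pick_one(candidates));
    }

    // Because detections no longer carry pixels, the caller passes the same media
    // back in so the implementation can re-read whatever it needs.
    return enroll_from_detections(selected, sightings);
}
```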

carlosdcastillo commented 6 years ago

@JordanCheney This sounds good. The trade-off we're making is that, in exchange for a significant decrease (100-1000x) in memory footprint, we're delegating to the user of the library the responsibility of sending in exactly the same media for detection and template computation. If they don't handle this responsibility well, they'll get garbage.

The library implementer may use MD5 or a similar hash to verify that the image they got at detection time is the same image they're getting at feature-computation time.
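
One way an implementation could perform that check is to record a digest of the media at detection time and compare it at enrollment time. This is only a sketch: compute_digest stands in for MD5 (e.g. from OpenSSL), and DetectionRecord is a hypothetical internal type, not part of the API.

```cpp
#include <array>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Digest = std::array<uint8_t, 16>;

// Stand-in for MD5 or a similar hash over the raw media bytes.
Digest compute_digest(const std::vector<uint8_t>& media_bytes);

struct DetectionRecord {
    Digest media_digest;  // recorded when the detection was created
    // ... bounding boxes, confidences, etc.
};

// Fail loudly if the media passed at enrollment time differs from the media
// that produced this detection.
void check_media_matches(const DetectionRecord& detection,
                         const std::vector<uint8_t>& media_bytes)
{
    if (compute_digest(media_bytes) != detection.media_digest)
        throw std::runtime_error("media passed to enrollment does not match the "
                                 "media used at detection time");
}
```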

JordanCheney commented 6 years ago

This was addressed in 89e0143.