annishaa88 opened this issue 7 years ago:

There are some videos that have camera flashes in them, for example: http://assetsprod2-a.akamaihd.net/tag_reuters_com_2017_newsml_ov6kua1nj_reuters_ingest/tag_reuters_com_2017_newsml_ov6kua1nj_reuters_ingest_LOWRES.mp4
This video gives me the following result: ['00:00:09.766', '00:01:23.266', '00:01:35.066']
Can I do something about it?
Hello @annishaa88;
Which detector algorithm are you using in this case, threshold, or content? Currently there is only support to ignore subsequent camera flashes within a certain window (using the minimum-scene-length argument), but the initial flash in the window will still be detected due to the design of the detection algorithms.
That being said, I can see this being a relatively common issue, so after the upcoming release of PySceneDetect (v0.5, where the focus is major changes to the Python API), I will look into adding support for ignoring camera flashes in certain videos. I have a few ideas that could solve this issue; they will require some modification to the existing detection algorithms (as well as some additional command-line parameters), but they are definitely possible, and should take care of almost all instances of flicker/camera flash.
If you have any suggestions regarding how the implementation should be, by all means, your comments would be most welcome. I will keep you posted as to my progress in this regard, and should hopefully have something for you to test after the next major release. Lastly, thank you very much for providing a sample video - this will be quite handy when the time for testing finally rolls around.
My apologies for the lack of updates regarding progress. Unfortunately my development efforts have been focused on the release of the new v0.5 API/CLI, and not so much on enhancements/features. I hope to have the new version released by the end of the month, at which time I can attempt to tackle this issue.
Just to confirm again @annishaa88, what command were you using to generate the results? The specific detection algorithm/thresholds being used would be very useful information.
Also, one idea I just had to solve this is to add the ability to specify an intensity value; if a frame's brightness exceeds it, scene detection would be disabled for that frame. This might be rather easy to implement, so I will look into squeezing this in for the upcoming release.
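A minimal sketch of that intensity-cutoff idea (the helper name and cutoff value below are hypothetical, not an existing PySceneDetect option):

```python
import cv2
import numpy as np

def is_flash_frame(frame_bgr: np.ndarray, intensity_cutoff: float = 200.0) -> bool:
    """Return True if the frame's average luma exceeds the cutoff (likely a flash)."""
    luma = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return float(luma.mean()) > intensity_cutoff

# Inside a detector's process_frame(), a frame flagged as a flash would simply
# not be allowed to trigger a new cut:
#     if is_flash_frame(frame_img):
#         return []
```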
Hi Breakthrough,
I have an example video that I have analyzed with bright flashes that cause erroneous scene breaks being detected. The music video for Growl by EXO is done in one continuous shot, but there are strobe lights that flash in the background. I analyzed the video using the content aware detector using the settings of threshold=30 and min_scene_len=10 and ended up with a total of 83 scenes being detected (link). I would expect at least a couple due to transitions to and from title cards, but the strobes account for the vast majority.
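For reference, that analysis can be reproduced with the v0.5-era API roughly as follows (a sketch only; the file name is a placeholder and exact call signatures may differ slightly between 0.5.x releases):

```python
from scenedetect import VideoManager, SceneManager
from scenedetect.detectors import ContentDetector

# Content-aware detection with the settings described above.
video_manager = VideoManager(['growl.mp4'])
scene_manager = SceneManager()
scene_manager.add_detector(ContentDetector(threshold=30.0, min_scene_len=10))

video_manager.set_downscale_factor(1)  # no downscaling
video_manager.start()
scene_manager.detect_scenes(frame_source=video_manager)

for start, end in scene_manager.get_scene_list():
    print(start.get_timecode(), end.get_timecode())
```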
On a side note, I have updated to the newest version, and have been liking it so far. The new API has been working great.
Hi @wjs018;
Thank you very much for the extensive example - very well put together as well, might I add. Also thanks for your comments regarding the API, means a lot to me - if you have any improvements you want to suggest, feel free to bring them forwards.
I read through some of your work, and agree that edge detection is definitely a viable solution to the strobing issue. I'm looking into how I can create a new EdgeDetector class to detect scenes purely using edge detection, or possibly a more robust detector (RobustDetector?) that combines all features of the detectors (including slow fades and what not).
I also left a few suggestions in one of your repos for how you might be able to improve your runtime using PySceneDetect. Sorry the documentation is still a work in progress - I need to add more examples of different usage styles; the current api_test.py is geared towards making multiple calls to the function rather than starting/stopping the program entirely.
@Breakthrough
I have been doing some work on this problem recently and implemented a working example of an EdgeDetector (code here). I tried to style it after the existing detectors in PySceneDetect, and it seems to be plug and play with my existing programs. I basically took the same approach that skvideo does in their scenedet function (docs) (github). This function is just an implementation of this paper (pdf). It adds a dependency on skvideo for their motion estimation code, and a dependency on scipy for some binary image morphology operations.
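For readers unfamiliar with it, the edge change ratio underlying that paper, stripped of the global motion compensation step, amounts to roughly the following sketch (illustrative only; the function name is mine):

```python
import numpy as np
from scipy.ndimage import binary_dilation

def edge_change_fraction(prev_edges: np.ndarray, curr_edges: np.ndarray, r: int = 6) -> float:
    """Edge change ratio: the larger of the fraction of edge pixels entering the
    frame and the fraction leaving it, between two (binary) edge maps."""
    prev_e, curr_e = prev_edges.astype(bool), curr_edges.astype(bool)
    prev_dilated = binary_dilation(prev_e, iterations=r)
    curr_dilated = binary_dilation(curr_e, iterations=r)
    p_in = 1.0 - np.count_nonzero(curr_e & prev_dilated) / max(np.count_nonzero(curr_e), 1)
    p_out = 1.0 - np.count_nonzero(prev_e & curr_dilated) / max(np.count_nonzero(prev_e), 1)
    return max(p_in, p_out)
```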
Some notes about edge detection:

- The approach here mirrors skvideo, so I ended up using the same method.
- r_dist is the radius (in pixels) over which the detector will look for motion in the frame. However, this is pixels in the scaled-down image if downscale_factor is not 1 in the VideoManager object.
- For speed, I get about 20 fps with ContentDetector with no downscaling. Using EdgeDetector, I need to downscale by 4x in order to match that 20 fps.

Results:

- ContentDetector (threshold=30, min_scene_len=10)
- EdgeDetector (threshold=0.4, min_scene_len=10, r_dist=6)

Overall, I am happy with it for my purposes, but different videos are going to require parameter tuning to get better accuracy. There are certain videos for which I have found the ContentDetector seems to perform better, while others perform better with the EdgeDetector. I am going to experiment a bit with using both in combination. Perhaps, to do something akin to the RobustDetector you mentioned, would it make sense to add the ability to not add anything to the cut list unless all detectors detect a cut for that frame? Currently, I believe a cut is added to the list if any of the detectors is triggered (which works great in some cases).
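A rough sketch of that "all detectors must agree" idea (a hypothetical wrapper, not part of PySceneDetect; it assumes detectors expose process_frame() returning a list of cut frame numbers, as the existing ones do):

```python
from typing import List

class AllAgreeDetector:
    """Hypothetical wrapper: only report a cut when every wrapped detector
    reports a cut within `tolerance` frames of it."""

    def __init__(self, detectors, tolerance: int = 2):
        self.detectors = detectors
        self.tolerance = tolerance

    def process_frame(self, frame_num: int, frame_img) -> List[int]:
        results = [d.process_frame(frame_num, frame_img) for d in self.detectors]
        if not all(results):
            return []  # at least one detector saw no cut at all
        # Keep the first detector's cuts that every other detector corroborates.
        confirmed = []
        for cut in results[0]:
            if all(any(abs(cut - other) <= self.tolerance for other in cuts)
                   for cuts in results[1:]):
                confirmed.append(cut)
        return confirmed
```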
Hey, is there any progress on dealing with camera flash, or does anyone know any libraries that are able to deal with this?
Hey @dave-epstein;
Sorry, no progress yet on that front; I'd like to start cleaning up the backlog before addressing any new features at the moment. My apologies, I haven't had much time to keep up with the project lately.
I definitely do want to integrate this with PySceneDetect though. In the meantime, any pull requests are still most welcome.
Thank you.
Interestingly, it appears that a pretty novel solution using a lookahead buffer was implemented in the rav1e AV1 encoder (whose scene change detection was itself based on the detect-content algorithm!):
https://github.com/xiph/rav1e/blob/master/src/scenechange/mod.rs
This indicates that an underlying design change will be required to support frame lookahead, but this seems like a viable (and awesome!) approach I never originally considered. Will definitely be looking more into how this can be integrated into PySceneDetect to allow for adding flash suppression to detect-content.
I'll take a stab at this one for the v0.6.x release, as it's a really nice-to-have feature. I think I managed to come up with a method that doesn't require a lookahead buffer and has minimal impact on performance. It doesn't use edge detection, but rather the same method as rav1e, just modified to use a state machine rather than frame lookahead.
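A minimal sketch of what such a state machine could look like (an illustration of the general idea only, not the actual implementation; the window length is an assumed parameter):

```python
class FlashSuppressor:
    """Illustrative state machine: after a cut fires, further cuts are held back
    until `window` consecutive frames pass without another trigger."""

    def __init__(self, window: int = 2):
        self.window = window
        self.frames_since_trigger = None  # None means "not currently suppressing"

    def filter_cut(self, frame_num: int, triggered: bool) -> bool:
        """Return True if a cut should actually be emitted for this frame."""
        if triggered:
            if self.frames_since_trigger is None:
                self.frames_since_trigger = 0
                return True   # first trigger: emit the cut, enter suppression
            self.frames_since_trigger = 0
            return False      # re-trigger inside the window (e.g. the flash ending): swallow it
        if self.frames_since_trigger is not None:
            self.frames_since_trigger += 1
            if self.frames_since_trigger > self.window:
                self.frames_since_trigger = None  # back to normal operation
        return False
```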
Edit: I don't want to ignore your other method either, @wjs018 - would be awesome if that could be either integrated with ContentDetector or shipped as part of PySceneDetect as another detection method. Just think I have a way to solve the most pressing use cases with minimal impact on performance (and a more "tunable" max # frames per flash setting).
I think my method may not perform as well as yours @wjs018, but will have minimal performance impact. Will likely be using yours as a source of test footage. It's probably worth shipping your edge detector with one of the next releases, as certain users will likely have the same use case as you did.
I did a test of this in v0.5.x (link to download .zip), for users wishing to beta test this feature before its official release. It is turned on by default when running detect-content, with a suppression amount of 2 frames.
The suppression amount (called flicker_frames) can be changed in the call to detect-content via the -f / --flicker [N] argument, which specifies the flash suppression amount in frames, e.g.:
scenedetect -i video.mp4 detect-content -f 3
To turn flash suppression off, set -f to 0. Any feedback is most welcome!
Looks like the v0.6.x branch and zip were deleted.
Sorry about that, you can find an updated link here with the feature: https://github.com/Breakthrough/PySceneDetect/archive/c46469e2bcceb8b33885a5bc2826c454a0ecba11.zip
I'll try to schedule this for v0.6.1 or v0.6.2, but will leave it off by default until it has had more testing.
Edit: If anyone can share any other examples to use as test cases, that would also be greatly appreciated.
I took a look at this again, and think there are a few main cases to deal with:
The video you posted @wjs018 falls under category 1. If you look at the delta in hue channel, it's very low throughout the video. I've been experimenting with combining the EdgeDetector implementation you provided with ContentDetector and have gotten promising results so far with adequate performance.
The idea is you can pass a set of weights for the deltas in hue, saturation, luma, and edges. This would allow both cases 1 and 2 to be dealt with by providing different sets of weights, or always considering only hue/edge information to provide higher confidence. Also need to look more into filtering the edges based on video resolution, but this is all doable.
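As an illustration of the weighting idea (a sketch only; the weight names and defaults here are placeholders, not the final API):

```python
import numpy as np

# Hypothetical per-component weights: hue, saturation, luma, edges.
DEFAULT_WEIGHTS = {'hue': 1.0, 'sat': 1.0, 'lum': 1.0, 'edges': 0.0}

def weighted_content_score(delta_hue, delta_sat, delta_lum, delta_edges,
                           weights=DEFAULT_WEIGHTS):
    """Combine per-channel frame deltas into a single score.

    With the edge weight at 0 this reduces to the current HSL average; raising
    it lets edge changes carry (or dominate) the decision.
    """
    components = np.array([delta_hue, delta_sat, delta_lum, delta_edges])
    w = np.array([weights['hue'], weights['sat'], weights['lum'], weights['edges']])
    return float(np.dot(components, w) / w.sum())
```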
Case 3 will need a separate approach, something like a low-pass filter on the luma channel. Going forwards I would like to integrate all of this into ContentDetector and provide options to control the mode/channel weights and filtering, rather than separate detectors. This should make it easier to try different combinations.
There definitely needs to be more work done on determining default weights, so the initial release will probably keep them the same as today (i.e. equal weights for HSL and zero weight on edges for now). Essentially, this makes ContentDetector much more robust by considering edge information and sudden brightness changes, and provides a pathway for improved detection confidence by cross-validating different metrics.
What do you think of auto-tuning the thresholds to standard deviations computed over the whole video? Performance might be a bit worse, and it tends not to catch pan shots combined with flash, but I had the same thought about combining the two with my own version of the content-aware detector. It's not clean code, but the idea is to record diffs over the entire video; assuming that most frames aren't cuts, a diff greater than 2 standard deviations works pretty well.
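Stripped down, the two-standard-deviation idea looks roughly like this (an illustrative two-pass sketch; the full experimental module follows below):

```python
import numpy as np

def cuts_from_recorded_diffs(frame_diffs, num_std=2.0):
    """Given per-frame content diffs recorded over the whole video, return frame
    indices whose diff lies more than `num_std` standard deviations above the median.

    Assumes most frames are not cuts, so the median/std describe "normal" motion."""
    diffs = np.asarray(frame_diffs, dtype=float)
    median = np.median(diffs)
    std = diffs.std() or 1e-9  # guard against a zero standard deviation
    return [i for i, d in enumerate(diffs) if (d - median) / std > num_std]
```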
""" Experimental edge_detector module for PySceneDetect.
This module implements the EdgeDetector, which compares the difference
in edges between adjacent frames against a set threshold/score, which if
exceeded, triggers a scene cut.
"""
from typing import Iterable, List, Tuple

# Third-Party Library Imports
import cv2
import numpy

# New dependencies
from skvideo.motion.gme import globalEdgeMotion
from scipy.ndimage.morphology import binary_dilation

# PySceneDetect Library Imports
from scenedetect.scene_detector import SceneDetector
def calculate_frame_score(current_frame_hsv: Iterable[numpy.ndarray],
last_frame_hsv: Iterable[numpy.ndarray]) -> Tuple[float]:
"""Calculates score between two adjacent frames in the HSV colourspace. Frames should be
split, e.g. cv2.split(cv2.cvtColor(frame_data, cv2.COLOR_BGR2HSV)).
Arguments:
        current_frame_hsv: Current frame.
last_frame_hsv: Previous frame.
Returns:
Tuple containing the average pixel change for each component as well as the average
across all components, e.g. (avg_h, avg_s, avg_v, avg_all).
"""
current_frame_hsv = [x.astype(numpy.int32) for x in current_frame_hsv]
last_frame_hsv = [x.astype(numpy.int32) for x in last_frame_hsv]
delta_hsv = [0, 0, 0, 0]
for i in range(3):
num_pixels = current_frame_hsv[i].shape[0] * \
current_frame_hsv[i].shape[1]
delta_hsv[i] = numpy.sum(
numpy.abs(current_frame_hsv[i] - last_frame_hsv[i])) / float(num_pixels)
delta_hsv[3] = sum(delta_hsv[0:3]) / 3.0
return tuple(delta_hsv)
sigma = 0.33  # Used to derive the Canny thresholds from the frame's median intensity.


def unsharp_mask(img, blur_size=(21, 21), imgWeight=1.5, gaussianWeight=-0.5, retries=3):
    """Recursively sharpens the image by subtracting a weighted Gaussian blur.

    Note: the blur kernel is currently fixed at 5x5; blur_size is accepted but unused.
    """
    if retries == 0:
        return img
    gaussian = cv2.GaussianBlur(img, (5, 5), 0)
    return unsharp_mask(cv2.addWeighted(img, imgWeight, gaussian, gaussianWeight, 0),
                        blur_size=blur_size, imgWeight=imgWeight,
                        gaussianWeight=gaussianWeight, retries=retries - 1)
def compute_frame_transforms(frame: numpy.ndarray) -> Tuple:
    """Computes the edge map, contrast, grayscale image, and HSV channels for a frame."""
# Convert to grayscale
_bw = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
_hsv = cv2.split(cv2.cvtColor(frame, cv2.COLOR_BGR2HSV))
# Some calculation to determine canny thresholds
_median = numpy.median(_bw)
_low = int(max(0, (1.0 - sigma) * _median))
_high = int(min(255, (1.0 + sigma) * _median))
# Do our Canny edge detection
img_invert = cv2.bitwise_not(_bw)
img_smoothing = unsharp_mask(img_invert, (9, 9))
final = cv2.divide(_bw, 255 - img_smoothing, scale=255)
final = cv2.threshold(final, _low, _high, cv2.THRESH_BINARY_INV)[1]
_edges = cv2.Canny(final, _low, _high, apertureSize=3, L2gradient=True)
# cv2.imshow('final', final)
# cv2.imshow('edges', _edges)
# cv2.waitKey(1)
_contrast = _bw.std()
return (_edges, _contrast, _bw, _hsv)
class EdgeDetector(SceneDetector):
"""Detects cuts using changes in edges found using the Canny operator.
This detector uses edge information to detect scene transitions. The
threshold sets the fraction of detected edge pixels that can change from one
frame to the next in order to trigger a detected scene break. Images are
converted to grayscale in this detector, so color changes won't trigger
a scene break like with the ContentDetector.
Paper reference: http://www.cs.cornell.edu/~rdz/Papers/ZMM-MM95.pdf
"""
def __init__(self, similar=0.75, confirm=1.25, initial=0.3, r_dist=6, buffer_size=9):
super(EdgeDetector, self).__init__()
# first pass threshold
self.initial = initial
# similarity standard deviations threshold
self.similar = similar
# confirm standard deviations threshold
self.confirm = confirm
        # distance over which motion is estimated (on the scaled-down image)
self.r_dist = r_dist
self.last_frame = None
self.last_scene_cut = -3
self.contrasts = []
self.frame_scores = []
self.p_maxes = []
self.buffer = []
self.buffer_size = abs(buffer_size)
self.saved_frames = []
self.p_contrast_median = 0.0
self.p_max_median = 0.0
self.p_hue_median = 0.0
self.p_sat_median = 0.0
self.p_lum_median = 0.0
self.p_sum_median = 0.0
self.p_contrast_std = 0.0
self.p_max_std = 0.0
self.p_hue_std = 0.0
self.p_sat_std = 0.0
self.p_lum_std = 0.0
self.p_sum_std = 0.0
self._metric_keys = ['p_max', 'p_in', 'p_out', 'p_contrast', 'p_delta']
# self.cli_name = 'detect-content'
def _percentage_distance(self, frame_in, frame_out, r):
diamond = numpy.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
E_1 = binary_dilation(frame_in, structure=diamond, iterations=r)
E_2 = binary_dilation(frame_out, structure=diamond, iterations=r)
combo = numpy.float32(numpy.sum(E_1 & E_2))
total_1 = numpy.float32(numpy.sum(E_1))
return 1.0 - combo/total_1
def _compute_edges_p_max(self, last_edges, curr_edges):
# Estimate the motion in the frame using skvideo
r_dist = self.r_dist
disp = globalEdgeMotion(numpy.array(last_edges, dtype=bool),
numpy.array(curr_edges, dtype=bool),
r=r_dist,
method='hamming')
# Translate our current frame to line it up with previous frame
comp_edges = numpy.roll(curr_edges, disp[0], axis=0)
comp_edges = numpy.roll(comp_edges, disp[1], axis=1)
# Calculate fraction of edge pixels changing using scipy
r_iter = 6 # Number of morphological operations performed
p_in = self._percentage_distance(last_edges, comp_edges, r_iter)
p_out = self._percentage_distance(comp_edges, last_edges, r_iter)
p_max = numpy.max((p_in, p_out))
return p_max, p_in, p_out
def process_frame(self, frame_num, frame_img):
# type: (int, numpy.ndarray) -> List[int]
""" Detects difference in edges between frames. Slow transitions or
transitions that happen in color space that won't show in grayscale
won't trigger this detector.
Arguments:
frame_num (int): Frame number of frame that is being passed.
frame_img (Optional[int]): Decoded frame image (numpy.ndarray) to perform scene
detection on. Can be None *only* if the self.is_processing_required() method
                (inherited from the base SceneDetector class) returns True.
Returns:
List[int]: List of frames where scene cuts have been detected. There may be 0
or more frames in the list, and not necessarily the same as frame_num.
"""
metric_keys = self._metric_keys
# If we're on the first frame, insert dummy values for delta and return
if(len(self.buffer) < 3):
curr_edges, curr_contrast, curr_bw, _hsv = compute_frame_transforms(
frame_img)
self.buffer.append(
(frame_num, curr_edges, curr_contrast, curr_bw, _hsv, {}))
p_contrast_pct = 0.0
return []
        # Fraction of edge pixels changing in new frame, max, entering, and leaving
        p_max, p_in, p_out = 0.0, 0.0, 0.0
        p_contrast_pct = 0.0  # avoids a NameError below when metrics come from the stats manager
if (self.stats_manager is not None and
self.stats_manager.metrics_exist(frame_num, metric_keys)):
p_max, p_in, p_out, p_contrast, p_delta = self.stats_manager.get_metrics(
frame_num, metric_keys)
else:
# Get last element of buffer
(last_frame_num, last_edges, last_contrast, last_bw,
last_hsv, last_diff) = self.buffer[-2]
# Get current frame transforms
(curr_edges, curr_contrast, curr_bw,
curr_hsv) = compute_frame_transforms(frame_img)
# Compute difference in frames
p_max, p_in, p_out = self._compute_edges_p_max(
last_edges, curr_edges)
p_delta = calculate_frame_score(curr_hsv, last_hsv)
p_contrast = abs(curr_contrast - last_contrast)
p_contrast_pct = 1
if (curr_contrast > 0 and last_contrast > 0):
p_contrast_pct = p_contrast/(numpy.min([curr_contrast, last_contrast]) /
numpy.max([curr_contrast, last_contrast]))
# record metrics
if self.stats_manager is not None:
self.stats_manager.set_metrics(frame_num, {
metric_keys[0]: p_max,
metric_keys[1]: p_in,
metric_keys[2]: p_out,
metric_keys[4]: p_delta,
metric_keys[3]: p_contrast
})
# save metrics for standard deviation calculations
self.contrasts.append(p_contrast)
self.frame_scores.append(p_delta)
self.p_maxes.append(p_max)
# save diffs between frames to avoid recalculating
curr_diff = {}
last_diff[last_frame_num] = curr_diff[frame_num] = (
p_max, p_delta, p_contrast)
self.buffer.append(
(frame_num, curr_edges, curr_contrast, curr_bw, curr_hsv, curr_diff))
# cv2.imshow("curr_bw", frame_img)
# cv2.imshow("last_bw", curr_edges)
# cv2.waitKey(1)
# if threshold is met mark for cut calculation
if p_max >= self.initial or p_contrast_pct >= self.initial:
# get last saved frame if there is one
last_buffer = None
if len(self.saved_frames) > 0:
last_saved_frame = self.saved_frames[-1]
last_buffer = last_saved_frame[1]
# if buffer is not the same as the last buffer we have our first potential cut in this buffer
# save the buffer
if last_buffer is not self.buffer:
self.saved_frames.append(
(frame_num, self.buffer))
# if buffer is the same as the last buffer, add this potential cut to the buffer and extend the life of this buffer
else:
# update saved frames with new frame_num and shift buffer
self.buffer = self.buffer[:]
self.saved_frames[-1] = (frame_num, self.buffer)
# if the buffer size is reached and we're not within the buffer size of the last saved frame, drop frames from the buffer
if len(self.saved_frames) == 0 or frame_num > self.saved_frames[-1][0] + self.buffer_size:
self.buffer = self.buffer[-self.buffer_size:]
return []
def get_or_create_diff_std(self, buffer_element, target_buffer_element):
(frame_num, _edges, _contrast, _bw, _hsv, diffs) = buffer_element
(frame_num_t, _edges_t, _contrast_t, _bw_t,
_hsv_t, diffs_t) = target_buffer_element
linked_diff = diffs_t.get(frame_num, None)
if linked_diff is None:
p_max, _p_in, _p_out = self._compute_edges_p_max(_edges_t, _edges)
p_delta = calculate_frame_score(_hsv_t, _hsv)
p_contrast = abs(_contrast-_contrast_t)
linked_diff = (p_max, p_delta, p_contrast)
diffs[frame_num_t] = linked_diff
diffs_t[frame_num] = linked_diff
(p_max, p_delta, p_contrast) = linked_diff
(p_hue, p_sat, p_lum, p_sum) = p_delta
(p_hue_std, p_sat_std, p_lum_std, p_sum_std) = (
abs(p_hue-self.p_hue_median)/self.p_hue_std,
abs(p_sat-self.p_sat_median)/self.p_sat_std,
abs(p_lum-self.p_lum_median)/self.p_lum_std,
abs(p_sum-self.p_sum_median)/self.p_sum_std
)
p_max_std = abs(p_max-self.p_max_median)/self.p_max_std
p_contrast_std = abs(
p_contrast-self.p_contrast_median) / self.p_contrast_std
return (p_max_std, p_contrast_std, p_hue_std, p_sat_std, p_lum_std, p_sum_std)
def confirm_cut(self, diff):
(p_max_std, p_contrast_std, p_hue_std,
p_sat_std, p_lum_std, p_sum_std) = diff
count = 0
if p_lum_std > self.confirm:
count += 2/3
if p_hue_std > self.confirm:
count += 1/3
        if p_sat_std > self.confirm:  # likely intended saturation (was a duplicated hue check)
            count += 1/3
if p_max_std > self.confirm:
count += 1
if p_contrast_std > self.confirm:
count += 1
if count >= 2:
return True
def confirm_similar(self, diff):
(p_max_std, p_contrast_std, p_hue_std,
p_sat_std, p_lum_std, p_sum_std) = diff
count = 0
if p_lum_std < self.similar:
count += 2/3
if p_hue_std < self.similar:
count += 1/3
        if p_sat_std < self.similar:  # likely intended saturation (was a duplicated hue check)
            count += 1/3
if p_max_std < self.similar:
count += 1
if p_contrast_std < self.similar:
count += 1
if count >= 2:
return True
def get_weighted_delta(self, diff_std):
p_max_std, p_contrast_std, p_hue_std, p_sat_std, p_lum_std, p_sum_std = diff_std
weights = [3, 3, 1, 1, 1, 1]
distributions = [p_max_std, p_contrast_std,
p_hue_std, p_sat_std, p_lum_std, p_sum_std]
weighted_sum = []
for std, weight in zip(distributions, weights):
weighted_sum.append(std*weight)
return numpy.sum(weighted_sum)/numpy.sum(weights)
def post_process(self, frame_num):
cut_list = []
(p_hues, p_sats, p_lums, p_sums) = zip(*self.frame_scores)
self.p_hue_median = numpy.nanmedian(p_hues)
self.p_sat_median = numpy.nanmedian(p_sats)
self.p_lum_median = numpy.nanmedian(p_lums)
self.p_sum_median = numpy.nanmedian(p_sums)
self.p_hue_std = numpy.nanstd(p_hues)
self.p_sat_std = numpy.nanstd(p_sats)
self.p_lum_std = numpy.nanstd(p_lums)
self.p_sum_std = numpy.nanstd(p_sums)
self.p_contrast_median = numpy.nanmedian(self.contrasts)
self.p_max_median = numpy.nanmedian(self.p_maxes)
self.p_contrast_std = numpy.nanstd(self.contrasts)
self.p_max_std = numpy.nanstd(self.p_maxes)
stats_manager = self.stats_manager
frame_updates = []
delta_rate = []
for saved_frames in self.saved_frames:
_frame_num, buffer = saved_frames
start_buffer = buffer[0]
end_buffer = buffer[-1]
first_frame_index = 0
second_frame_index = 1
while second_frame_index < len(buffer):
diff = self.get_or_create_diff_std(
buffer[first_frame_index], buffer[second_frame_index])
delta = self.get_weighted_delta(diff)
delta_rate.append((buffer[second_frame_index][0], delta))
if self.confirm_cut(diff):
frame_updates.append(
buffer[second_frame_index][0])
first_frame_index += 1
second_frame_index += 1
if len(frame_updates) == 0:
continue
last_frame_update = frame_updates[0]
grouped_frame_updates = [[last_frame_update]]
for frame_update in frame_updates[1:]:
if frame_update > last_frame_update + self.buffer_size:
# Initialize a new group
grouped_frame_updates.append([frame_update])
else:
grouped_frame_updates[-1].append(frame_update)
last_frame_update = frame_update
cut_list += self.get_cuts_from_buffer_group(
grouped_frame_updates, buffer)
# skip first cut because it will be tied to start delta
return cut_list[1:]
def get_cuts_from_buffer_group(self, grouped_frame_updates, buffer):
cut_list = []
for frame_updates in grouped_frame_updates:
first_frame_index = 0
last_frame_index = len(frame_updates) - 1
# find index of first frame in buffer
for frame_index in range(len(buffer)):
if buffer[frame_index][0] == frame_updates[0]:
first_frame_index = frame_index
if buffer[frame_index][0] == frame_updates[-1]:
last_frame_index = frame_index
second_frame_index = 0
first_frame_index -= 1
##cv2.imshow("curr_bw", buffer[first_frame_index][3])
##cv2.imshow("last_bw", buffer[last_frame_index][3])
# #cv2.waitKey(1000)
second_frame_index = 0
# for all frames between first and last frame_updates
frame_updates_passed = []
while second_frame_index < len(buffer)-1:
if 0 not in [len(frame_updates_passed), len(frame_updates)] and frame_updates[0] - frame_updates_passed[-1] <= 3:
frame_updates = frame_updates[1:]
second_frame_index += 1
if(buffer[second_frame_index][0] not in frame_updates):
continue
first_frame_index = second_frame_index - 1
second_frame_index += 1
found_similar = False
similarity = 0
start_second_index = second_frame_index
while buffer[first_frame_index][0] > frame_updates[0]-3 and first_frame_index >= 0:
while not found_similar and second_frame_index < len(buffer) and (second_frame_index - start_second_index) < 4:
if self.confirm_similar(
self.get_or_create_diff_std(
buffer[first_frame_index], buffer[second_frame_index]
)
):
found_similar = True
break
second_frame_index += 1
if found_similar:
break
first_frame_index -= 1
second_frame_index = start_second_index
if found_similar:
# filter out the frame_updates that between the start_index and forward_index
frame_updates_to_remove = []
for frame_update in frame_updates:
if frame_update >= buffer[first_frame_index][0] and frame_update <= buffer[second_frame_index][0]:
frame_updates_to_remove.append(frame_update)
# remove the frame_updates in frame_updates_to_remove
for frame_update in frame_updates_to_remove:
frame_updates.remove(frame_update)
else:
frame_updates_passed += [frame_updates[0]]
frame_updates = frame_updates[1:]
second_frame_index = 0
# add the first and last frame_updates not removed to the cut_list
if len(frame_updates_passed) > 0:
step = len(frame_updates_passed) - 1
if step == 0:
step = 1
cut_list += frame_updates_passed[::step]
return cut_list
@DrSammyD that should probably be a separate issue/feature request. Ideally I'd like to try and use some kind of online algorithm for estimating the threshold (with a reasonable buffer size), but using a sliding window might be sufficient for that purpose. That being said, it's a very good idea and should definitely be pursued in the right forum. Feel free to file a new bug report or create a discussion for that.
Does anyone have any links to videos they can share exhibiting this behavior? YouTube links are fine. I would like to start compiling a list of test cases to use for validating this, or at least to better categorize the classes of issues that need to be solved.
I want to add back the strobe suppression by calculating the delta between the frame before the strobe/flash event, and up to N frames after. This means adding the following options:
However, I wonder if a simpler approach could be achieved by some kind of filter that rejects cuts if there is a sudden increase in average frame luma values (or it deviates based on a rolling average). Having a good library of test cases is crucial for implementing this, so hoping folks can provide some examples.
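A sketch of that simpler luma-based rejection (illustrative only; the window size and spike ratio are made-up defaults):

```python
from collections import deque

class LumaSpikeFilter:
    """Rejects a cut if the average frame luma suddenly jumps well above its
    rolling average (i.e. the 'cut' is probably just a flash)."""

    def __init__(self, window: int = 30, spike_ratio: float = 1.5):
        self.history = deque(maxlen=window)
        self.spike_ratio = spike_ratio

    def allow_cut(self, mean_luma: float) -> bool:
        rolling_avg = (sum(self.history) / len(self.history)) if self.history else mean_luma
        self.history.append(mean_luma)
        # Suppress the cut when luma spikes well above the recent average.
        return mean_luma <= self.spike_ratio * rolling_avg
```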
Ideally we could then create a single video with a bunch of different forms of flashes/strobes to help validate this.
To start things off, one interesting sequence is in the movie Kick-Ass (2010): https://www.youtube.com/watch?v=-SbnqIIkXQc
This has several different types of flashes/strobes throughout, and seems to be quite a challenging case.
I've made progress on this, and there will be a new flash filter that can be enabled/disabled in v0.7. When a rapid set of cuts is detected (i.e. several consecutive scenes shorter than the min-scene-len option), subsequent cuts will be suppressed until min-scene-len frames pass without a cut.
This effectively merges consecutive sequences of scenes shorter than min-scene-len, grouping the areas where flashing occurs into a contiguous scene. On the above video, with the minimum scene length at the default (0.6 seconds):
Detector | # Scenes | # Scenes w/ Filter |
---|---|---|
detect-content | 123 | 74 |
detect-adaptive | 98* | 89** |
Here's an example of how the flash grouping looks in action:
https://github.com/Breakthrough/PySceneDetect/assets/125316/28d4a036-6b57-4a04-a152-4fecd783208e
Without the filter, detect-content emits 10 different clips for this segment, and misses a cut that occurs just after the above video ends. Without the filter, detect-adaptive performs better, emitting only 8 clips for this segment, and it doesn't miss the subsequent cut. Just looking at the output thumbnails, there are lots of similar/duplicate images without the filter, and almost none with it. Flashes can be reduced further by adjusting min-scene-len accordingly with the filter enabled.
This doesn't affect any material without flashes, so I will enable this by default in the next release for both the API and the command line program. This won't resolve the flashing issue entirely, but will greatly reduce the impact it has on the output when it does occur.
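A minimal sketch of the merging behaviour described above, expressed as a post-hoc filter over a list of cut frame numbers (illustrative only; the real filter operates inside the detection pipeline):

```python
def merge_rapid_cuts(cut_frames, min_scene_len):
    """Collapse bursts of cuts: once a cut fires, further cuts are suppressed
    until `min_scene_len` frames pass without any cut being detected."""
    filtered = []
    last_cut = None
    for cut in sorted(cut_frames):
        if last_cut is not None and cut - last_cut < min_scene_len:
            last_cut = cut      # still inside the burst; extend the window, emit nothing
            continue
        filtered.append(cut)    # burst is over (or this is the first cut)
        last_cut = cut
    return filtered

# Example: a strobe around frame 300 produces a burst of spurious cuts.
print(merge_rapid_cuts([120, 300, 305, 308, 450], min_scene_len=15))  # -> [120, 300, 450]
```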
The last things missing to close this out are:
- HashDetector
- HistogramDetector