Incorrect conversion to df for Google Video FRAME_MODE

adelavega commented 5 years ago

In FRAME_MODE, GoogleVideoIntelligence returns results sampled at 1 Hz (or so it seems; their docs don't say anything).

However, pliers is assigning durations to these events based on the gaps between consecutive offsets.

For example, for chair, the raw results look something like:

{'entity': {'entityId': '/m/01mzpv',
            'description': 'chair',
            'languageCode': 'en-US'},
 'categoryEntities': [{'entityId': '/m/0c_jw',
                       'description': 'furniture',
                       'languageCode': 'en-US'}],
 'frames': [{'timeOffset': '55s', 'confidence': 0.4019864},
            {'timeOffset': '96s', 'confidence': 0.42338032},
            {'timeOffset': '97s', 'confidence': 0.6609389},
            {'timeOffset': '129s', 'confidence': 0.4277751},
            {'timeOffset': '156s', 'confidence': 0.6204254}]}

But the df looks like:

onset	duration	chair
55.0	41.00	0.401986
96.0	1.00	0.423380
97.0	32.00	0.660939
129.0	27.00	0.427775
156.0	1.00	0.620425

The durations should all be 1 in this case.
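To make the expected behaviour concrete, here is a rough sketch of the conversion with a fixed per-frame duration instead of one derived from offset gaps. This is not pliers code: `parse_offset` and `frames_to_rows` are hypothetical helpers, the 1-second default rests on the assumption that FRAME_MODE samples at 1 Hz, and the offsets are assumed to arrive as strings like '55s', as in the raw results above.

```python
def parse_offset(offset):
    """Convert an offset string such as '55s' to float seconds."""
    return float(offset.rstrip('s'))


def frames_to_rows(annotation, frame_duration=1.0):
    """Build one row per frame, all sharing the same duration.

    frame_duration=1.0 is an assumption (1 Hz sampling), not a documented value.
    """
    label = annotation['entity']['description']
    return [
        {'onset': parse_offset(frame['timeOffset']),
         'duration': frame_duration,
         label: frame['confidence']}
        for frame in annotation['frames']
    ]


# The 'chair' annotation from the raw results above.
chair = {
    'entity': {'entityId': '/m/01mzpv', 'description': 'chair',
               'languageCode': 'en-US'},
    'categoryEntities': [{'entityId': '/m/0c_jw', 'description': 'furniture',
                          'languageCode': 'en-US'}],
    'frames': [{'timeOffset': '55s', 'confidence': 0.4019864},
               {'timeOffset': '96s', 'confidence': 0.42338032},
               {'timeOffset': '97s', 'confidence': 0.6609389},
               {'timeOffset': '129s', 'confidence': 0.4277751},
               {'timeOffset': '156s', 'confidence': 0.6204254}],
}

rows = frames_to_rows(chair)
for row in rows:
    print(row['onset'], row['duration'], row['chair'])
```

Each row keeps its own onset, and the gap to the next detection no longer leaks into the duration.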

qmac commented 5 years ago

Makes sense 👍 , should be a semi-straightforward fix

adelavega commented 5 years ago

On this note, I was comparing the results of FRAME_MODE to using FrameSamplingFilter at 1 Hz and feeding the frames to VisionAPI. The results are roughly the same, except VideoIntelligence returns more features overall (probably a different threshold).

So the main advantage is VideoIntelligence can be much faster (if you feed it a video file in a manageable codec / size).

adelavega commented 5 years ago

Actually, another advantage is that VideoIntelligence returns category entities for each tag. This could be really useful, as many tags are super specific, but we might want to analyze at a slightly broader level (e.g. furniture instead of chair). We don't seem to currently extract that information.
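As a rough illustration of what extracting that information could look like, the sketch below emits one extra column per category entity alongside the tag itself. `frames_to_category_rows` is a hypothetical helper, not part of pliers, and the `category_` column prefix and the 1-second duration are assumptions made for the example.

```python
def frames_to_category_rows(annotation, frame_duration=1.0):
    """Emit one row per frame, with a column for the tag and for each category.

    The 'category_' prefix and frame_duration=1.0 are illustrative assumptions.
    """
    label = annotation['entity']['description']
    categories = [c['description']
                  for c in annotation.get('categoryEntities', [])]
    rows = []
    for frame in annotation['frames']:
        row = {'onset': float(frame['timeOffset'].rstrip('s')),
               'duration': frame_duration,
               label: frame['confidence']}
        # Reuse the tag's confidence for its broader categories.
        for category in categories:
            row['category_' + category] = frame['confidence']
        rows.append(row)
    return rows


# Shortened version of the 'chair' annotation from the first comment.
chair = {
    'entity': {'entityId': '/m/01mzpv', 'description': 'chair',
               'languageCode': 'en-US'},
    'categoryEntities': [{'entityId': '/m/0c_jw', 'description': 'furniture',
                          'languageCode': 'en-US'}],
    'frames': [{'timeOffset': '55s', 'confidence': 0.4019864},
               {'timeOffset': '96s', 'confidence': 0.42338032}],
}

category_rows = frames_to_category_rows(chair)
print(sorted(category_rows[0]))
```

If several tags map to the same category at the same onset, some aggregation rule (e.g. taking the max confidence) would still need to be chosen.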

PsychoinformaticsLab / pliers

Incorrect conversion to df for Google Video FRAME_MODE #338