forensic-architecture / mtriage

framework to orchestrate the download and analysis of media

Adding protests analyser #149

Smoltbob closed this pull request 4 years ago

Smoltbob commented 4 years ago

I added a folder and tried to replicate the structure of KerasPretrained. There is also a temporary script, test.py, that makes it possible to test the model outside of mtriage.

Smoltbob commented 4 years ago

This is built on the trained model published by the authors of the following repository: https://github.com/wondonghyeon/protest-detection-violence-estimation

breezykermo commented 4 years ago

Hi @Smoltbob, sorry for the delay. One basic change needed to get the build passing is to format the code with black. If you've pip-installed the outer requirements.txt, there's a script that does this for all relevant files:

sh scripts/lint.sh

Smoltbob commented 4 years ago

Thank you,

breezykermo commented 4 years ago

Fantastic, thanks. I think this build may fail regardless, due to something I need to fix in the core mtriage runtime. I'm going to post a detailed explanation here of how you can test ProtestsPretrained on real data with a custom build.

breezykermo commented 4 years ago

Downloading some example data

I'm going to create an mtriage config that selects videos from YouTube and splits them into frames, so that we can run the ProtestsPretrained analyser on them and simulate a real-world mtriage run. A generic search for "tear gas + mexico" over a 15-day span gives me ~460 results. On my 4-CPU computer these take about 15 minutes to download with mtriage and amount to ~1.5 GB of video (mtriage downloads everything at very low resolution by default).

If you don't have enough space or want to try a smaller sample, just tweak the uploaded_after parameter to narrow the range of dates.
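
To get a rough idea of what a narrower window costs, the figures above (~460 videos, ~1.5 GB, ~15 minutes on a 4-CPU machine) can be scaled linearly. The sketch below does only that arithmetic; `estimate_download` is a hypothetical helper, 1.5 GB is treated as 1500 MB, and linear scaling is an assumption, since video lengths and network speed vary from run to run.

```shell
# Hypothetical helper: linearly extrapolate download size and time from the
# single measured run in this thread (~460 videos, ~1500 MB, ~15 min, 4 CPUs).
# Assumes every video costs about the same, which will not hold exactly.
estimate_download() {
  awk -v n="$1" 'BEGIN {
    mb   = n * 1500 / 460   # roughly 3.3 MB per video
    mins = n * 15 / 460     # roughly 2 s per video
    printf "%d videos: ~%.0f MB, ~%.1f min\n", n, mb, mins
  }'
}

estimate_download 100
```

For example, 100 results work out to roughly 326 MB and a little over 3 minutes of downloading.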

I'm going to split my config into two parts, so that I don't have to re-download the videos on every run while I'm debugging the ProtestsPretrained analyser.

Let's put this in 'data/download_vids.yml':

folder: media/test_protests
select:
  name: Youtube
  config:
    search_term: tear gas + mexico
    uploaded_before: "2018-11-30T00:00:00Z"
    uploaded_after: "2018-11-15T00:00:00Z"
analyse:
  - name: Frames

Make sure you've also followed the setup steps for the Youtube selector.

Now I can run this config with the regular build of mtriage (which handles both the Youtube selector and the Frames analyser) using the following command:

./mtriage run data/download_vids.yml --dev

This will download all the videos, and split them up into frames.
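
If the full download is too large, the narrower date window suggested above can be written as a variant of this config. A minimal sketch: the file name `data/download_vids_small.yml` and the 2018-11-25 start date are illustrative choices, not part of the original setup. It keeps the same `folder`, so the ProtestsPretrained config further down can read from it unchanged.

```shell
# Write a five-day variant of data/download_vids.yml.
# The file name and the dates are illustrative assumptions.
mkdir -p data
cat > data/download_vids_small.yml <<'EOF'
folder: media/test_protests
select:
  name: Youtube
  config:
    search_term: tear gas + mexico
    uploaded_before: "2018-11-30T00:00:00Z"
    uploaded_after: "2018-11-25T00:00:00Z"
analyse:
  - name: Frames
EOF
```

It then runs just like the full config: ./mtriage run data/download_vids_small.yml --dev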

Testing ProtestsPretrained with a custom build

Create a 'whitelist.txt' file with the following contents:

ProtestsPretrained

This lets us build a stripped-down version of mtriage containing only the core library and the dependencies of the component under development.

We can now build that with this command:

./mtriage dev build --tag protests --whitelist whitelist.txt
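
The two steps above can be scripted so they are easy to repeat while debugging. A minimal sketch, assuming the whitelist takes one component name per line and that the script is run from the root of an mtriage checkout (where the ./mtriage script lives); it skips the build with a message otherwise.

```shell
# Whitelist only the component under development (one name per line, assumed).
printf 'ProtestsPretrained\n' > whitelist.txt

# Build the stripped-down image tagged "protests" from the repository root.
if [ -x ./mtriage ]; then
  ./mtriage dev build --tag protests --whitelist whitelist.txt
else
  echo "./mtriage not found: run this from the mtriage repository root" >&2
fi
```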

Now I have a build of mtriage with the ProtestsPretrained dependencies that I can experimentally run and debug.

When we run mtriage with this analyser, we want to use our custom build, to ensure that the dependencies specified in the ProtestsPretrained analyser are sufficient and correct.

Let's create a YAML config in 'data/test_protests.yml' that operates on the data we downloaded in the previous step:

folder: media/test_protests
elements_in:
  - Youtube/Frames
analyse:
  - name: ProtestsPretrained
    config:
      labels:
        - protest

Now we can test a build of mtriage that only has the ProtestsPretrained requirements installed with the following command:

./mtriage run data/test_protests.yml --tag protests --dev

breezykermo commented 4 years ago

I've walked through this setup, and I think at least two more dependencies, pandas and numpy, are required.

Smoltbob commented 4 years ago

Thank you! Let's test all this then! Yes, there is a numpy dependency; I hadn't included it because I thought it was standard. I'll track down where pandas is used; I think I can remove that dependency.
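
A dependency like numpy is easy to overlook when it is already installed locally. A rough sketch of a check, assuming the component keeps its pip dependencies in a requirements.txt next to its Python files; `missing_requirements` is a hypothetical helper, not part of mtriage. It is only a heuristic: standard-library modules such as os are listed too, only the first module in `import a, b` is seen, and import names can differ from pip package names (cv2 vs opencv-python), so treat the output as a checklist to review.

```shell
# Hypothetical helper: print top-level modules imported by the component's
# .py files that are not named in its requirements.txt.
missing_requirements() {
  dir=$1
  # "import numpy as np" -> numpy, "from os.path import join" -> os;
  # relative imports ("from . import x") are skipped by the pattern.
  grep -hoE '^[[:space:]]*(import|from)[[:space:]]+[A-Za-z_][A-Za-z0-9_]*' "$dir"/*.py \
    | awk '{print $2}' | sort -u \
    | while read -r mod; do
        # Match "numpy", "numpy==1.18", "numpy>=1.18", case-insensitively.
        grep -qiE "^${mod}([^A-Za-z0-9_.-]|\$)" "$dir/requirements.txt" 2>/dev/null \
          || echo "$mod"
      done
}
```

Usage: missing_requirements path/to/ProtestsPretrained (the path is a placeholder for wherever the analyser lives in your checkout).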

Smoltbob commented 4 years ago

I've tested on a full run with no issues!

breezykermo commented 4 years ago

Awesome, thanks. Sorry for the slow pickup here. It's in the works.

breezykermo commented 4 years ago

First, a minor thing that is my fault: could you rebase these changes on top of the latest release? There's a small fix to the build process that I think you worked around here.

breezykermo commented 4 years ago

I'm just running a clean build and the tests locally to confirm that everything works as expected; I'll merge if all goes well. Thanks again!

breezykermo commented 4 years ago

@Smoltbob how long would you expect inference to take on a CPU? I'm currently testing with ~10,000 frames on an Intel i7, and nothing has been produced yet after 20 minutes of running.

I'm realising that if inference for one frame takes closer to 0.5s than 0.1s, this could be running for a while...

Smoltbob commented 4 years ago

Hi, so sorry for the delay! It takes 0.15s per frame on my i5-5200U, so I think that should be OK? What I did notice when testing (including with KerasPretrained) is that running on the CPU in parallel doesn't seem to work. Could that be related?
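
The per-frame figures in this exchange are enough for a back-of-the-envelope runtime check. A minimal sketch (`estimate_inference` is a hypothetical helper) that multiplies frames by seconds per frame and optionally divides by a number of parallel workers, assuming an idealised, perfectly linear speed-up:

```shell
# Hypothetical helper: total time = frames x seconds per frame / workers.
# Assumes ideal parallel speed-up; defaults to 1 worker (serial inference).
estimate_inference() {
  awk -v n="$1" -v s="$2" -v w="${3:-1}" 'BEGIN {
    t = n * s / w
    printf "%d frames x %.2fs / %d worker(s) = %.0fs (~%.1f min)\n", n, s, w, t, t / 60
  }'
}

# The per-frame latencies mentioned in this thread, for ~10,000 frames.
for s in 0.1 0.15 0.5; do
  estimate_inference 10000 "$s"
done
```

At 0.15s per frame, 10,000 frames come to about 25 minutes of serial inference, and at 0.5s to over 80 minutes, so a 20-minute run may simply not have finished yet.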

breezykermo commented 4 years ago

Ah, I see. Yes, I think you're right; I'll investigate the parallel CPU problem.

breezykermo commented 4 years ago

Ah, sorry, one other thing I just realised: could you please open the PR against the 'release' branch? That way I can debug things before the changes flush through to master if something is up. Thanks so much; I'll merge as soon as the tests pass on the PR to release!