bellingcat / open-source-research-notebooks

Jupyter notebooks helping open source researchers, journalists, and fact-checkers use command line tools and code projects for digital investigations.
MIT License

Notebook: deepface #5

Closed: msramalho closed this issue 10 months ago

msramalho commented 1 year ago

Tool: https://github.com/serengil/deepface

Goals:

The tool has a Python client and also a command-line interface (see the bottom of the readme). Ideally the CLI should be given priority, but if it has limitations, Python should be used. Optionally both, if there's educational value in that.

amithr commented 1 year ago

Hi Miguel,

Based on my research, in order to analyze a folder of images using DeepFace, each image needs to be vectorized and then pickled. Then we can compare a single image against this collection of pickled images (a database?).

The process is significantly more complex than what we've included in Notebooks before. If you access this link, the process is outlined in the section "The hacker's way."

Is this still something we should include?

Or, alternatively, is there a more straightforward way of doing this?

msramalho commented 1 year ago

Hi Amith,

Indeed, this will be a more technically challenging one.

Based on this readme code (similarity section), I assumed the library could take care of that. I imagine it may be slower the first time around if it vectorizes things, but this would remove the need to have all that code in the notebook, assuming it works:

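# face recognition
from deepface import DeepFace

# distance metrics as listed in the deepface readme
metrics = ["cosine", "euclidean", "euclidean_l2"]
dfs = DeepFace.find(
    img_path="img1.jpg",
    db_path="C:/workspace/my_db",
    distance_metric=metrics[2],
)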
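As a rough illustration of what the three `distance_metric` names usually mean, here is a minimal pure-Python sketch on toy vectors. It is not deepface's internal code: the function names are hypothetical, and real face embeddings have many more dimensions than the two-element lists used here.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_distance(a, b):
    """1 minus the cosine similarity; 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def euclidean_l2_distance(a, b):
    """Euclidean distance after both vectors are scaled to unit length."""
    return euclidean_distance(l2_normalize(a), l2_normalize(b))


# Toy "embeddings": same direction, different magnitude.
a, b = [3.0, 4.0], [6.0, 8.0]
print(cosine_distance(a, b), euclidean_distance(a, b), euclidean_l2_distance(a, b))
```

In this toy case the plain Euclidean distance is large while the cosine and L2-normalized distances are close to zero, which is why the choice of metric (and its threshold) can change what counts as a match.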

At some point the code may become hard to follow for someone who is not familiar with it, but if they can still specify the input parameters then it will still be impactful.

A possible path to explore here is having a section that describes how to connect Google Drive.

amithr commented 1 year ago

Ok, I understand.

Based on the article I read, I made the incorrect assumption that all the images would have to be vectorized and stored in a database manually.

msramalho commented 1 year ago

I'm glad there's an easier way to do it!

amithr commented 1 year ago

Just a quick update: I had to take a quick break to get started on a freelance project I'm working on. I'll be back to working on this and I hope to get the deepface Notebook done by this next weekend.

msramalho commented 1 year ago

Thanks for letting me know!

amithr commented 1 year ago

Hi Miguel,

I don't have a lot of experience with Binder, but is there any way to host and run a Jupyter Notebook without it already being in a GitHub repository? Or is that kind of the whole point of Binder?

msramalho commented 1 year ago

I think that's the point of it. They have a few alternatives, but all in the same "link your public code" mindset.

amithr commented 1 year ago

That being said, would it be necessary to include instructions for uploading to Binder? I suppose I could see users using the notebook directly from this repo (once it's public). However, the instructions for uploading to Google Colab were fairly long and it might be a little inefficient for half the notebook to be instructions for uploading files.

amithr commented 1 year ago

Hi Miguel,

I'm kind of at a stage where I'm not really sure what else to add, particularly to the FAQ - I could really use some feedback on the notebook as a whole. Here's a link.

msramalho commented 1 year ago

Hi Amith,

I've had a look and it's looking good. I feel you should provide some example files for people to easily download; I can host them on Google Drive. I've also added a "research note" about the limitations of using face comparison methods blindly. There's an empty Binder section, is that intentional?

A suggestion, though it's specific to Colab, is showing how to easily receive locally uploaded files:

from google.colab import files

uploaded = files.upload()

# this part being optional, maybe just calling `files.upload()` is enough.
for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

In the FAQ, maybe point to a tool or two that people can use to convert images from one format to another, JPG for example. One such tool is ffmpeg; we could do a notebook about that one too, since I've found it useful for converting files, extracting frames, etc. :smile:

ffmpeg -i input.tif output.jpg

(but let's hold off on that for now)
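For a notebook cell, the `ffmpeg -i input.tif output.jpg` conversion could be wrapped roughly like this. This is only a sketch: the helper names `build_ffmpeg_command` and `convert_image` are hypothetical, and it assumes ffmpeg is already installed and available on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, target_suffix=".jpg"):
    """Build the argument list for `ffmpeg -i <src> <src with new suffix>`."""
    src = Path(src)
    dst = src.with_suffix(target_suffix)
    return ["ffmpeg", "-i", str(src), str(dst)]


def convert_image(src, target_suffix=".jpg"):
    """Run ffmpeg on one file and return the path of the converted image."""
    if shutil.which("ffmpeg") is None:
        # Assumption: ffmpeg must be installed separately in the environment.
        raise RuntimeError("ffmpeg was not found on PATH")
    cmd = build_ffmpeg_command(src, target_suffix)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[-1])


# Show the command that would be run, without executing it.
print(build_ffmpeg_command("input.tif"))
```

Keeping the command builder separate from the function that runs it makes it easy to show users exactly what will be executed before any file is touched.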

amithr commented 1 year ago

Hi Miguel,

Thanks for the feedback. I'll address all of this.

Do you think that the whole section explaining how to upload images to Google Drive is necessary? I'm not really sure why I did it that way when it seems that users could simply upload their images locally (within the Notebook's own allocated storage).

msramalho commented 1 year ago

That's a fair point that also eluded me; I'm learning too! If this way works across notebook environments then it's preferable.
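One way the "works across notebook environments" idea might look in practice is sketched below. The names `collect_images` and `IMAGE_SUFFIXES` are hypothetical, and the sketch assumes that `files.upload()` in Colab saves the uploaded files into the current working directory; outside Colab it simply lists whatever images are already in the folder.

```python
from pathlib import Path

# Hypothetical set of extensions to treat as images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def collect_images(folder="."):
    """Return image paths in `folder`, prompting for an upload first when in Colab."""
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        files = None  # Binder, local Jupyter, etc.

    if files is not None:
        # Assumption: uploaded files land in the current working directory.
        files.upload()

    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Outside Colab, users would place their images in the folder through the environment's own file browser before calling `collect_images()`.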

msramalho commented 1 year ago

It seems I didn't have permission to modify the notebook so my changes didn't apply.

I added

- **Important research note**: Comparing faces is a task that leads to false positives: the face of individual X matches Y even though they are different people. It should therefore only be used to verify cases where there are additional reasons to believe that X and Y are the same person, so take the results with a grain of salt and do not apply the tool blindly. You can also use other tools to validate a positive result, such as [AWS Rekognition](https://aws.amazon.com/getting-started/hands-on/detect-analyze-compare-faces-rekognition/).

under the limitations

amithr commented 1 year ago

I added your text and also changed the limitations so that you should have edit permissions.

amithr commented 1 year ago

Hi Miguel,

I completed the following tasks:

amithr commented 1 year ago

Do you think there are any other significant changes to make?

msramalho commented 1 year ago

Hi Amith, sorry for the delay in feedback. The notebook looks good to commit. Thanks for including both upload options. I think, depending on where people run the notebook, there will always be some way or another of achieving the file upload.