Enhance reliability with fuzzy hashing

Benji377 commented 3 months ago

Is your feature request related to a problem? Please describe.

Currently, Raspirus is highly dependent on MD5 signatures. If we don't have the signature of a virus, Raspirus has no way to know it's a virus. Even if we built a massive database of all known MD5 signatures and always kept it up to date, an attacker could still add a single whitespace character to the file and completely change its MD5 hash.
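To illustrate the problem, here is a minimal sketch using Python's standard `hashlib`. The payload bytes and the `signature_db` set are made up for the example and are not how Raspirus stores its signatures.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the data as a hex string."""
    return hashlib.md5(data).hexdigest()


# Hypothetical "malware" sample and a one-entry signature database.
original = b"example payload"
signature_db = {md5_hex(original)}

# The attacker appends a single space; the content is otherwise identical.
padded = original + b" "

print(md5_hex(original) in signature_db)  # exact match is found
print(md5_hex(padded) in signature_db)    # lookup fails after the one-byte change
```

An exact-match lookup has no notion of "almost the same file", which is the gap this request is about.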

Describe the solution you'd like

It would be great to have a system that tells us how likely a file is to be malware. Ideally, it should be lightweight and fast. That's where fuzzy hashing comes into play: it creates a hash of a given file, just like MD5, but with the added benefit that two hashes can be compared to measure how similar the underlying files are. Implementing this would let us compare a given file against a database of known-malware fuzzy hashes and return a similarity percentage. We then set a threshold: everything above it is considered malware, everything below is considered safe.
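A rough sketch of the compare-and-threshold idea follows. It uses a toy MinHash signature built from the standard library as a stand-in for a real fuzzy hash such as ssdeep or TLSH, so the scores are not comparable to those tools. The names `toy_fuzzy_hash`, `similarity`, `classify` and the 70% threshold are assumptions made for illustration only.

```python
import hashlib

NUM_SLOTS = 64  # signature length; more slots give a finer-grained score
SHINGLE = 8     # bytes per overlapping window


def toy_fuzzy_hash(data: bytes) -> tuple[bytes, ...]:
    """Toy MinHash signature over overlapping byte windows (not ssdeep/TLSH)."""
    last_start = max(1, len(data) - SHINGLE + 1)
    shingles = {data[i:i + SHINGLE] for i in range(last_start)}
    signature = []
    for slot in range(NUM_SLOTS):
        # A different salt per slot behaves like a different hash function.
        salt = slot.to_bytes(2, "big")
        signature.append(min(
            hashlib.blake2b(s, digest_size=8, salt=salt).digest()
            for s in shingles
        ))
    return tuple(signature)


def similarity(a: tuple[bytes, ...], b: tuple[bytes, ...]) -> float:
    """Percentage of signature slots on which the two hashes agree."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / NUM_SLOTS


def classify(data: bytes, known_hashes, threshold: float = 70.0):
    """Return (best similarity in percent, whether it is above the threshold)."""
    candidate = toy_fuzzy_hash(data)
    best = max((similarity(candidate, k) for k in known_hashes), default=0.0)
    return best, best > threshold
```

With this kind of signature, appending one byte to a known sample changes only a small share of the windows, so the score stays close to 100, while an unrelated file scores close to 0. The toy version hashes every window once per slot, so it is far too slow for real scanning.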

Describe alternatives you've considered

A machine learning algorithm - Too slow and unpredictable. Also hard to implement with the current setup.
YARA signatures - Resource intensive; would mean dropping support for single-board computers and lower-end PCs.
File analysis - Too slow; would require opening each file and inspecting its contents.

Additional context

The current problem is gathering the fuzzy hashes, which might take a while. Even then, we would still need to keep the database up to date and rework the backend. We might let the user choose between MD5 signatures (fast, higher coverage, higher miss rate) and fuzzy hashing (lower coverage due to missing samples, but a lower miss rate and more accurate analysis).
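The user-selectable mode could look roughly like the sketch below. `ScanMode`, `is_malware`, and the `fuzzy_scorer` callback (anything that returns a similarity percentage for a file's bytes) are hypothetical names for illustration, not existing Raspirus code.

```python
import hashlib
from enum import Enum
from typing import Callable


class ScanMode(Enum):
    SIGNATURE = "signature"  # exact MD5 lookup: fast, misses modified files
    FUZZY = "fuzzy"          # similarity score against known samples


def is_malware(
    data: bytes,
    mode: ScanMode,
    md5_db: set[str],
    fuzzy_scorer: Callable[[bytes], float],
    threshold: float = 70.0,
) -> bool:
    """Dispatch to the scan strategy the user selected."""
    if mode is ScanMode.SIGNATURE:
        return hashlib.md5(data).hexdigest() in md5_db
    # Fuzzy mode: flag the file when its best similarity exceeds the threshold.
    return fuzzy_scorer(data) > threshold
```

Keeping the scorer behind a callback means the fuzzy backend can be swapped without touching the dispatch logic.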

Benji377 commented 3 months ago

@GamingGuy003 does this roughly sum it up?

GamingGuy003 commented 3 months ago

Sounds about right. This will presumably increase scan time considerably, so we might have to come up with something for that (threading? making fuzzy hashing optional if you just want a quick scan?).

Benji377 commented 2 months ago

Threading might be a good idea, but we would need to scale it to the user's resources. Adding a switch on the frontend to choose between signature scanning and fuzzy scanning might also be useful.

GamingGuy003 commented 1 month ago

Threading with a dynamically scaled thread pool shouldn't be a problem. The toggle makes sense.
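As a minimal sketch of what scaling the pool to the machine could mean, the snippet below sizes a thread pool from the CPU count. `pick_worker_count`, `scan_all`, the cap of 8 workers, and the one reserved core are assumptions for illustration; a real implementation might also look at memory or current load.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def pick_worker_count(cpu_count=None, max_workers=8, reserve=1):
    """Leave `reserve` cores free, never drop below 1, never exceed the cap."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, min(max_workers, cpus - reserve))


def scan_all(paths, scan_one, workers=None):
    """Run `scan_one` on every path in a pool and map each path to its result."""
    paths = list(paths)
    workers = workers or pick_worker_count()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its result.
        return dict(zip(paths, pool.map(scan_one, paths)))
```

On a single-core board this falls back to one worker, which keeps lower-end devices usable.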

Enhance reliability with fuzzy hashing #691