Open Benji377 opened 3 months ago
@GamingGuy003 does this roughly sum it up?
Sounds about right. This will presumably greatly increase the time scanning takes, so we might have to come up with something in regards to that (Threading?? / making fuzzy hashing optional if you just want a quick scan?)
Threading might be a good idea, but we might need to scale it in relation to the user's resources. Also maybe adding a switch on the frontend to choose between signature scanning and fuzzy scanning might be useful
Threading with a dynamically scaled threadpool shouldnt be a problem. The toggle makes sense
Is your feature request related to a problem? Please describe. Currently Raspirus is highly dependant on MD5 signatures. If there is a virus whose signature we don't have, Raspirus has no way to know it's a virus. Even if we create a massive database with all possible MD5 signatures and always keep it up to date, an attacker could still just add a white-space to the file and completely change the MD5 signature.
Describe the solution you'd like It would be great to have a system that tells us how likely a file is a malware. Ideally, it should be lightweight and fast. That's where fuzzy hashing comes in play, it creates a hash of a given file, just like MD5, but with the added benefit that we can compare one hash to the other. Implementing this would give us the ability to compare a given file to a database of known-malware signatures and return a percentage of how similar a file is. Then we add a threshold and everything above that threshold is considered malware, everything below is considered safe.
Describe alternatives you've considered
Additional context The current issue is gathering the fuzzy hashes, this might take a while. And even then, we would still need to keep the database up to date and reformat the backend. We might allow the user to choose between MD5 signatures (Fast, higher coverage, higher miss-rate) and Fuzzy hashing (Lower coverage due to missing samples, lower miss-rate and more accurate analysis)