Richienb / virusshare-hashes

Every single VirusShare MD5 hash in a single file.
MIT License

SHA1 and/or SHA256 hashes #8

Open storkinsj opened 4 years ago

storkinsj commented 4 years ago

The VirusShare database also includes SHA1 and SHA256 hashes, but currently only the MD5 hashes have been separated out into the hash lists.

Modern toolsets are more likely to hash executables with SHA. Several tools I can think of proactively compute SHA hashes for running processes, so it would be great to have a lookup function for those hashes.

As we know, there is no way to convert an MD5 hash into a SHA1 hash, so this would involve extracting the data from the "other" source of information (terabytes of virus database information).

It is possible today to do this through REST queries (searching for hits by SHA256 hash), but the goal of the torrent files seems to be avoiding unnecessary bandwidth, and I think that's a good goal.

Richienb commented 4 years ago

The only major problem I see here is downloading and then processing multi-gigabyte torrents at no more than 4 MB/s (the speed shown on the tracker), which would take far too long. For now, we can only really use MD5 unless we get access to a faster method.
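A rough back-of-the-envelope check of that concern, assuming an archive of about 15 GB (the per-archive size mentioned later in this thread) and the 4 MB/s ceiling above, in decimal units (1 GB = 1000 MB). The helper name is hypothetical; it only does the arithmetic.

```python
def download_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Estimate hours to download size_gb at a steady rate_mb_per_s (decimal units)."""
    seconds = size_gb * 1000 / rate_mb_per_s  # convert GB to MB, then divide by MB/s
    return seconds / 3600


# One ~15 GB archive at 4 MB/s: 15000 / 4 = 3750 s, a little over an hour.
print(f"{download_hours(15, 4):.2f} hours per archive")  # prints: 1.04 hours per archive
```

Multiplying by the number of archives gives the total, and that is before any time spent hashing.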

storkinsj commented 4 years ago

Hi Richie, I'm going to download the payloads, starting with one to see what it looks like. This will take a long time and require a lot of bandwidth and storage. I mentioned it to the site admin, since limiting the number of hits to their website is a good goal. It may be time for them to add SHA1 and SHA256 text files, which would make this job much easier.

storkinsj commented 4 years ago

So I got one, and the format seems to be: Name: numeric code + MD5 hash; Data: anything.

The payload contains none of the additional data that the website shows.

It looks like, at best, we could calculate SHA1 and SHA256 tables, but no metadata unless we want to analyze the files.
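A minimal sketch of building such a table from one downloaded archive, using only Python's standard library (hashlib and zipfile). It assumes the payload archive is a regular zip; the function name is hypothetical, and the optional password is only relevant if the archive is encrypted (zipfile can decrypt legacy ZipCrypto entries but not AES-encrypted ones). Requires Python 3.8+ for the := operator.

```python
import hashlib
import zipfile
from typing import Iterator, Optional, Tuple

CHUNK = 1 << 20  # read members 1 MiB at a time so memory use stays flat


def hash_members(
    zip_path: str, password: Optional[bytes] = None
) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (member name, md5, sha1, sha256) for every file in a zip archive.

    Each member is streamed through all three hash functions in one pass,
    so nothing is extracted to disk.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries have no payload to hash
            md5, sha1, sha256 = hashlib.md5(), hashlib.sha1(), hashlib.sha256()
            with archive.open(info, pwd=password) as member:
                while chunk := member.read(CHUNK):
                    md5.update(chunk)
                    sha1.update(chunk)
                    sha256.update(chunk)
            yield info.filename, md5.hexdigest(), sha1.hexdigest(), sha256.hexdigest()
```

Joining each yielded row with tabs would give one line per payload. Because the entry name already embeds the MD5, the recomputed MD5 column doubles as a cheap integrity check.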

Actually, the Unix "file" command does some analysis, but payloads that are HTML files masquerading as GIF files will not be detected correctly.

Richienb commented 4 years ago

Since we can write this in Python or Node, calculating the hashes shouldn't be a problem. The only issue is the zip sizes, since I don't think build services like Travis CI allow files that large. We need some way to download only a few files from each zip at a time without exceeding the size limit.

storkinsj commented 4 years ago

So it looks like, for SHA1 or SHA256, we would modify https://github.com/Richienb/virusshare-hashes/blob/master/generate.py to curl the virus payloads (about 15 GB each) one at a time, then enumerate and hash each payload. I have not used Travis. What is the limit? Will a single file be too big?
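A sketch of that one-at-a-time loop, assuming the archive URLs are already known (the list of URLs and the function names are hypothetical, not taken from generate.py). Each archive is streamed to a temporary file, its members are hashed, and the file is deleted before the next download, so peak disk use is roughly the size of the largest single archive.

```python
import hashlib
import os
import shutil
import tempfile
import urllib.request
import zipfile
from typing import Iterable, Iterator, Optional, Tuple

CHUNK = 1 << 20  # 1 MiB read/write size


def _digests(stream) -> Tuple[str, str, str]:
    """Return (md5, sha1, sha256) hex digests of a binary stream in one pass."""
    hashes = [hashlib.md5(), hashlib.sha1(), hashlib.sha256()]
    while chunk := stream.read(CHUNK):
        for h in hashes:
            h.update(chunk)
    return tuple(h.hexdigest() for h in hashes)


def hash_archives(
    urls: Iterable[str], password: Optional[bytes] = None
) -> Iterator[Tuple[str, str, str, str]]:
    """Download each archive in turn, yield per-member hashes, then delete it."""
    for url in urls:
        fd, tmp_path = tempfile.mkstemp(suffix=".zip")
        try:
            # Stream the download straight to disk instead of buffering it in memory.
            with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as resp:
                shutil.copyfileobj(resp, out, CHUNK)
            with zipfile.ZipFile(tmp_path) as archive:
                for info in archive.infolist():
                    if info.is_dir():
                        continue
                    with archive.open(info, pwd=password) as member:
                        yield (info.filename, *_digests(member))
        finally:
            os.remove(tmp_path)  # free the space before fetching the next archive
```

This still needs one whole archive on disk at a time, because zipfile has to seek to the central directory at the end of the file; it does not by itself solve the case where a single archive exceeds the available space.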

Richienb commented 4 years ago

@storkinsj When building on Windows, there's about 40 GB of free disk space. However, that's still not enough to store some of the zips.
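Given a fixed disk budget like the roughly 40 GB mentioned above, a build script could check whether an archive fits before downloading it. A minimal sketch using the standard library's shutil.disk_usage; the function name and the 1 GiB safety margin are arbitrary choices for illustration.

```python
import shutil


def fits_on_disk(archive_bytes: int, path: str = ".", headroom_bytes: int = 1 << 30) -> bool:
    """Return True if archive_bytes plus a safety margin fits in the free space at path.

    headroom_bytes defaults to an arbitrary 1 GiB margin for temp files and logs.
    """
    free = shutil.disk_usage(path).free
    return archive_bytes + headroom_bytes <= free
```

The archive size could come from the Content-Length header of an HTTP HEAD request, if the server sends one; archives that don't fit would have to be skipped or handled elsewhere.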

storkinsj commented 4 years ago

OK, thanks. It does seem like the only sane way to do this is to wait until they publish the same information for other hash types that they already provide for MD5.