Open storkinsj opened 4 years ago
The only major problem I see here is downloading and then processing multi-gigabyte torrents at no more than 4 MB/s (the speed shown in the tracker), which would take far too long. For now, we can only really use MD5 unless we get access to a faster method.
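For a rough sense of scale, here is a back-of-the-envelope sketch. The 15 GB archive size is an assumption (a figure of about that size comes up elsewhere in this thread), the function name is made up for illustration, and decimal units (1 GB = 1000 MB) are assumed.

```python
def download_hours(size_gb: float, rate_mb_per_s: float, archives: int = 1) -> float:
    """Estimate total transfer time in hours, using decimal units (1 GB = 1000 MB)."""
    seconds = archives * size_gb * 1000 / rate_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    # 15 GB per archive is an assumed size, 4 MB/s is the tracker speed quoted above.
    print(f"{download_hours(15, 4):.2f} hours for one archive")
    print(f"{download_hours(15, 4, archives=10):.2f} hours for ten archives")
```

The time grows linearly with the number of archives, so the total depends entirely on how many have to be fetched.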
Hi Richie, I'm going to download the payloads. I am working on one first to see what it looks like. This will take a long time and require lots of bandwidth and storage. I mentioned it to the site admin. Limiting the number of hits to their web site is a good goal. It may just be time for them to add the SHA1 and SHA256 text files, which would make this job much easier.
So I got one, and the format seems to be: Name: Numbercode + MD5 hash; Data: anything.
There is no additional data in the payload as there is on the web site.
It looks like, at best, we could calculate SHA1 and SHA256 tables, but with no metadata unless we want to analyze the files.
Actually the "file
Since we can write this in Python or Node, calculating the hash shouldn't be the problem. The only issue is the zip sizes, since I don't think build services like Travis CI allow files that large. We need to somehow download only a few files from each zip at a time without exceeding the size limit.
So it looks like for SHA1 or SHA256 we would modify https://github.com/Richienb/virusshare-hashes/blob/master/generate.py to curl down the virus payloads (15 GB or so each) one at a time and then enumerate and hash each payload. I have not used Travis. What is the limit? Will one file be too big?
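A minimal sketch of the "enumerate and hash each payload" step, assuming the archive has already been downloaded to local disk. This is not taken from generate.py. The function name and the optional password parameter are hypothetical. Whether the archives are password-protected is an assumption, and Python's zipfile module can only decrypt legacy ZIP encryption, so treat this as a starting point rather than a working pipeline.

```python
import hashlib
import zipfile
from typing import Dict, Iterator, Optional, Tuple

CHUNK = 1024 * 1024  # read 1 MiB at a time so large payloads never sit fully in memory


def hash_zip_members(
    zip_path: str, password: Optional[bytes] = None
) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (member name, {algorithm: hex digest}) for every file in the zip."""
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
            # Stream the member straight from the archive without extracting it to disk.
            with archive.open(info, pwd=password) as member:
                while True:
                    chunk = member.read(CHUNK)
                    if not chunk:
                        break
                    for digest in digests.values():
                        digest.update(chunk)
            yield info.filename, {name: d.hexdigest() for name, d in digests.items()}
```

Because members are read in chunks and never extracted, the only disk space needed is for the zip itself. That still leaves the problem of archives that are larger than the free space on the build machine.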
@storkinsj When building on Windows, there's about 40GB of free disk space. However, that's still not large enough to store some of the zips.
Ok thanks. It does seem like the only sane way to do this is to wait until they provide the same information online for other hashes that they do for MD5.
The VirusShare database also includes SHA1 and SHA256, but today they have only separated out the MD5 hashes into tables.
Modern tool sets are more likely to hash executables using SHA. If a tool proactively creates SHA hashes for running processes (I can think of several), it would be great to have a lookup function.
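A lookup function along those lines could be as small as the sketch below, assuming SHA tables existed in a one-hash-per-line text format with `#` comment lines. That format is an assumption, and both function names are made up for illustration.

```python
from typing import Iterable, Set


def load_hashes(lines: Iterable[str]) -> Set[str]:
    """Collect hex digests from text lines, skipping blanks and '#' comments."""
    hashes = set()
    for line in lines:
        value = line.strip().lower()
        if value and not value.startswith("#"):
            hashes.add(value)
    return hashes


def is_known(digest: str, known: Set[str]) -> bool:
    """Return True if the digest appears in the loaded table (case-insensitive)."""
    return digest.strip().lower() in known
```

Usage would be something like `known = load_hashes(open("hashes.txt"))` followed by `is_known(some_sha256, known)`, where `hashes.txt` is a placeholder file name. A set keeps each lookup constant time, at the cost of holding the whole table in memory.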
As we should know, there is no way to convert MD5 to SHA1 so this would involve combing the data out of the "other" source of information (terabytes of virus database information).
It is possible today to use REST to do this (search for hits based on sha256 hash), but it seems the goal of the torrent files is to avoid unnecessary bandwidth and I think it's a good goal.