Closed kovalensky closed 1 year ago
Have you read the documentation on it? https://www.bittorrent.org/beps/bep_0030.html
I'm kinda lost in understanding of torrent.py's
The merkle roots are not calculated in torrent.py they are calculated in hasher.py. The entire version 2 implementation is contained in the the HasherV2
class.
def merkle_root(blocks: list) -> bytes:
"""
Calculate the merkle root for a seq of sha256 hash digests.
Parameters
----------
blocks : list
a sequence of sha256 layer hashes.
Returns
-------
bytes
the sha256 root hash of the merkle tree.
"""
if blocks:
while len(blocks) > 1:
blocks = [
sha256(x + y).digest() for x, y in zip(*[iter(blocks)] * 2)
]
return blocks[0]
return blocks
class HasherV2:
"""
Calculate the root hash and piece layers for file contents.
Iterates over 16KiB blocks of data from given file, hashes the data,
then creates a hash tree from the individual block hashes until size of
hashed data equals the piece-length. Then continues the hash tree until
root hash is calculated.
Parameters
----------
path : str
Path to file.
piece_length : int
Size of layer hashes pieces.
"""
def __init__(self, path: str, piece_length: int):
"""
Calculate and store hash information for specific file.
"""
self.path = path
self.root = None
self.piece_layer = None
self.layer_hashes = []
self.piece_length = piece_length
self.num_blocks = piece_length // BLOCK_SIZE
with open(self.path, "rb") as fd:
self.process_file(fd)
def process_file(self, fd: str):
"""
Calculate hashes over 16KiB chuncks of file content.
Parameters
----------
fd : TextIOWrapper
Opened file in read mode.
"""
while True:
blocks = []
leaf = bytearray(BLOCK_SIZE)
# generate leaves of merkle tree
for _ in range(self.num_blocks):
size = fd.readinto(leaf)
if not size:
break
blocks.append(sha256(leaf[:size]).digest())
# blocks is empty mean eof
if not blocks:
break
if len(blocks) != self.num_blocks:
# when size of file doesn't fill the last block
# when the file contains multiple pieces
remaining = self.num_blocks - len(blocks)
if not self.layer_hashes:
# when the there is only one block for file
power2 = next_power_2(len(blocks))
remaining = power2 - len(blocks)
# pad the the rest with zeroes to fill remaining space.
padding = [bytes(32) for _ in range(remaining)]
blocks.extend(padding)
# calculate the root hash for the merkle tree up to piece-length
layer_hash = merkle_root(blocks)
self.layer_hashes.append(layer_hash)
self._calculate_root()
def _calculate_root(self):
"""Calculate root hash for the target file."""
self.piece_layer = b"".join(self.layer_hashes)
hashes = len(self.layer_hashes)
if hashes > 1:
pow2 = next_power_2(hashes)
remainder = pow2 - hashes
pad_piece = [bytes(HASH_SIZE) for _ in range(self.num_blocks)]
for _ in range(remainder):
self.layer_hashes.append(merkle_root(pad_piece))
self.root = merkle_root(self.layer_hashes)
An oversimplified explanation is that you break the file into pieces just like you do in the version 1 protocol, but instead of using piece-length
blocks of the file you instead use 16KiB
blocks of the file. Then once you have the entire file broken into 16KiB blocks and their sha256
hashes, you iterate through them pairwise and hash their hashes together, which would leave you with exactly half the number of hashes. Then you apply the same process over again until you finally reach the root hash.
@alexpdev Thank you🎉, I managed to make it work in tmrr
@kovalensky Right on!
Hello @alexpdev again, hope you are doing great. I had a question, that went over my head for a couple of days now.
Can we validate root Merkle hashes for genuinity? The thing is, if we fetch torrent info metadata from peers, by some extent, they could send us fake metadata, with fake hashes. This could be useful for deceiving DHT search engines that show file hashes in web pages for example. And since clients can upgrade connections catching both formats, from v1 to v2 and vice versa from peers, and if clients have errors with validating root hashes of files using v2 connection, they will fallback to v1 (which could have completely different data than files we needed).
piece_layers dictionary in .torrent files have this format:
root_hash : layers_string
My idea was that if we could validate those "root_hash"es, calculating them from layers_string, we could reject fake torrents.
I tried to calculate root hash from layers_string using previous code, but didn't succeed, or maybe I'm doing something wrong?
I tried to calculate root hash from layers_string using previous code, but didn't succeed, or maybe I'm doing something wrong?
You must be doing something wrong because that is the only way in which to get the root hash to begin with.
from v1 to v2 and vice versa from peers, and if clients have errors with validating root hashes of files using v2 connection, they will fallback to v1 (which could have completely different data than files we needed).
THis isn't accurate, torrents exchanging data on v1 and v2 swarms simultaneously are required to describe the exact same data. That is why it is necessary to include b0
padded files in the pieces list in the info dictionary for hybrid torrent files.
I highly recommend reading the following links, You may have to read them more then once. I think I had to read the V2 BEP 12 times before I finally figured out what they were even talking about.
https://www.bittorrent.org/beps/bep_0005.html https://www.bittorrent.org/beps/bep_0030.html https://www.bittorrent.org/beps/bep_0051.html https://www.bittorrent.org/beps/bep_0052.html
I tried to calculate root hash from layers_string using previous code, but didn't succeed, or maybe I'm doing something wrong?
You must be doing something wrong because that is the only way in which to get the root hash to begin with.
Didn't see your answer notification, probably I tried to find pieces root based on string versions of piece layers, I had to do it with binary format and see if it works, will tell if it will work
I'm kinda lost in understanding of torrent.py's merkle root hash creation, I'm trying to write a function in php to calculate root hashes of files, I can create v1 implementations but still struggling with v2, could you please provide me a simple one block python code function to generate root hash from given file so I could inspect it and learn how to write something equal in php, I know it's too much to ask, but I'm just lost in hash functions, thanks.