bittorrent / bittorrent.org

394 stars 101 forks source link

'pieces root' of empty files (libtorrent vs. reference v2 torrent creator) #120

Open pobrn opened 3 years ago

pobrn commented 3 years ago

BEP 52 writes:

pieces root For non-empty files this is the the root hash of [...]

In my interpretation this means that for empty files (length = 0), this field may be omitted. The reference implementation matches my interpretation:

if hashes.length == 0:
    return {b'': {b'length': hashes.length}}
else:
    return {b'': {b'length': hashes.length, b'pieces root': hashes.root}}

Yet, libtorrent does not seem to accept such metainfo files (dump_torrent example says ERROR: a v2 file entry has no root hash); and generates sixteen zero bytes for empty files as pieces root (make_torrent example).

Which one is the correct behaviour?

ssiloti commented 3 years ago

The reference implementation is correct. Please file an issue with libtorrent.

arvidn commented 3 years ago

a related issue that came up is that the exact algorithm for adding pad-files is not specified in BEP52, but it's important when creating hybrid torrents.

At least the way libtorrent works is, when parsing a hybrid torrent, it builds up the file layout for a the v2 torrent while parsing the files tree, it then parses the v1 files and assumes pad files will be added the same way they are added in the v2 parsing. If they end up with different file layouts, the torrent is rejected.

Algorithm

As far as I can tell, there are two main approaches:

  1. Pad files are added in-front of files, if the current size is not aligned
  2. Pad files are added after files, if the current size is not aligned (to prepare for the next file to be aligned)

(2) will end up with "tail-padding", where the last file of the torrent is a pad file, which is redundant. I noticed the reference torrent creator does this. libtorrent does (1) and won't add tail-padding.

Since libtorrent has already been released with behavior (1), I think going forward I will just have to make the torrent loading accept both. i.e. consider the v1 files compatible both with and without tail padding.

Empty files

The other aspect this affects is whether empty files have padding in-front of them or after them. This is a bit trickier to support both of, since it affects the file indices. Although, it doesn't affect piece indices, or the piece -> file mapping, so maybe it's not unreasonable to be forgiving on this choice either.

@the8472 @ssiloti do you have opinions? should this be specified or should it be considered "quality of implementation" how to interpret valid hybrid torrents?

arvidn commented 3 years ago

BEP52 says "Padding files are synthetic files inserted into the file list to let the following file start at a piece boundary." which does suggest approach (2).

arvidn commented 3 years ago

if the last file is an empty file, there won't be any tail-padding with algorithm (2).

the8472 commented 3 years ago

Why would empty files need any padding? They don't change the alignment.

arvidn commented 3 years ago

they don't. The reference implementation adds padding to make empty files aligned to pieces too, which I believe is unnecessary.

e.g.

file1
empty1
empty2

are padded like:

file1
.pad
empty1
empty2

where technically there doesn't need to be a pad file at all

arvidn commented 3 years ago

I don't think it matters that much exactly how the padding is applied, but I hadn't appreciated that it's kind of important to agree on the details to make hybrid torrents have their v1 file layout match the v2 layout.

pobrn commented 3 years ago

As another data point BiglyBT creates hybrid torrents so that every file starts at a piece boundary, which means it adds padding before empty files, but it omits padding after the last file.

So

file1
empy1
empty2

is encoded as

file1
<padding>
empty1
empty2

and

file1
empty1
empty2
file2

is encoded as

file1
<padding>
empty1
empty2
file2

.