Open GoogleCodeExporter opened 8 years ago
"Implement BEP 38" could really mean many things, but I agree it is a valuable
BitTorrent Enhancement Proposal.
Not mentioned in the BEP, I think, is that some torrent info sections include
MD5 (yes, MD5) hashes of individual _files_. Not standard, and I'm not sure
which programs use that extension. But when included, it is a brilliant way to
confirm with certainty (particularly when coupled with the hashtable) that
files match in entirety between very different torrents.
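As a rough illustration of using those per-file MD5s as a hint, here is a minimal sketch. It assumes the info dictionaries have already been bencode-decoded into Python dicts with string keys and values (many decoders return bytes instead), and that the optional per-file key is named `md5sum`. The helper names `file_entries` and `md5_candidates` are made up for this example.

```python
def file_entries(info):
    """Yield (path, length, md5sum or None) for each file in a decoded info dict."""
    if "files" in info:  # multi-file torrent
        for f in info["files"]:
            yield "/".join(f["path"]), f["length"], f.get("md5sum")
    else:  # single-file torrent
        yield info["name"], info["length"], info.get("md5sum")


def md5_candidates(info_a, info_b):
    """Return (path_in_a, path_in_b) pairs whose length and md5sum both agree."""
    index = {}
    for path, length, md5 in file_entries(info_b):
        if md5 is not None:
            index.setdefault((length, md5.lower()), []).append(path)
    pairs = []
    for path, length, md5 in file_entries(info_a):
        if md5 is not None:
            for other in index.get((length, md5.lower()), []):
                pairs.append((path, other))
    return pairs
```

Files without an `md5sum` entry are simply skipped, so this only ever produces candidates, not confirmed matches.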
The BEP recommends not limiting matching to whole files, but recognizing
matching data piecemeal (e.g. media files with the same payload but different
headers).
The BEP discusses something a bit broader than its title ("Finding Local Data
Via Torrent File Hints") suggests. It is also about associating content across
multiple torrent files. If there are sufficient clues to do so, in some cases
even without complete local data, torrents can be linked together in simple or
sophisticated ways that allow you to both download and seed the linked files
across multiple torrent swarms. This could help retire the problem of dead
torrents in the future.
One kind of "torrent file hint" that should not be overlooked is a hint
initiated by the user. I, for example, can often find multiple torrents that
include some
or all of the same files through research/searches involving data outside of
the torrent files themselves. I then know/expect that certain files are the
same between the torrents. Currently it is possible to download those torrents
to the same save directory and rename identical files in the various torrents
all to the same name, and manually juggle those torrents for downloading
purposes (selecting different files for download from each torrent will, for
example, still achieve a level of parallelization in downloading). Currently,
though, downloading this way requires a lot of manual intervention and does
not utilize all the swarms concurrently for the same files. If one swarm is
too slow or doesn't have complete sources, the user can manually re-juggle
which files are selected for which torrents. When all files are downloaded,
the user would have to pause everything, select all the files in all the
torrents, recheck all the torrents, and then resume all the torrents in order
to seed to everybody BEP-38 style. Anyway, there should be an option to not
just automate the BEP 38 stuff, but to explicitly link files (and parts of
files) between torrents.
At the libtorrent level, this could be implemented strictly by exposing APIs
for linking data between different torrents, and handling downloads and seeding
appropriately for the linked torrents, but leaving the discovery of linkages up
to torrent application developers to implement.
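To make the idea of explicit links a bit more concrete, here is a minimal sketch of the bookkeeping such an API might sit on top of. Nothing here is libtorrent's actual API. `FileRef` and `LinkTable` are hypothetical names, and the torrent identifier is assumed to be something like an info-hash string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRef:
    torrent: str  # hypothetical torrent identifier, e.g. an info-hash in hex
    index: int    # position of the file in that torrent's file list


class LinkTable:
    """Tracks groups of files that are believed to hold the same data."""

    def __init__(self):
        self._group = {}  # FileRef -> set of all FileRefs in its group

    def link(self, a, b):
        """Declare a and b the same file, merging any groups they belong to."""
        merged = self._group.get(a, {a}) | self._group.get(b, {b})
        for ref in merged:
            self._group[ref] = merged

    def linked(self, ref):
        """Return the other files linked to ref (empty set if none)."""
        return self._group.get(ref, {ref}) - {ref}
```

A client would call `link()` whenever a hint (automatic or user-supplied) pairs two files, and consult `linked()` when deciding which other swarms can supply or receive a given file's data. Linking parts of files would need byte ranges on top of this.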
One exciting thing for me is that this not only improves the BitTorrent
network, but creates an opening, in some cases, for multi-protocol clients to
identify data they need that is available on networks other than BitTorrent as
well.
Original comment by a...@lovetour.info
on 11 May 2013 at 11:55
> But when included, it is a brilliant way to confirm with certainty
> (particularly when coupled with the hashtable) that files match
> in entirety between very different torrents.
You can't know for sure until you've downloaded the file and compared the
actual data. There's nothing stopping a malicious torrent creator from putting
invalid md5 sums in the .torrent file.
They could however be used as hints, just like the other hints.
If the md5 of file A matches an existing file B, the hashes for the pieces in
B, adjusted to match the piece alignment in A, could be computed and compared
to those in A's torrent file.
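The realignment check described above could look roughly like the sketch below. It is only an illustration under some assumptions: the existing file B is held in memory as `data_b`, `file_offset` is where file A starts in torrent A's concatenated byte stream, and `piece_hashes` is the list of 20-byte SHA-1 digests from A's torrent file. The function name `check_realigned` is made up. Pieces that straddle a file boundary, and a short final piece, are skipped for simplicity.

```python
import hashlib


def check_realigned(data_b, file_offset, piece_length, piece_hashes):
    """Hash local data on torrent A's piece boundaries and compare to A's hashes.

    Only full-length pieces lying entirely inside the file are checked.
    Returns (matched, checked).
    """
    file_end = file_offset + len(data_b)
    first = -(-file_offset // piece_length)  # first piece starting inside the file
    last = file_end // piece_length          # exclusive: pieces ending by file_end
    matched = checked = 0
    for i in range(first, min(last, len(piece_hashes))):
        start = i * piece_length - file_offset  # piece start, relative to the file
        digest = hashlib.sha1(data_b[start:start + piece_length]).digest()
        checked += 1
        if digest == piece_hashes[i]:
            matched += 1
    return matched, checked
```

If `matched` is well below `checked`, the md5 hint was wrong (or forged) and the candidate file can be dropped as a source.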
Original comment by arvid.no...@gmail.com
on 11 May 2013 at 9:36
That's true.
For small files close to the piece size, a false MD5 could go undetected
when computing the realignment. As the difference between the file size and
piece size increases, the odds that a collision could be intentionally
created with a different hashtable but the same MD5 become very small,
making the MD5 in those cases a very strong hint. In any case, the data
would still be verified according to the torrent used to download it, and
then only pieces, after realignment, that compute to the same hash in the
other torrent should be shared. If the result is many "hashfails" for the
realigned data, the source could be discarded.
It would probably be desirable for a particular file source (torrent) to be
considered authoritative/trusted if the object is to find multiple sources
for that file. I imagine each new torrent started by the user, or specified
in a general-purpose RSS feed or otherwise scheduled, would be authoritative
for itself (not necessarily the FIRST torrent downloaded that hinted at
having the same files), unless the user specifies that it is a backup source
for one or more previously started/downloaded torrents.
If the authority/trust is equal, e.g.: Start a download of Torrent 1 with
files A,B; without any special instructions from the user start a download
of Torrent 2 which appears to have files A,C -- both have equal trust as
there is no information otherwise. In those cases, all the files should be
created, but hints should still be used to attempt to share any matching
file/parts of files between the torrents and minimize data
downloaded/multi-source. The pieces will still have to match their
respective torrent's hashtables to be kept (although the moment at which
this is known is different from an isolated/naïve download, assuming
realignment is necessary).
Original comment by a...@lovetour.info
on 12 May 2013 at 12:45
Original issue reported on code.google.com by
v.korkod...@gmail.com
on 18 Mar 2013 at 6:57