effigies / BitTornado

UNMAINTAINED - John Hoffman's fork of the original bittorrent

Create torrents from a list of files #18

Closed samhocevar closed 8 years ago

samhocevar commented 8 years ago

Currently BTTree can be built to hold either a single file, or a full directory with all files and subdirectories. I would like to create torrent files from a specific list of files I have built through other means, and I am wondering which of the following options you may find the more acceptable:

Note that I will be adding this feature and using it in production for several years whatever happens, so it’s up to you to decide whether you want it available for others :)

effigies commented 8 years ago

This sounds like a reasonable idea to me. I'm a little hesitant to put too much logic into BTTree, just to avoid confusion. But it may be that we want to rearchitect, either by subclassing BTTree or by lifting some of its logic into another class and subclassing BTTree from that.

Can you describe your situation a little more fully? As I'm reading you, it sounds like you have a number of files in a directory structure, and want to create one metafile for each of those files. Is this correct? If so, do you want to create metafiles next to each one, or in a new directory, or in a tree mirroring the original directory structure?

samhocevar commented 8 years ago

I’d like to create a single metafile that holds a collection of files.

Here is one of my use cases. I have compiled several programs and want to distribute them using BitTorrent. This would be my directory structure:

./
./Binaries/
./Binaries/Win64/
./Binaries/Win64/Program.exe
./Binaries/Win64/Program.pdb
./Binaries/Win64/OtherProgram.exe
./Binaries/Win64/OtherProgram.pdb

In this case I would like to create a single metafile containing only Binaries/Win64/Program.exe and Binaries/Win64/OtherProgram.exe but not the .pdb files. I do not wish to remove the .pdb files or move the .exe files to a temporary directory because other tasks running at the same time are using them. I also do not wish to copy the .exe files to a temporary directory structure because this will cause slowdowns (we’ll be dealing with tens of gigabytes of data).

I hope this makes more sense!

effigies commented 8 years ago

Okay, that makes sense. I'm actually a little surprised that this isn't the default behavior of btmakemetafile, but apparently its default is to create one metafile per input file.

Since this is still a recursive-descent thing, one option would be to modify BTTree.__init__ with a predicate so that files were included or excluded based on some criteria.

e.g.

bttree = BTTree('.', [], lambda x: x.endswith('.exe'))

or

bttree = BTTree('.', [], lambda x: not x.endswith('.pdb'))
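For illustration, here is a minimal sketch of how such a predicate could be applied while walking a directory. The helper name walk_files is hypothetical and not part of BitTornado, and the predicate argument to BTTree shown above is only a proposal.

```python
import os


def walk_files(root, include=lambda path: True):
    """Yield file paths under root (relative to root) accepted by include.

    Hypothetical helper for illustration only; not BitTornado API.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for fname in sorted(filenames):
            relpath = os.path.relpath(os.path.join(dirpath, fname), root)
            if include(relpath):
                yield relpath
```

Under these assumptions, list(walk_files('.', lambda p: p.endswith('.exe'))) would collect only the .exe files below the current directory.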

Another option would be to simply take a list, in which case you might do

class BTList(object):
    def __init__(self, paths):
        ...

class BTTree(BTList):
    ...

And move everything but the tree-specific logic into BTList. I think I'd be okay with either of these. If you have another suggestion that makes more sense, I'd be glad to hear it.
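To make the list-based option a bit more concrete, here is a rough sketch of the split between a flat list and a directory walker. The class names FileList and FileTree and the size property are assumptions made for this example; they are not the actual BTTree code, which carries more logic than shown here.

```python
import os


class FileList:
    """Hypothetical base class: holds an explicit list of file paths."""

    def __init__(self, paths):
        self.paths = list(paths)

    @property
    def size(self):
        # Total size in bytes of all listed files
        return sum(os.path.getsize(p) for p in self.paths)


class FileTree(FileList):
    """Hypothetical subclass: builds the list by walking a directory."""

    def __init__(self, root):
        paths = []
        for dirpath, _dirnames, filenames in os.walk(root):
            paths.extend(os.path.join(dirpath, f) for f in sorted(filenames))
        super().__init__(paths)
```

With this shape, only the subclass touches the directory structure; anything that operates on a list of files can live in the base class and be reused with a hand-built list.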

And whether you end up modifying btmakemetafile or creating a new program to actually run this, I'd be happy to include that. (If it changes btmakemetafile too much, I might change the name, so people can expect the usual behavior from the original.)

samhocevar commented 8 years ago

I am not fond of the predicate solution because it will still walk the whole directory structure, which may be huge. I'd rather have the user provide their own list.

I have put my proof of concept here: https://gist.github.com/samhocevar/e392363909e4561ec5b7 . With this, make_meta_file(loc, ...) can simply be implemented as make_meta_file_from_list([loc], ...).

It's not working properly yet because it defines the root directory as ., which is forbidden by BitTornado. Maybe I need to add a fake root directory for the torrent, I don't know yet.

effigies commented 8 years ago

A "." should never be a path element, since a file can be encoded just as its own name (e.g. ./file.ext becomes ['file.ext']), and directories are not encoded in an Info dictionary, except as the leading path elements of files. So you should just be able to drop any leading "." components from paths. That is, your list above:

./
./Binaries/
./Binaries/Win64/
./Binaries/Win64/Program.exe
./Binaries/Win64/OtherProgram.exe

Would be encoded as:

['Binaries', 'Win64', 'Program.exe']
['Binaries', 'Win64', 'OtherProgram.exe']

If you want to construct a directory structure, you could add zero-length files called something like .empty in the leaves, but that's the closest thing to an explicitly encoded directory I can think of.
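As a small sketch of the path handling described above, the following helper turns a relative path string into a list of path elements and drops any "." components. The name path_elements is hypothetical, and using pathlib with POSIX-style separators is a choice made for this example, not something BitTornado does.

```python
from pathlib import PurePosixPath


def path_elements(path):
    """Split a relative POSIX-style path into its components.

    Hypothetical helper for illustration only. '.' components are
    dropped; absolute paths and '..' components are rejected.
    """
    p = PurePosixPath(path)
    if p.is_absolute() or '..' in p.parts:
        raise ValueError('expected a relative path without "..": %r' % (path,))
    return list(p.parts)


print(path_elements('./Binaries/Win64/Program.exe'))
# ['Binaries', 'Win64', 'Program.exe']
```

A bare './' yields an empty list, which matches the point that directories have no entry of their own.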

effigies commented 8 years ago

Also, thanks for the gist. If you want to try refactoring BTTree to make the flat list be a base class, and the BTTree be a subclass that walks a directory to create a list, that's cool. It should make your function much more straightforward. But if you want to stick with what you have, that's fine, too.

Whatever you decide, feel free to make a PR and I'll iterate through it with you so other people can benefit from your work.

samhocevar commented 8 years ago

Oh, silly me, I didn’t realise having a full BTTree hierarchy was not necessary, thanks!

In that case, I have this much, much simpler implementation:

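import os

# BTTree, Info and MetaInfo are assumed to be imported from BitTornado
# (the exact module paths are not given in this thread).


def make_meta_file_from_list(name, loc_list, url, params=None, flag=None,
                             progress=lambda x: None, progress_percent=True):
    """Make a single .torrent file for a given list of items"""
    # Avoid failing on **params / params.items() when no params are passed
    if params is None:
        params = {}

    def splitpath(path):
        # Recursively split a relative path into its components
        d, f = os.path.split(path)
        return splitpath(d) + [f] if d else [f]

    # One BTTree per listed item, each rooted at its own path components
    tree_list = [BTTree(f, splitpath(f)) for f in loc_list]

    info = Info(name, sum(tree.size for tree in tree_list),
                flag=flag, progress=progress,
                progress_percent=progress_percent, **params)
    for tree in tree_list:
        tree.addFileToInfos((info,))

    # Pass only the parameters that MetaInfo understands
    newparams = {key: val for key, val in params.items()
                 if key in MetaInfo.typemap}

    metainfo = MetaInfo(announce=url, info=info, **newparams)
    # The output path must be supplied as params['target']
    metainfo.write(params['target'])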

effigies commented 8 years ago

Oh good. I'm glad I didn't make things too complicated with BTTree. The whole point of it was to be able to make a directory structure of .torrent files in one pass, rather than reading each file multiple times. (It was for an aborted project that would allow people to sync subsets of a directory using BitTorrent.)
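The one-pass idea can be sketched roughly as follows: each torrent being built keeps its own piece hasher, and every chunk read from disk is fed to all of them. This is not BTTree's actual code; PieceHasher and hash_once are hypothetical names, and the sketch only assumes the standard BitTorrent scheme of one SHA-1 digest per fixed-size piece.

```python
import hashlib


class PieceHasher:
    """Accumulate bytes and emit one SHA-1 digest per fixed-size piece."""

    def __init__(self, piece_length):
        self.piece_length = piece_length
        self.buffer = b''
        self.pieces = []

    def update(self, data):
        self.buffer += data
        while len(self.buffer) >= self.piece_length:
            piece = self.buffer[:self.piece_length]
            self.buffer = self.buffer[self.piece_length:]
            self.pieces.append(hashlib.sha1(piece).digest())

    def finish(self):
        # The final piece may be shorter than piece_length
        if self.buffer:
            self.pieces.append(hashlib.sha1(self.buffer).digest())
            self.buffer = b''
        return b''.join(self.pieces)


def hash_once(path, hashers, chunk_size=2 ** 16):
    """Read path a single time, feeding every chunk to all hashers."""
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            for hasher in hashers:
                hasher.update(chunk)
```

In such a scheme, the hasher for a single-file torrent and the hashers for each enclosing directory's torrent would all be passed to hash_once for the same file, so the file is read from disk only once.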

effigies commented 8 years ago

Closing because you've found a solution. If you want to add your function or do a refactor, open a PR with a proposal.