Luminarys / synapse

Synapse BitTorrent Daemon
https://synapse-bt.org
ISC License
855 stars 48 forks source link

Move to directory upon completion #150

Open Abraxos opened 4 years ago

Abraxos commented 4 years ago

Hello,

I'm really liking this, but there is lack of a single feature that is preventing me from using it for all my torrenting. I figured out how to set separate watch directories by simply running multiple instances and combining that with fswatch and in general I really like how this is done as a very nice, self-contained, module. Please keep up the good work.

The feature I would like to recommend is the ability to move torrents to a configured directory upon completion. I apologize if this is already in there, but I couldn't find it in the configuration example or the source code.

Thank you

Luminarys commented 4 years ago

It doesn't currently exist, there was some work being done in #127 but that didn't pan out. I really want to approach these sorts of problems in a modular way, I may do more investigation into how this can be done nicely.

piperun commented 4 years ago

I would really love it if this feature also add two things:

  1. is create folders based on some form of data (i.e. tracker, filetype, other data .torrent file provides) which the user can request via a seperate file (either like a router or event system).
  2. To have the ability to seed from these directories.
Luminarys commented 4 years ago

Just to clarify, you want to be able to specify some sort of rule system that based on torrent data downloads a file to a certain directory?

piperun commented 4 years ago

In a way yes, I was a bit hasty with my suggestion: long-term it would be nice to have some kind of file (JSON, toml, custom domain) that serves as a way to generate events (since I can't think of any other way that won't grind synapse to a pulp).

In the short term however the most important part would be to have the ability to allow the creation of custom folders that synapse will seed, where the folder name would be based on either metadata, custom label, or time-stamp.

Below is an example of how that file would look like (JSON):

{
   // In this case, represent the name of the event/rule, but theres other ways to seperate the events/rules
  "saveLinuxIsoFiles": { 
   reactOn: "tracker", // this would be any form of acceptable metadata tag
   seedFlag: True, // this would mean that synapse will seed even if it got a custom path,
   saveTo: "/home/user/mySecretLinuxIsoCollectionOnly",
   seedSpeed: "5mb",
   autolabel: False
   } 
}
qmega commented 4 years ago

Some other torrent clients can be configured to run a script/command on completion. That would provide a superset of this functionality and be useful for other things. It could just be an arbitrary executable that would get the ID of the relevant torrent through e.g. an environment variable.

Moving to a directory on completion is then a trivial shell script that calls sycli with a move command. A little bit of extra logic could choose the destination directory based on tracker, tags, or anything else.

The basic move to a completed directory might be a common enough use case to be worth its own built-in feature anyway. But there are lots of other use cases that wouldn't make as much sense built-in but could be quite nicely implemented as scripts. Some examples off the top of my head are:

The concept could be extended to events other than completion, with the event type given in another environment variable, for example. It could be argued that at some point you'd be better off with an RPC client, but for simple things a script could save a fair amount of effort and complexity.

piperun commented 4 years ago

Personally I think running a command (linux command) is prone to abuse ugly hacks for functionality as is at least the case with rtorrent.

With scripts wouldn't that essentially mean synapse would require to maintain it's own API to said script (if we're talking about using a general-purpose script language) since otherwise you could write a script to include synapse. It would be more preferable to allow synapse to set the rules of what and what isn't to be allowed,

Luminarys commented 4 years ago

I'm a bit unsure what the best way to approach this is, but I feel having some sort of configuration option which can execute arbitrary shell commands in response to certain RPC events isn't a bad idea.

Off the top of my head I think something like this could be ok:

[[scripting]]
# RPC criteria
criteria = { field = "torrent.progress", op = "==", value = "1.0" }
# List of fields to pass to the script in argv
fields = [
  "name",
  "size"
]
command = "echo $1 $2"

This is of course prone to all sorts of dangerous abuse, but I think it's ok to have some mechanism like this which gives users a lot of flexibility to automate workloads without needing to run an additional daemon

kpcyrd commented 4 years ago
command = "echo $1 $2"

I think this is a bad idea because it mixes trusted data (the configured command I want to execute) and potentially tainted data ($1 and $2) into a single string and passes it to a shell, which considers the whole thing trusted and tries to parse it into instructions to execute (possibly confusing untrusted data as instruction).

More reasonable would be something like:

command = ["echo", "some/${name}/path", "${size}"]

then only interpolate the variables and pass it directly to execve/std::process::Command, which forwards it to argv of the child process without doing any shell-trickery that could be exploited. Another approach (which is probably going to be less popular) is not supporting any variables in command = ["/path/script.sh"] at all and passing the data through environment variables instead.

With those caveats in mind I'd very much appreciate a way to configure commands to execute on events, in particular:

Luminarys commented 4 years ago

I agree security is definitely something to think about here. For the bit I wrote above I'm talking about $1 and $2 as referring to argv in bash, not doing any actual substitution by synapse. I wrote that up quickly though and didn't realize that this would require starting a subshell to forward the arguments, so I'd say your proposal is more along the lines of what we should do.

kpcyrd commented 4 years ago

Awesome! Stuff like that is one of my pet-peeves and it'd be great to have a torrent client getting this right, feel free to ping me when necessary! :)

Luminarys commented 4 years ago

Actually for a first iteration I do think setting environment variables is probably the simplest way to handle this, we can just set something like SY_ARG_0, etc.

I think it'd be good to get feedback from people who utilized this sort of scripting in rtorrent and other clients to see what'd be most useful/convenient here.

piperun commented 4 years ago

Just curious about my main question (since this request kinda went into another direction) about the ability to seed from multiple directories, is that something that can/would be implemented?

Luminarys commented 4 years ago

I believe this is already implemented. Torrents can be downloaded to arbitrary directories as well as moved between them via RPC calls which update the path field (and they remain seedable). Is this what you're looking for?

piperun commented 4 years ago

This does sound about what I'm looking for (is there any documentation/command/setting I can try this?), would this though work also with what the original issue (move on completion)?

Luminarys commented 4 years ago

You can see this specified in the RPC docs here and here. You can execute a move with sycli as seen here and here.

I'm imagining that with some scripting support implemented you could make a rule with a command like sycli torrent $SY_ARG1 move dest_dir on completion.

piperun commented 4 years ago

Ah thanks a lot!

The only worry I have with the script solution is that if you got a lot of different types of files downloaded/downloading, that it'll clog up the script file.

This is why I originally suggest some kind of data format to easier distinguish between different types of files and have synapse automatically handle them, since that way the script file can be used for more unique user situations.

Although I can understand if it's going to complicate the usability of synapse (since you would essentially have 2-3 files to keep track of).

evanrichter commented 4 years ago

external data that finds its way into script arguments is VERY dangerous for command injection, so it will have to be done correctly and need a very thorough review! I would even say its not possible to do safely in a generic way (see gtfobins for all the surprising ways you can abuse normal binaries to execute arbitrary commands). If there are specific actions like moving files, extracting, making directories, etc. then it should be possible to sanitize input according to the function.

one of the biggest reasons I use synapse is because of rust's memory safety/security guarantees which is very much needed in a service that interacts with untrusted data. I would not trust the directory names and file names specified by the torrent file either.

I expect to be able to safely download any torrent file regardless of client configuration, though obviously the torrent client isn't responsible after the files are downloaded

evanrichter commented 4 years ago

I think if this feature were added, and users start sharing configurations, and adding their own commands to run, they open themselves up to the kind of problem web browsers have where users are expected not to click "untrusted links"

kpcyrd commented 4 years ago

The command injection concern shouldn't apply here since the "command is a list instead of a string" suggestion fully bypasses any shell parsing. Sanitation (that word isn't properly defined though) shouldn't be necessary since the proposed interface is able to handle arbitrary utf-8 safely (with one caveat mentioned below). The remaining issues that we can't address with that:

Argument parsing confusion

Before passing arguments that are fully untrusted one should pass -- as an argument to disable further argument parsing so everything after it is passed as an opaque positional argument, otherwise the executed binary might do something unexpected if the filename starts with - or --.

Using regular shell scripting as an example, this might do something unexpected if the argument is user controlled:

./someprog --foo "$1"

This doesn't:

./someprog --foo -- "$1"

Directory traversal

Somebody could create a torrent with a name that contains /../, this is already something that needs to be handled by synapse regardless of the scripting interface, synapse should allow access to the normalized filename in case torrents with slashes in them aren't just rejected as invalid instead.

evanrichter commented 4 years ago

Good thought, but I still don't believe -- is a perfect solution. Again, it is program dependent, not a general solution. (this is because the string "--" will be given to the program executed to decide what to do with it, it's not a standard for exec or spawn or any syscall that spawns processes.

I would be much more comfortable if this functionality were limited to certain executables/actions that way mitigations can be certain and scoped correctly. If you want general scripting, that is what the RPC interface is for, and then the security risk is not assumed by synapse itself.

piperun commented 4 years ago

I would just like to remind @evanrichter & @kpcyrd that issue #198 is probably more suited for this discussion.

KristupasSavickas commented 4 years ago

IMO it would be better to pass all available metadata as environmental variables, they way it's done in udhcpc (busybox). That way we could only provide the name of the script to run and don't worry about spawning subshells, argument parsing/escaping and other issues that are bound to arise.

As for moving the torrent upon completion, how would synapse know the new location after running the action? If possible, it would be better to have a plugin system that hooks up into RPC mechanism that's already in place, so that communication both ways is possible.

KristupasSavickas commented 4 years ago

Adding to my comment above: I'm not sure, but it could be possible to read environment variables in synapse after the script finishes running if it exports changed variables. That way when the torrent gets moved by the scripts synapse would know its new location and could continue seeding.

evanrichter commented 4 years ago

we could only provide the name of the script to run and don't worry about spawning subshells, argument parsing/escaping and other issues that are bound to arise.

I agree this is a way to implement the feature, but it does not actually address the security issue.

it could be possible to read environment variables in synapse after the script finishes running

I think moving files could easily be a built in supported action and be handled much cleaner internally.