Enable server use as a library

joonas-fi commented 2 years ago

Example user application

This works:

package main

import (
    "context"
    "log"
    "net"
    "os/signal"
    "syscall"

    "github.com/gokrazy/rsync/rsyncd"
)

func main() {
    ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
    defer cancel()

    if err := logic(ctx); err != nil {
        log.Fatal(err)
    }
}

func logic(ctx context.Context) error {
    listener, err := net.Listen("tcp", "0.0.0.0:666")
    if err != nil {
        return err
    }

    rsyncServer, err := rsyncd.NewServer(rsyncd.Module{
        Name: "default",
        Path: "/tmp/rsyncd",
    })
    if err != nil {
        return err
    }

    if err := rsyncServer.Serve(ctx, listener); err != nil {
        return err
    }

    log.Println("gracefully exiting")

    return nil
}

Notes on this PR

The PR might be easiest to be reviewed by just reading one commit at a time
I changed a few log messages to be more user friendly, but those commits can be easily removed if you don't agree with them
Tests pass, at least on my machine :laughing:
I did a blanket internal -> pkg, which might not be a good idea (I wanted an easy change to get myself started). But the "user code" needs at least config + rsync packages. I can add a commit that makes not-necessary-to-be-public packages back to internal.
The most questionable change I did was change the module map concept to a list. It brought quite a few changes. But I think if the rsyncd internally wants a map, it could take a simple list of modules and internally transform it into a map.
- Despite this bringing many changes, I think end result is simpler
- Configuration doesn't have to compute the map from list of modules
rsyncd package forcefully logs to stderr. I would like to have Logger *log.Logger field in rsyncd.Server which NewServer() maybe sets to log.Default() by default (preserving the current behaviour) or has to be given explicitly (have not thought of this yet), so user can give discarding logger to make the library quiet. But that could be a new PR, there is quite a bunch of stuff here already.

General notes

With my fresh eyes on this codebase, a few notes:

"maincmd" as a concept was confusing to me. I knew there is "client" and "server". I was thinking "which one main refers to?".
There is name "receiver" which currently means the "client" portion. But that name might not be long-lived if both the client and the server both gain send/receive capabilities. Would name change to client/server remove these ambiquities? Something like:
- maincmd = servermaincmd
- receivermaincmd = clientmaincmd
- rsyncd could also be renamed to rsyncserver, but that's probably not important, the "d" already communicates the server portion..

stapelberg commented 2 years ago

Thanks for sending this!

The PR might be easiest to be reviewed by just reading one commit at a time

Actually, it might be easiest to file a new issue and then send small individual PRs for each logical change, with each PR referencing the issue so that we have a good paper trail :)

That way, we can merge things piece by piece instead of having to get everything ready at once.

I changed a few log messages to be more user friendly, but those commits can be easily removed if you don't agree with them

Thanks, the log message changes are fine. These would be good to send as a separate PR, for example.

I did a blanket internal -> pkg, which might not be a good idea (I wanted an easy change to get myself started). But the "user code" needs at least config + rsync packages. I can add a commit that makes not-necessary-to-be-public packages back to internal.

I don’t like having pkg as part of the import path at all, so we’d instead need to move packages from github.com/gokrazy/rsync/internal to github.com/gokrazy/rsync (no pkg subdirectory).

But, as you already wrote, this blanket change is probably going too far. I think we should carefully consider each type we export, even if it’s more work.

The most questionable change I did was change the module map concept to a list. It brought quite a few changes. But I think if the rsyncd internally wants a map, it could take a simple list of modules and internally transform it into a map. Despite this bringing many changes, I think end result is simpler

Sounds good! Can you send this as a separate PR, too, please?

rsyncd package forcefully logs to stderr. I would like to have Logger *log.Logger field in rsyncd.Server which NewServer() maybe sets to log.Default() by default (preserving the current behaviour) or has to be given explicitly (have not thought of this yet), so user can give discarding logger to make the library quiet. But that could be a new PR, there is quite a bunch of stuff here already.

Sounds good as well!

"maincmd" as a concept was confusing to me. I knew there is "client" and "server". I was thinking "which one main refers to?". There is name "receiver" which currently means the "client" portion. But that name might not be long-lived if both the client and the server both gain send/receive capabilities. Would name change to client/server remove these ambiquities?

Ah, this is quite a confusing situation when talking about rsync.

The rsync protocol can run either in “sender” or in “receiver” mode.

However, both client and server can fulfill either role! It depends on the flags with which they were started.

For example, you could have a client connect to a server and then act as a sender to upload a backup of your PC (client) to your network storage (server). Or, you could have a client connect to a server and then act as a receiver to download a Linux distribution’s package repository.

I agree that currently, the terminology used in the repository is not entirely clear. But, the desired state is that the terms sender, receiver, client and server are all used, and refer to different, orthogonal concepts (rsync protocol role vs. network connection role).

The maincmd package fulfills the same role as the upstream rsync implementation: one binary which can run in all the different modes.

joonas-fi commented 2 years ago

Thanks for the great feedback!

Thanks for explaining the rsync sender/receiver/client/server difficulty. Yeah that was new info to me, I stll need to digest it a bit lol :smile: But yeah maincmd might not be misleading as a name then, just a natural rsync all-in-one concept then, if I understood properly. :slightly_smiling_face:

I've extracted different logical changes to own PRs and created an issue to serve as the paper trail for the high-level work.

Still TODO here: internal -> pkg rename.

In my project's I've been following the pkg/ approach. I however appreciate the fact that this is not my project. :)

How should we continue with the packages that need to be public? rsyncd & config have been identified. Would this work:

move <root>/internal/rsyncd -> <root>/rsyncd
move <root>/internal/config -> <root>/config

?

One thing I'm hesitant about the naming, since config currently seems to mean "rsyncd config", will there ever be "rsync client config"? Or should they all live in the same package, so "config" suffices for them both?

stapelberg commented 2 years ago

Thanks for sending the PRs!

Thanks for explaining the rsync sender/receiver/client/server difficulty. Yeah that was new info to me, I stll need to digest it a bit lol. But yeah maincmd might not be misleading as a name then, just a natural rsync all-in-one concept then, if I understood properly.

You’re welcome! Maybe you could add a package-level comment that explains this while you’re at it? Otherwise, I can add one later.

Still TODO here: internal -> pkg rename.

In my project's I've been following the pkg/ approach. I however appreciate the fact that this is not my project. :)

I don’t have strong opinions on this, but found the goal of keeping your domain types in the top-level attractive. E.g. rsync.SumHead reads very nicely, and the import path github.com/gokrazy/rsync is short and descriptive. When using this approach, it seems natural to me to use cmd for binaries and internal for internal code. Everything else is public.

I found https://github.com/golang-standards/project-layout/issues/10 to be a good read in case you haven’t seen that yet.

How should we continue with the packages that need to be public? rsyncd & config have been identified. Would this work:

move <root>/internal/rsyncd -> <root>/rsyncd

Yes

move <root>/internal/config -> <root>/config ?

One thing I'm hesitant about the naming, since config currently seems to mean "rsyncd config", will there ever be "rsync client config"? Or should they all live in the same package, so "config" suffices for them both?

Yeah, let’s rename it to rsyncdconfig, as that’s what it is. There might well be a different config at a later point, which should probably be similarly named (not config, which is too generic).

joonas-fi commented 2 years ago

On pkg/

I found https://github.com/golang-standards/project-layout/issues/10 to be a good read in case you haven’t seen that yet.

I haven't read that yet, thank you! :+1: Many good points for and against. My biggest "cmd/ internal/ pkg/* are good" observation is that it very clearly separates Go code from "other" files, such as documentation, "misc files" (be it project logo etc.) but most importantly, separate subsystems written in other languages. I have repos where I have TypeScript-based frontend for a Go backend, so I very much like keeping the things in separate hierarchies. :) I appreciate that Go-only projects don't need this, but I prefer consistency (Go-only projects don't look different to Go + frontend language projects).

Of course one could have different repositories for these subsystems, but monorepo vs. multi-repo is a different topic and not without obvious caveats: added complexity, atomic commits not possible (or as easy) as in a monorepo etc.. But I'm going off-topic now..

It's good to get different perspective! I need to continue evaluating my stance on pkg/ :)

Open questions

I now moved internal/rsyncd -> rsyncd and internal/config -> rsyncdconfig.

Now looking at it, I'm not happy with it. Thoughts:

Should rsyncdconfig actually be under rsyncd/rsyncdconfig, as it's conceptually purely a concept under rsyncd?
Also related to that question, another question is if it actually needs to be a separate package. I.e. what if config.go was just part of the rsyncd package itself? If you look at the use example config.NewModule(), the rsyncd.Server is not ever usable without first importing the config package.
Also there's now the problem of rsyncdconfig config file having listeners declared, but the listener concepts HTTPMonitoring & AnonSSH are not handled by the package rsyncd, even though the new package name rsyncdconfig implies so.
- Should we rename rsyncdconfig -> maincmdconfig to signify that that is what it configures?
- The above idea has a caveat as well: in library use it looks weird to use rsyncd.NewServer() and give it modules from maincmdconfig package
- Is it just best to keep calling it rsyncdconfig and accept the fact there are concepts the rsyncd package doesn't actually use

joonas-fi commented 2 years ago

I clearly have no idea what I'm doing with Git. I got merge conflicts where I didn't think there should be, and had to do an additional merge. I can clean up the history later when this looks ready for merging. Perhaps best to just look at overall changes, not the noisy commits for now..

stapelberg commented 2 years ago

I haven't read that yet, thank you! +1 Many good points for and against. My biggest "cmd/ internal/ pkg/* are good" observation is that it very clearly separates Go code from "other" files, such as documentation, "misc files" (be it project logo etc.) but most importantly, separate subsystems written in other languages.

For this, I have come to prefer directories whose name starts with an _, e.g. in distri I’m using a _build subdirectory. The Go toolchain entirely ignores these :)

Also related to that question, another question is if it actually needs to be a separate package. I.e. what if config.go was just part of the rsyncd package itself? If you look at the use example config.NewModule(), the rsyncd.Server is not ever usable without first importing the config package.

Yeah, I think making it part of the rsyncd package is a good idea indeed.

I clearly have no idea what I'm doing with Git. I got merge conflicts where I didn't think there should be, and had to do an additional merge. I can clean up the history later when this looks ready for merging. Perhaps best to just look at overall changes, not the noisy commits for now..

Have you tried rebasing your changes? Alternatively just copy the files, throw away history and commit fresh :)

joonas-fi commented 2 years ago

Sorry it took me so long to get back to this. This has taken more time than I had allocated (not your fault!), so I had to find some more free time :)

joonas-fi commented 2 years ago

(The PR message's example user code is up-to-date now)

joonas-fi commented 2 years ago

I submitted the fixes!

For some reason I can't respond to all of your comments, so here goes:

If you have time for the internal/config → internal/rsyncdconfig change as well, I’ll gladly take that as a separate pull request later :)

Yeah I will send it after this has been merged!

Fair point. Let’s use []Module here, also because a zero-module server is not generally useful, so having rsyncd.NewServer() fail at compile-time is actually desired.

Sidenote: you can also use func NewServer(module0 Module, modules ...Modules) to enforce "at least 1" at compile time, but that's pretty ugly :laughing: And that complicates the case when you simply have a slice, you've to give it as NewServer(modules[0], modules[1:]...)

stapelberg commented 2 years ago

Thanks a lot for contributing this!

gokrazy / rsync