drakkan / sftpgo

Full-featured and highly configurable SFTP, HTTP/S, FTP/S and WebDAV server - S3, Google Cloud Storage, Azure Blob
https://sftpgo.com
GNU Affero General Public License v3.0
9.28k stars 726 forks source link

Consider making the sdk package a separate module #657

Closed sagikazarmark closed 2 years ago

sagikazarmark commented 2 years ago

It's a common practice to make packages that are supposed to be consumed by third party code (eg. plugins) separate modules. (Example: api packages in most Hashicorp software).

This would make the SDK package lighter in terms of dependencies (it doesn't need all deps of SFTPGo).

At the same time, until plugins are experimental, it might be easier to keep it in the same module (although making an existing package a module later has it's own quirks).

drakkan commented 2 years ago

Yes, this is the intended behaviour, I have to find the time to move some code around to make it happen.

drakkan commented 2 years ago

I made a quick attempt:

0001-WIP-sdk-separate-package.zip

The sdk does not import the kms package anymore but json deserialization fails because kms Secrets are now an interface and not a struct. After removing kms imports we should remove logger imports (this should be easier) to have a separate package.

For now I give up, I have other priorities and the Go compiler is smart enough to only include the real imported packages in the plugin binaries and not all the sftpgo deps.

drakkan commented 2 years ago

This is basically done with the latest commit. As expected there is no change in the plugins binary size (go.sum is lighter).

My first idea was to move the sdk package to github.com/sftpgo/sdk but if you look at vault they use a different approach

https://github.com/hashicorp/vault

for example sdk is a separate module but it lives in the same repo

https://github.com/hashicorp/vault/tree/main/sdk

They set tags such as sdk/v0.3.0. google-cloud-go does the same. So probably I have to leave the sdk package in this repo. Do you have experience with this? Thanks

sagikazarmark commented 2 years ago

Using the sdk/v0.3.0 style (which is basically a "submodule" of the main module) is not really encouraged by Go unless absolutely necessary. Still, it works quite well in situations when you want to provide a client/SDK for your software and some of the code is shared between the two.

It makes evolution of the two modules easier (eg. breaking changes are easier this way).

This is what I meant when I originally suggested to make it a separate module.

If necessary/feasible, it can always be moved out to a separate repo as well later.

drakkan commented 2 years ago

Thanks, I have come to the same conclusion, there is some shared code and it will remain so in the future, eg. the user structure is defined in the sdk package and then used in sftpgo as well. I'll be using the same repo for now

drakkan commented 2 years ago

uhm ...there are issues importing the module from external projects and the Docker build fails. I'll revert for now. Maybe I have to add a tag, no more time for today

sagikazarmark commented 2 years ago

Maybe I have to add a tag, no more time for today

You either have to tag or ensure to use the latest commit. Making an existing package a submodule is technically not backwards compatible, because go sometimes works with several versions of the same package at the same time (eg. pulled in as transitive dependency) and gets confused.

The solution is to tag a version and make sure that nothing imports the old version of the package.

sagikazarmark commented 2 years ago

eg. the user structure is defined in the sdk package and then used in sftpgo as well

On a side note: unless it's necessary to use the same type from the same package (eg. in the plugin interface that plugins implement and the server uses), it might be better to duplicate structs and keep them in sync. Typically, structs that get encoded and transmitted that way between callers and the service can be duplicated.

Bottom line is: if you don't need to import a specific type from a specific package, it doesn't necessarily have to be shared code.

drakkan commented 2 years ago

The sdk package contains now only structs definition and the kms base implementation. This is required because the current kms plugins work this way:

If a master key is provided we first encrypt the plaintext data using the SFTPGo local provider and then we encrypt the resulting payload using the Cloud provider and store this ciphertext.

Maybe I can remove something from the sdk/kms package too.

Regarding the structs duplication I'm trying to minimize them, see for example here: VirtualFolders and FsConfig are duplicated to avoid to import the vfs package within the sdk.

I want to think a bit more about this but I'll probably make a separate package, for example github.com/sftpgo/sdk this seems cleaner and I want to use a different license for the sdk (LGPL or Apache) to allow users to develop proprietary plugins. Having multiple licenses in the same repo could be confusing

sagikazarmark commented 2 years ago

For what it's worth I tested the new SDK with the latest build of SFTPGo and a rewritten version of a hook into a plugin: works like a charm.

drakkan commented 2 years ago

Thanks for the confirmation, the SDK is now what it was originally intended. Sometime I need an open issue to find enough motivations to complete a task :). I'm a little lazy I suppose ...

As for plugins there are two more things in my TODO for a long time:

1) understand if it is possible to have a single executable that implement multiple plugin interfaces. For example a single executable that implement both the eventsearcher and the notifiers plugins. I've done some quick tests in the past and this doesn't seem possible, but my tests haven't been accurate enough to be absolutely sure 2) explore the best way to support io.Reader/Writer in plugins (GRPCBroker usage I think). This is required to expose vfs implementations as plugins. However browsing the go-plugins issues it seems there is a size limit for the trasmitted data that could be an issue for this usage

If you have time/interest it would be really helpful if you could help with these tasks, thanks

sagikazarmark commented 2 years ago

Sometime I need an open issue to find enough motivations to complete a task :). I'm a little lazy I suppose ...

There is nothing wrong with that. :)

understand if it is possible to have a single executable that implement multiple plugin interfaces.

Good question. Not sure if hashicorp ever does that. It sounds like a nice to have.

This is required to expose vfs implementations as plugins

I'm not sure I'd do that or at least not for the majority of implementations, purely for performance reasons. Hooks that are async are not that important to be "fast", even pre-hooks can have a little bit of penalty, because they are called once.

I can see why extensibility could be important though. One thing that comes to mind is that the S3 API is kinda standard these days. A bunch of different storage solutions implement it to make integrations easier, so if someone needs a really custom storage solution integrated, chances are the S3 API will work for them.

Honestly, until there isn't any use case for supporting more file systems, I'm not sure I'd put efforts into generalizing the file system support.

If you want to open separate issues for these topics, we can probably discuss them more in-depth there and it'd be easier to track.

I'll definitely play a little with plugins, so chances are I will have some time looking into the first one.

drakkan commented 2 years ago

Sometime I need an open issue to find enough motivations to complete a task :). I'm a little lazy I suppose ...

There is nothing wrong with that. :)

understand if it is possible to have a single executable that implement multiple plugin interfaces.

Good question. Not sure if hashicorp ever does that. It sounds like a nice to have.

This is required to expose vfs implementations as plugins

I'm not sure I'd do that or at least not for the majority of implementations, purely for performance reasons. Hooks that are async are not that important to be "fast", even pre-hooks can have a little bit of penalty, because they are called once.

I can see why extensibility could be important though. One thing that comes to mind is that the S3 API is kinda standard these days. A bunch of different storage solutions implement it to make integrations easier, so if someone needs a really custom storage solution integrated, chances are the S3 API will work for them.

Honestly, until there isn't any use case for supporting more file systems, I'm not sure I'd put efforts into generalizing the file system support.

If you look at the SFTPGo forks you can see things like these

https://github.com/mever/sftpgo/blob/cli/vfs/clifs.go https://github.com/Smithx10/sftpgo/blob/add_manta/vfs/manta.go

some users asked for OneDrive support etc.. I don't want to add all the possible storage providers in the SFTPGo core, I think the current vfs are enough.

This is a very low priority task for me and if I ever add a vfs plugin I have to make sure it works for all use cases and with acceptable performance

If you want to open separate issues for these topics, we can probably discuss them more in-depth there and it'd be easier to track.

I'll definitely play a little with plugins, so chances are I will have some time looking into the first one.

great, thanks

sagikazarmark commented 2 years ago

Yeah, I'm not saying it shouldn't be done one way or another, I'm saying the current providers should not be converted into plugins.

One obvious solution for people is to keep forking and adding their own solutions. That will always be the most performant, so if a plugin based solution ever gets implemented, forking should probably be kept around and documented as the best solution from a performance perspective.

As for adding plugin support for other vfs solutions: maybe it doesn't have to be the same plugin based solution, but a simplified, custom protocol that's easy enough to implement.

An HTTP based solution is probably a good example, although it might still not be the most performant one. But users could easily just implement an interface, run their own "proxy" service in front of the storage that implements sftpgo's internal protocol.

Might be a huge venture though and I'm not sure it's worth the effort.

drakkan commented 2 years ago

Yes, the current providers will remain builtin, plugins will be eventually used for providers that I don't want to add directly to SFTPGo.

An HTTP/WebSocket based solution is exactly what I had in my mind, but not sure if I'll ever implement it.

It mostly depends on whether someone wants to sponsor such a development.

sagikazarmark commented 2 years ago

It mostly depends on whether someone wants to sponsor such a development.

Yeah, that sounds like a good approach. Maybe you could set up a project board for ideas like this: features that are not likely to land, unless someone sponsors it. Enterprise oriented features, like custom VFS, auditing, etc.

sagikazarmark commented 2 years ago

@drakkan any plans for tagging a new release? Anything I can do to help?

drakkan commented 2 years ago

I want to ship the UI improvement before v2.3 so I'll backport them and the sdk patch in v2.2.x branch and these changes will be included in v2.2.2. However making a new release still requires several time consuming manual steps (aws marketplace package, ubuntu ppa packages etc.).

I'm looking to finalize a major sponsorship these days and I'm focused on that. I think it will take a few weeks before the next update, sorry

sagikazarmark commented 2 years ago

That's absolutely fine. Having an estimated timeline is more than enough for me to plan my next steps. Thanks!

sagikazarmark commented 2 years ago

Also, congrats on the sponsorship! Good to hear news like this in times like this.

drakkan commented 2 years ago

Also, congrats on the sponsorship! Good to hear news like this in times like this.

Thanks, I just backported the related changes to 2.2.x branch, let's see if some bug will be reported in the next weeks

sagikazarmark commented 2 years ago

Hm, that branch doesn't publish a Docker image, does it?

drakkan commented 2 years ago

Hm, that branch doesn't publish a Docker image, does it?

no, it only runs CI

drakkan commented 2 years ago

I enabled docker on that branch https://github.com/drakkan/sftpgo/commit/8cd9e886f3dc3f22c1290201439384007210d08f

if this does not work can you please provide a patch? We should have the branch name in the resulting image, thank you

sagikazarmark commented 2 years ago

Looks great, thanks!

sagikazarmark commented 2 years ago

Friendly ping: is there a release coming our way? I've been running the current version in a dev environment and it works great.

drakkan commented 2 years ago

Maybe the next weekend, ideally I would like to see this merged, otherwise I have to use a replace. Currently SFTPGo always open a socket connection even if logging to journald is not enabled (it can be enabled only for the subsystem mode)

sagikazarmark commented 2 years ago

Thanks for the update!

drakkan commented 2 years ago

Hi,

please give a try to the latest update (once CI ends), there are a lot of dependency updates. A notable update for your use case is the go storage library to v1.19. The current version will be likely tagged as v2.2.0 this weekend. Thank you