datalad / datalad-ria

Adds functionality for RIA stores to DataLad
http://datalad.org

Restricted shell for RIA access #37

Open matrss opened 1 year ago

matrss commented 1 year ago

Considering a storage server dedicated to hosting DataLad datasets, it might be desirable to restrict user logins in a way that they only have push/pull capabilities to a RIA store. Plain git and git-annex have something like this in the form of git-shell and git-annex-shell. Is there any way to do this with DataLad and RIA stores?

Another option might be to use SSH chroot jails to confine users somewhat, although that's not very straightforward to set up.

bpoldrack commented 1 year ago

@matrss: There's nothing like git-(annex)-shell. So far, in our use cases we simply rely on user permissions and have (for example) a dedicated group for a given store.

Also note that for read access the store can be served via http(s) as well; datalad clone and the special remote can deal with that. Instead of giving ria+ssh://..../path/to/store#<id> to clone, one can then simply give ria+https://<wherever you serve the store's root>#<id>.
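
As a rough illustration of the two URL forms above, the following Python sketch assembles a RIA clone URL and the matching `datalad clone` command line. The helper names, the host, the store path, and the dataset ID are placeholders invented for this example; they are not part of DataLad.

```python
from typing import List


def ria_url(store_base: str, dataset_id: str) -> str:
    """Build a RIA URL of the form ria+<store_base>#<dataset_id>.

    store_base is the location of the store's root, e.g. an ssh:// or
    https:// URL (hypothetical values are used below).
    """
    return f"ria+{store_base}#{dataset_id}"


def clone_command(url: str, target: str) -> List[str]:
    """Return the argument list for `datalad clone <url> <target>`."""
    return ["datalad", "clone", url, target]


if __name__ == "__main__":
    # Placeholder dataset ID and hosts, for illustration only.
    dataset_id = "00000000-0000-0000-0000-000000000000"

    # Read/write access over SSH.
    print(ria_url("ssh://store.example.org/path/to/store", dataset_id))

    # Read-only access to the same store served over HTTPS.
    https_url = ria_url("https://store.example.org/ria", dataset_id)
    print(" ".join(clone_command(https_url, "my-dataset")))
```

The only difference between the two cases is the part between `ria+` and `#`; the dataset ID stays the same.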

matrss commented 1 year ago

Alright, thanks! My intention was to provide a storage server that would not allow interactive shell access or arbitrary commands, but only push and pull capabilities to a RIA store according to its file permissions. I do not see how a dedicated group might help with that, apart from the file permissions in the store of course.

From what I could see in the source code, datalad actually connects via a full-blown shell and issues commands there, so this is most likely not trivial to implement. Though, maybe the things datalad does to push/pull to a RIA store might be possible via sftp only?

Anyway, I will probably use something else instead of a RIA store, like gin/gogs, also for its user-friendliness.

If you think this is a feature that would be worthwhile feel free to keep this open, otherwise I'd be fine with closing this.

bpoldrack commented 1 year ago

@matrss

> Though, maybe the things datalad does to push/pull to a RIA store might be possible via sftp only?

Partially. It would lack one feature: one can have the entire git-annex object tree in a single (uncompressed) 7z archive in RIA stores in order to mitigate possible inode limitations (we can therefore represent any dataset with 20 inodes). Via shell we can actually run 7z on the remote end and get the extracted key only. I don't see how to achieve that via SFTP only.
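
To make the difference concrete, here is a minimal sketch of what extracting a single key over a shell connection could look like. It assumes the `7z` command-line tool is installed on the server and uses its `-so` switch to write the extracted file to standard output. The host name, archive path, and key path are hypothetical placeholders, and this is not DataLad's actual implementation.

```python
import shlex
from typing import List


def remote_extract_command(host: str, archive: str, member: str) -> List[str]:
    """Return an ssh argument list that runs 7z on the server.

    `7z x -so <archive> <member>` extracts one archive member to stdout,
    so only that member's content travels over the connection.
    """
    # ssh hands the remote command to a shell on the server, so the
    # arguments are quoted to survive that extra parsing step.
    remote = shlex.join(["7z", "x", "-so", archive, member])
    return ["ssh", host, remote]


if __name__ == "__main__":
    # All three values are made up for illustration.
    cmd = remote_extract_command(
        "store.example.org",
        "/srv/store/archives/archive.7z",
        "path/inside/archive/to/key",
    )
    print(cmd)
    # To actually run it, one could pass cmd to subprocess.run and
    # capture stdout as the key's content.
```

This depends on being able to execute a program on the server, which is exactly what an SFTP-only setup does not offer.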

However, apart from rewriting this code, since it has become quite a mess, I am considering a config option that switches to SFTP only (and consequently lacks that 7z feature). You have a valid use case, and apart from that, there may be sysadmins who are not happy with us "sneaking" into shell session traffic at all. (Originally, that shell approach was introduced to mitigate latency issues with separate ssh calls.)

So, I'll leave this open, but it may move soon-ish into a dedicated repo for everything RIA related.

matrss commented 1 year ago

> Via shell we can actually run 7z on the remote end and get the extracted key only. I don't see how to achieve that via SFTP only.

Right, I forgot about that feature. That really seems impossible via SFTP.

> I do consider having a config that switches to SFTP only (and consequently lack that 7z feature).

That would be nice to have. I don't think inode limitations are much of a concern outside of HPC environments, although I should do a rough calculation to see whether we might run into issues on that front.
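
A rough calculation of this kind could look like the sketch below. It assumes that each annexed key stored unarchived costs between two inodes (a key directory and the file in it) and four inodes (plus two hash-directory levels not shared with any other key). These per-key figures are assumptions for illustration and should be checked against the actual store layout. The figure of 20 inodes for an archived dataset is the one quoted earlier in this thread.

```python
from typing import Tuple

# Quoted earlier in this thread for a dataset kept in a single 7z archive.
ARCHIVE_INODES = 20


def estimate_inodes(n_keys: int, per_key_min: int = 2, per_key_max: int = 4) -> Tuple[int, int]:
    """Return a (lower, upper) inode estimate for n_keys unarchived keys.

    The per-key bounds are assumptions, not measured values.
    """
    return n_keys * per_key_min, n_keys * per_key_max


if __name__ == "__main__":
    for n_keys in (10_000, 1_000_000):
        low, high = estimate_inodes(n_keys)
        print(f"{n_keys} keys: {low} to {high} inodes unarchived, "
              f"{ARCHIVE_INODES} inodes archived")
```

Comparing the upper bound with the free inode count reported by `df -i` on the target filesystem gives a first idea of whether the limit matters.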

> So, I'll leave this open, but it may move soon-ish into a dedicated repo for everything RIA related.

Great!