chrisglass closed this issue 9 years ago
The problem is that any extra id_map defined for a container through raw.lxc will cause the container to fail to start. This happens both through profiles and direct configuration.
I don't quite recall what was tried last week; however, so long as I add the newly requested range to /etc/subuid and /etc/subgid, this works fine for me.
can you please show a very specific example of this failing, showing /etc/sub[ug]id and the raw.lxc used?
So, my end goal is to allow bind mounts inside the container (to have functionality like the "-b $USERNAME" in lxc).
So I created a profile with the following:
name: homebind
config:
raw.lxc: |
lxc.mount.entry = /home/tribaal home/tribaal none bind 0 0
devices: {}
That mounts my home partition inside the container when I apply it, but it's read-only. I suspect that's because the uid outside the container and the uid inside the container are not the same (that was the working hypothesis last week).
@chrisglass I was saying that lxc.id_map in raw.lxc works for me. Can you please try again and, if/when it fails, show /etc/subuid, /etc/subgid, and the file contents from 'lxc config edit
@hallyn I'll give it a try then.
Adding id_map makes the container fail to start as far as I can tell.
Here is the exact series of steps I did now (all binaries from master as of c6c3c9259b661a6432833146a3f8c2bfb91b11cb):
name: idmap
config:
raw.lxc: |
lxc.id_map = u 0 100000 1000
lxc.id_map = g 0 100000 1000
lxc.id_map = u 1000 1000 1
lxc.id_map = g 1000 1000 1
lxc.id_map = u 1001 101001 64535
lxc.id_map = g 1001 101001 64535
devices: {}
The container fails to start with "error: exit status 1"
The debug output on the server side looks like:
Oh, yes, that won't work. I'm not quite sure how we should support that, but we definitely should.
For now, I thought this bug was just about adding a new mapping. So for instance:
raw.lxc: |
lxc.id_map = u 200000 1000 2
lxc.id_map = g 200000 1000 2
so that your host uids and gids 1000-1001 get mapped into the container, albeit at weird high uids. You could then define 200000 and 200001 in the container's /etc/passwd and /etc/group.
(Note that my suggestion remains to only map in host gids, and not your primary gid, in order to protect host from guest mistakes)
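To make the arithmetic above concrete, here is a small sketch (the helper name is mine, not LXC's) of how an `lxc.id_map` entry of the form `u <container-first> <host-first> <count>` translates a host uid into the uid it appears as inside the container:

```python
# Sketch: translate a host uid to its in-container uid given lxc.id_map
# entries of the form (container_first, host_first, count).
# Helper names are illustrative, not part of LXC.

def container_uid(host_uid, id_map):
    """Return the uid a host uid appears as inside the container,
    or None if the host uid is not mapped at all."""
    for container_first, host_first, count in id_map:
        if host_first <= host_uid < host_first + count:
            return container_first + (host_uid - host_first)
    return None

# The map suggested above: "lxc.id_map = u 200000 1000 2"
id_map = [(200000, 1000, 2)]

print(container_uid(1000, id_map))  # host uid 1000 shows up as 200000
print(container_uid(1001, id_map))  # host uid 1001 shows up as 200001
print(container_uid(1002, id_map))  # unmapped -> None
```

This is why files owned by host uids 1000-1001 become writable in the container, albeit under the "weird high" uids 200000-200001.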
So "punching a hole" in the mappings (to map uid X on host to uid X in container) should either be a separate issue, blocked on doing per-user idmaps, or it should be a part of issue #632
Unfortunately, setting only the id_map you pasted still results in non-starting containers :/
Profile:
name: idmap
config:
raw.lxc: |
lxc.id_map = u 200000 1000 2
lxc.id_map = g 200000 1000 2
devices: {}
Starting a machine after the profile was applied results in: http://pastebin.ubuntu.com/11098684/
could you please show the result of 'lxc info foo --show-log' to show the container startup error messages?
also please show /etc/subuid and /etc/subgid
No, using a profile to add the idmaps works fine for me. I did have to map to 400000 in the container because I was allocating 200001 uids by default, so container id 200000 was already taken.
This is a kernel issue of some sort...
I've tried applying a profile which maps uid and gid 200000 of the host to the same id in the container. The resulting LXC config looks as one would expect but starting it fails with newuidmap and newgidmap reporting EINVAL coming from the write to /proc/PID/{u|g}id_map
I've also reproduced this behavior by cloning a task with CLONE_NEWUSER (lxc-unshare -s USER -- /bin/bash), then attempting to set up its userns directly as root.
I can perfectly well write "200000 200000 1" to /proc/PID/uid_map, and so can I write "0 165536 65536", but I can never write both of them combined.
If I write "0 165536 65536\n200000 200000 1", the first map is used and the second isn't applied. Write returns EINVAL.
Oddly enough, we're not seeing this problem when mapping the user's own uid/gid in an unprivileged container, so I'm really wondering what's going on in the kernel.
For good measure, I've logged calls to newuidmap on my system; here are some examples:
called with: 20622 0 100000 1 201105 201105 1
called with: 20633 0 100000 65536 201105 201105 1
called with: 20630 0 100000 65536 201105 201105 1
called with: 22836 65536 0 1 0 165536 65536 201105 201105 1
called with: 22829 0 165536 65536 201105 201105 1
called with: 23211 65536 0 1 0 165536 65536
called with: 23203 0 165536 65536
called with: 25346 0 165536 65536
Note that when it fails, it's always the write syscall failing, not one of the sanity checks in newuidmap, so it's not a problem with the delegated ranges in /etc/subuid.
I can perfectly write "200000 200000 1" to /proc/PID/uid_map and so can I write "0 165536 65536", but I can never write both of them combined.
That is expected, since 165536 < 200000 < 165536+65536
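In other words, the second entry's host-side range collides with the first one's: host uid 200000 already falls inside 165536..165536+65536. A minimal sketch of that check (my own simplification of the kernel's rule, which rejects a map whose extents overlap on either the container side or the host side):

```python
# Sketch: why "0 165536 65536" plus "200000 200000 1" is rejected.
# /proc/PID/uid_map entries are (container_first, host_first, count);
# the kernel returns EINVAL if any two entries overlap on either side.

def ranges_overlap(a_start, a_count, b_start, b_count):
    return a_start < b_start + b_count and b_start < a_start + a_count

def map_is_valid(entries):
    for i, (c1, h1, n1) in enumerate(entries):
        for c2, h2, n2 in entries[i + 1:]:
            if ranges_overlap(c1, n1, c2, n2) or ranges_overlap(h1, n1, h2, n2):
                return False
    return True

# Host uid 200000 falls inside 165536..231071, so this map is invalid:
print(map_is_valid([(0, 165536, 65536), (200000, 200000, 1)]))  # False
# Moving the big range out of the way (as with the 400000 base) fixes it:
print(map_is_valid([(0, 400000, 65536), (200000, 200000, 1)]))  # True
```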
I don't understand what the "called with" numbers mean, especially in the failed case. There are 10 numbers...?
@chrisglass can you please show the information I requested above (/etc/subuid, /etc/subgid, and container startup failure log from "lxc info foo --show-log")?
Sure, sorry.
Here's the profile I have now:
name: idmap
config:
raw.lxc: |
lxc.id_map = u 200000 1000 2
lxc.id_map = g 200000 1000 2
devices: {}
And the requested info: lxc info foo --show-log: http://paste.ubuntu.com/11101581/ /etc/subuid: http://paste.ubuntu.com/11101593/ /etc/subgid: http://paste.ubuntu.com/11101601/
@chrisglass your /etc/subuid and /etc/subgid do not have an entry for root:1000:2
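For the map above to be allowed, root needs matching delegations in both files. Something along these lines (the 100000:65536 line is just LXD's usual default allocation; the exact existing entries will differ per machine):

```
# /etc/subuid
root:100000:65536
root:1000:2

# /etc/subgid
root:100000:65536
root:1000:2
```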
@hallyn Confirmed that the problem I ran into was an overlap issue. I'm so used to my usual 100000+65536 range that I didn't even think about my userid already being part of the mapping (which is a problem for me, so I'll change the lxd range on my machines to avoid that).
Once I resolved the overlap problem, a container with a profile including lxc.id_map in raw.lxc started successfully and with the right map. So it looks like there's no lxd bug after all.
Un-milestoning, tagging and assigning, as there doesn't appear to be anything wrong with LXD.
Not a bug in lxd, but clearly we'll need to make some of this easier! Just not sure where/how yet.
So, is it still impossible to modify files mounted from the host system?
No. You can set up the bind-mount, and if the bind-mounted directory is world-writable or writable by a uid which exists in the container, you'll be able to write to it.
Alternatively, mapping the needed uids using lxc.id_map works too.
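Pulling the pieces together, a single profile combining the earlier bind-mount with the hole-punching map might look like this. This is a sketch assuming your host uid/gid is 1000, that root:1000:1 is delegated in /etc/subuid and /etc/subgid, and that LXD's own allocated range doesn't already cover uid 1000 (the overlap failure above):

```yaml
name: homebind-idmap
config:
  raw.lxc: |
    lxc.id_map = u 0 100000 1000
    lxc.id_map = g 0 100000 1000
    lxc.id_map = u 1000 1000 1
    lxc.id_map = g 1000 1000 1
    lxc.id_map = u 1001 101001 64535
    lxc.id_map = g 1001 101001 64535
    lxc.mount.entry = /home/tribaal home/tribaal none bind 0 0
devices: {}
```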
So, would it be possible, by default for an unprivileged container, to map the current host user to root inside the guest?
LXD runs as root, so for it "current user" would be uid 0, which would be a really really bad idea, so no.
LXD runs as root, so for it "current user" would be uid 0, which would be a really really bad idea, so no.
But I execute the lxc utility without sudo. It could forward my uid to LXD.
So the uid map would change based on what user used the lxc command tool? That seems rather confusing and also would be very weird when dealing with remote LXD hosts where your uid may well be owned by somebody else.
I am not sure how it is supposed to work then. My thought was that for an unprivileged container, the user chooses a folder they want to share from the local machine. How would that work if the container is accessed remotely? Does LXC allow remote folder sharing, and what happens if the connection drops? I think that needs some sort of specification, because there are two use cases with remote shares:
My vision is that case 2 should work by default, and advanced users should explicitly configure remote shares of type 1.
LXD can only set up bind-mounts. Those must be set up ahead of time; as in, we can't create new ones or change them after the container is started.
LXD itself runs as root and has no knowledge of what's the uid/gid of the calling user, nor could it even get that when receiving a connection from a remote machine.
As such, all we can do with bind-mounts is have them be absolute paths which must exist at the time the container is started and cannot change based on who's execing stuff into the container.
Altering a uid/gid map at runtime also isn't possible. To do so, we'd need to stop the container, possibly re-map all the uids/gids on the filesystem (so touching all files) and then start it all over again with the new map. (The rewriting step is required in the event that your own user's uid is in the middle of the mapped range in the container.)
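That re-mapping step could be sketched like this. Illustrative only: LXD's actual implementation differs, the tree walk would need to run as root, and the function names are mine:

```python
# Sketch of the "re-map every file" step: shift each file's uid/gid
# from one base range to another, the way a container rootfs would have
# to be rewritten before restarting with a new map.
import os

def shift_id(xid, old_base, new_base, count):
    """Move an id from [old_base, old_base+count) to the same offset
    in [new_base, new_base+count); ids outside the range pass through."""
    if old_base <= xid < old_base + count:
        return new_base + (xid - old_base)
    return xid

def remap_tree(rootfs, old_base, new_base, count):
    # Walk the rootfs and rewrite ownership of every entry (needs root).
    for dirpath, dirnames, filenames in os.walk(rootfs):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            os.lchown(path,
                      shift_id(st.st_uid, old_base, new_base, count),
                      shift_id(st.st_gid, old_base, new_base, count))

print(shift_id(100500, 100000, 400000, 65536))  # inside range -> 400500
print(shift_id(500, 100000, 400000, 65536))     # outside range -> 500
```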
Useful info. Is this going to be documented somewhere that explains how to share directories with containers?
as discussed :)