canonical / lxd


lxc.id_map is not working in profiles #579

Closed chrisglass closed 9 years ago

chrisglass commented 9 years ago

as discussed :)

stgraber commented 9 years ago

The problem is that any extra id_map defined for a container through raw.lxc will cause the container to fail to start. This happens both through profiles and direct configuration.

hallyn commented 9 years ago

I don't quite recall what was tried last week; however, so long as I add the new requested range to /etc/subuid and /etc/subgid, this works fine for me.

can you please show a very specific example of this failing, showing /etc/sub[ug]id and the raw.lxc used?
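
For reference, those files take entries of the form <user>:<first id>:<count>. A sketch with illustrative values (your default range may differ):

    # /etc/subuid (and the same again in /etc/subgid)
    root:100000:65536    # typical default range for root-owned containers
    root:200000:2        # extra range matching whatever the raw.lxc id_map requests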

chrisglass commented 9 years ago

So, my end goal is to allow bind mounts inside the container (to have functionality like the "-b $USERNAME" in lxc).

So I created a profile with the following:

name: homebind
config:
    raw.lxc: |
        lxc.mount.entry = /home/tribaal home/tribaal none bind 0 0
devices: {}

That mounts my home partition inside the container when I apply it, but it's read-only. I suspect that's because the uid outside the container and the uid inside the container are not the same (that was the working hypothesis last week).
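
A quick way to check that hypothesis (a sketch; "c1" is a placeholder container name and the interpretation is mine, not captured output):

    # on the host
    ls -ldn /home/tribaal
    # inside the container
    lxc exec c1 -- ls -ldn /home/tribaal
    # if the directory shows up owned by 65534:65534 (nobody:nogroup),
    # the owning host uid isn't mapped into the container at all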

hallyn commented 9 years ago

@chrisglass I was saying the lxc.id_map in raw.lxc works for me, can you please try again and if/when it fails, show the /etc/subuid, /etc/subgid, and the file contents for 'lxc config edit ' ?

chrisglass commented 9 years ago

@hallyn I'll give it a try then.

chrisglass commented 9 years ago

Adding id_map makes the container fail to start as far as I can tell.

Here is the exact series of steps I did now (all binaries from master as of c6c3c9259b661a6432833146a3f8c2bfb91b11cb):

name: idmap
config:
  raw.lxc: |
    lxc.id_map = u 0 100000 1000
    lxc.id_map = g 0 100000 1000
    lxc.id_map = u 1000 1000 1
    lxc.id_map = g 1000 1000 1
    lxc.id_map = u 1001 101001 64535
    lxc.id_map = g 1001 101001 64535
devices: {}
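
The profile was created and applied roughly like this (exact invocations are from memory, "c1" is a placeholder container name, and the sub-commands may differ slightly between LXD versions):

    lxc profile create idmap
    lxc profile edit idmap        # paste the YAML above
    lxc profile apply c1 idmap
    lxc start c1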

The container fails to start with "error: exit status 1"

The debug output on the server side looks like:

http://pastebin.ubuntu.com/11098309/

hallyn commented 9 years ago

Oh, yes, that won't work. I'm not quite sure how we should support that, but we definitely should.

For now, I thought this bug was about just adding a new mapping. So for instance:

raw.lxc: |
    lxc.id_map = u 200000 1000 2
    lxc.id_map = g 200000 1000 2

so that your host uids and gids 1000-1001 get mapped into the container, albeit at weird high uids. You could then define 200000 and 200001 in the container's /etc/passwd and /etc/group.

(Note that my suggestion remains to only map in host gids, and not your primary gid, in order to protect host from guest mistakes)
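
Concretely, the container-side entries could look something like this (a sketch; the user and group names are made up):

    # /etc/passwd inside the container
    tribaal:x:200000:200000::/home/tribaal:/bin/bash
    # /etc/group inside the container
    tribaal:x:200000:
    # (and similarly for 200001 if you want the second mapped id usable)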

hallyn commented 9 years ago

So "punching a hole" in the mappings (to map uid X on host to uid X in container) should either be a separate issue, blocked on doing per-user idmaps, or it should be a part of issue #632

chrisglass commented 9 years ago

Unfortunately, setting only the id_map you pasted still results in non-starting containers :/

Profile:

name: idmap
config:
  raw.lxc: |
    lxc.id_map = u 200000 1000 2
    lxc.id_map = g 200000 1000 2
devices: {}

Starting a machine after the profile was applied results in: http://pastebin.ubuntu.com/11098684/

hallyn commented 9 years ago

could you please show the result of 'lxc info foo --show-log' to show the container startup error messages?

hallyn commented 9 years ago

also please show /etc/subuid and /etc/subgid

hallyn commented 9 years ago

No, using a profile to add the idmaps works fine for me. I did have to map to 400000 in the container because I was allocating 200001 uids by default, so container id 200000 was already taken.
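
So something along these lines worked here (a sketch from memory, not the exact profile I used, with matching root:1000:2 entries in /etc/subuid and /etc/subgid):

    raw.lxc: |
        lxc.id_map = u 400000 1000 2
        lxc.id_map = g 400000 1000 2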

stgraber commented 9 years ago

This is a kernel issue of some sort...

I've tried applying a profile which maps uid and gid 200000 of the host to the same id in the container. The resulting LXC config looks as one would expect but starting it fails with newuidmap and newgidmap reporting EINVAL coming from the write to /proc/PID/{u|g}id_map

I've also reproduced this behavior by cloning a task with CLONE_NEWUSER (lxc-unshare -s USER -- /bin/bash), then attempting to setup its userns directly as root.

I can write "200000 200000 1" to /proc/PID/uid_map perfectly well, and I can also write "0 165536 65536", but I can never write both of them combined.

If I write "0 165536 65536\n200000 200000 1", the first map is used and the second isn't applied. Write returns EINVAL.

Oddly enough, we're not seeing this problem when mapping the user's own uid/gid in an unprivileged container, so I'm really wondering what's going on in the kernel.
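
For anyone wanting to reproduce this outside of LXD, this is roughly the procedure (a sketch; $PID stands for the pid of the unshared task, and each write needs a freshly created namespace since uid_map can only be written once):

    # terminal 1: put a shell into a new user namespace
    lxc-unshare -s USER -- /bin/bash

    # terminal 2, as root:
    printf '200000 200000 1\n' > /proc/$PID/uid_map                    # works
    printf '0 165536 65536\n' > /proc/$PID/uid_map                     # works (on a fresh task)
    printf '0 165536 65536\n200000 200000 1\n' > /proc/$PID/uid_map    # write() fails with EINVAL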

For good measure, I've logged calls to newuidmap on my system, here are some examples:

Unprivileged container started from my user (uid 201105) => works fine

called with: 20622 0 100000 1 201105 201105 1
called with: 20633 0 100000 65536 201105 201105 1
called with: 20630 0 100000 65536 201105 201105 1

LXD generated LXC config, started as root => fails

called with: 22836 65536 0 1 0 165536 65536 201105 201105 1
called with: 22829 0 165536 65536 201105 201105 1

LXD generated LXC config, started as root => works fine

called with: 23211 65536 0 1 0 165536 65536
called with: 23203 0 165536 65536
called with: 25346 0 165536 65536

Note that when failing, it's always the write syscall failing, not one of the sanity checks in newuidmap, so it's not a problem with the maps in /etc/subuid.

hallyn commented 9 years ago

I can write "200000 200000 1" to /proc/PID/uid_map perfectly well, and I can also write "0 165536 65536", but I can never write both of them combined.

That is expected, since 165536 < 200000 < 165536 + 65536 = 231072: host uid 200000 already falls inside the host-side range claimed by the first mapping, and the kernel rejects overlapping ranges with EINVAL.
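
A second entry whose host-side range does not overlap the first should be accepted, for instance (illustrative values):

    printf '0 165536 65536\n300000 300000 1\n' > /proc/$PID/uid_map
    # 300000 >= 165536 + 65536 = 231072, so the host-side ranges are disjoint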

hallyn commented 9 years ago

I don't understand what the "called with" numbers mean especially in the failed case. There are 10 numbers... ?

hallyn commented 9 years ago

@chrisglass can you please show the information I requested above (/etc/subuid, /etc/subgid, and container startup failure log from "lxc info foo --show-log")?

chrisglass commented 9 years ago

Sure, sorry.

Here's the profile I have now:

name: idmap
config:
  raw.lxc: |
    lxc.id_map = u 200000 1000 2
    lxc.id_map = g 200000 1000 2
devices: {}

And the requested info:

lxc info foo --show-log: http://paste.ubuntu.com/11101581/
/etc/subuid: http://paste.ubuntu.com/11101593/
/etc/subgid: http://paste.ubuntu.com/11101601/

hallyn commented 9 years ago

@chrisglass your /etc/subuid and /etc/subgid do not have an entry for root:1000:2
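
In other words, something like this needs to be present (a sketch; your existing lines will differ):

    # /etc/subuid
    root:100000:65536
    root:1000:2

    # /etc/subgid
    root:100000:65536
    root:1000:2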

stgraber commented 9 years ago

@hallyn Confirmed that the problem I ran into was an overlap issue. I'm so used to my usual 100000+65536 range that I didn't even think about my userid already being part of the mapping (which is a problem for me, so I'll change the lxd range on my machines to avoid that).

stgraber commented 9 years ago

Once I resolved the overlap problem, a container with a profile including lxc.id_map in raw.lxc started successfully and with the right map. So it looks like there's no lxd bug after all.

stgraber commented 9 years ago

Un-milestoning, tagging and assigning as there doesn't appear to be anything wrong with LXD.

hallyn commented 9 years ago

Not a bug in lxd, but clearly we'll need to make some of this easier! Just not sure where/how yet.

techtonik commented 9 years ago

So, it is still impossible to modify files mounted from the host system?

stgraber commented 9 years ago

No. You can set up the bind-mount, and if the bind-mounted directory is world-writable or writable by a uid which exists in the container, you'll be able to write to it.

Alternatively, mapping the needed uids using lxc.id_map works too.
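
For example (paths and ids are illustrative, assuming the usual 0 -> 100000 offset):

    # make the shared directory writable by a uid that exists in the container:
    # host uid 101000 corresponds to container uid 1000 under a 0 -> 100000 map
    chown 101000:101000 /home/tribaal/shared

    # or, the blunt approach
    chmod 777 /home/tribaal/shared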

techtonik commented 9 years ago

So, is it possible, by default, for an unprivileged container to map the current user from the host to root inside the guest?

stgraber commented 9 years ago

LXD runs as root, so for it "current user" would be uid 0, which would be a really really bad idea, so no.

techtonik commented 9 years ago

LXD runs as root, so for it "current user" would be uid 0, which would be a really really bad idea, so no.

But I execute the lxc utility without sudo. It could forward my uid to LXD.

stgraber commented 9 years ago

So the uid map would change based on which user ran the lxc command-line tool? That seems rather confusing, and it would also be very weird when dealing with remote LXD hosts, where your uid may well be owned by somebody else.

techtonik commented 9 years ago

I am not sure how it is supposed to work then. My thought was that, for an unprivileged container, the user chooses a folder they want to share from the local machine. How will that work if the container is accessed remotely? Does LXC allow remote folder sharing, and what happens if the connection drops? I think this needs some sort of specification, because there are two use cases for remote shares:

  1. Free for all storage attached to container for container
  2. User specific storage that each user should provide itself

My vision is that 2 should work by default, and advanced users should have to configure type 1 remote shares explicitly.

stgraber commented 9 years ago

LXD can only set up bind-mounts. Those must be set up ahead of time; that is, we can't create new ones or change them after the container is started.

LXD itself runs as root and has no knowledge of the uid/gid of the calling user, nor could it even get that when receiving a connection from a remote machine.

As such, all we can do with bind-mounts is have them be absolute paths which must exist at the time the container is started and cannot change based on who is execing stuff into the container.

Altering a uid/gid map at runtime also isn't possible. To do so, we'd need to stop the container, possibly re-map all the uids/gids on the filesystem (so touching every file) and then start it all over again with the new map. (The rewriting step is required in the event that your own user's uid is in the middle of the mapped range in the container.)
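
For completeness, such a static bind-mount is just a disk device in the container config or a profile, along these lines (a sketch; the device name and paths are placeholders and the exact syntax may vary between LXD releases):

    devices:
      home:
        type: disk
        source: /home/tribaal
        path: /home/tribaal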

techtonik commented 9 years ago

Useful info. Is it going to be documented somewhere that explains how to share directories with containers?

http://interactive.blockdiag.com/?compression=deflate&src=eJxLyslPzk7JTExXqOZSUAjOSCxKVdC1U_CJcAFRSk6ZeSkKvvmleSVK1kB5qLCfWzASLzjYA8SvBQAwMRQL