containers / docker-lvm-plugin

Docker volume plugin for LVM volumes
GNU Lesser General Public License v3.0

Add support for thinly-provisioned volumes and snapshots #9

Closed cschauer closed 8 years ago

cschauer commented 8 years ago

Use --opt thinpool=pool to create a thin volume in vgName/pool.
Use --opt snapshot=origin to create a snapshot of vgName/origin.

shishir-a412ed commented 8 years ago

@cschauer Let me play around with it, and I will move it to code review. Thanks for opening this! I was thinking about this too.

shishir-a412ed commented 8 years ago

@cschauer I am trying to create a snapshot using your patch but getting an error.

docker volume create -d lvm --name foobar_snapshot --opt snapshot=foobar

Where foobar is my original logical volume.

In your code this docker CLI command would translate to lvcreate -n foobar_snapshot -s volume_group_one/foobar. If I run this manually, I get the error:

Error: Please specify either size or extents with snapshots.

I am able to create a snapshot if I pass the size. Below command works for me.

lvcreate -L 0.3G -s -n foobar_snapshot /dev/volume_group_one/foobar

Just wondering how it is working for you? Output of dnf info lvm2:

sh-4.3# dnf info lvm2
Last metadata expiration check: 0:53:04 ago on Thu Jul 14 09:42:11 2016.
Installed Packages
Name        : lvm2
Arch        : x86_64
Epoch       : 0
Version     : 2.02.132
Release     : 2.fc23
Size        : 2.1 M
Repo        : @System
From repo   : updates
Summary     : Userland logical volume management tools
URL         : http://sources.redhat.com/lvm2
License     : GPLv2
Description : LVM2 includes all of the support for handling read/write operations on
            : physical volumes (hard disks, RAID-Systems, magneto optical, etc.,
            : multiple devices (MD), see mdadd(8) or even loop devices, see
            : losetup(8)), creating volume groups (kind of virtual disks) from one
            : or more physical volumes and creating one or more logical volumes
            : (kind of logical partitions) in volume groups.

Shishir

cschauer commented 8 years ago

@shishir-a412ed Thanks for taking a look at this. I forgot about snapshots of regular logical volumes. If you create a snapshot of a thinly-provisioned volume, then you don't need to specify a size. I will update the PR to add support for regular snapshots as well. I also found a bug with the thin volumes: it looks like they are set to skip activation by default, so there was an error mounting. I'll get that fixed.

After the update, the usage will be:

Create a snapshot of a regular logical volume by specifying a size.

docker volume create -d lvm --name foobar_snapshot --opt snapshot=foobar --opt size=0.3G

If size is omitted then lvm will attempt to create a thin snapshot.

docker volume create -d lvm --name thin_snapshot --opt snapshot=thin

This matches the usage of lvcreate.
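As a rough illustration of the usage described above, here is a hypothetical sketch (not the actual driver code) of how the plugin's Create handler might translate docker volume options into lvcreate arguments; the helper name and option handling are assumptions:

```go
package main

import "fmt"

// buildLVCreateArgs sketches how docker volume --opt values could map to
// lvcreate arguments, mirroring the usage described above. Hypothetical
// helper; the real driver may differ.
func buildLVCreateArgs(name, vgName string, opts map[string]string) []string {
	if origin, ok := opts["snapshot"]; ok {
		args := []string{"-n", name, "-s"}
		if size, ok := opts["size"]; ok {
			// Regular snapshot: a size is required.
			args = append(args, "-L", size)
		}
		// Without a size, lvcreate attempts a thin snapshot.
		return append(args, vgName+"/"+origin)
	}
	if pool, ok := opts["thinpool"]; ok {
		// Thin volume carved out of an existing thin pool.
		return []string{"-V", opts["size"], "-T", vgName + "/" + pool, "-n", name}
	}
	// Default: a regular volume of the requested size.
	return []string{"-L", opts["size"], "-n", name, vgName}
}

func main() {
	fmt.Println(buildLVCreateArgs("foobar_snapshot", "volume_group_one",
		map[string]string{"snapshot": "foobar", "size": "0.3G"}))
}
```

The resulting slice would then be passed to something like exec.Command("lvcreate", args...).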

shishir-a412ed commented 8 years ago

@cschauer Nice catch on the --setactivationskip :)

Few things:

1) Can you squash both commits into one commit?

2) I notice your patch doesn't provide a way to create a thin pool (which can also be created using lvcreate and later used to provision thin volumes out of that pool). That makes sense to me, as we don't want to list the thin pool in the docker volume ls output (or bind mount the pool inside the container).

However, we would need to document all this. We should tell the user how to create the pool and use it to provision thin volumes. The following docs would need updating: README.md and man/docker-lvm-plugin.8.md.

1) Under Usage we can add a point (5) saying

To create thinly-provisioned volumes, a user (administrator) has to create a thin pool using the lvcreate command.

lvcreate -L 1G -T volume_group_one/mythinpool 

This would create a thin pool mythinpool of size 1G under volume group volume_group_one.

2) Then under "Volume Creation" you can give examples of how to create a thin volume using this pool we created above, and examples of how to create snapshots of regular and thin volumes.

Shishir

cschauer commented 8 years ago

@shishir-a412ed

Sounds good, I'll update the documentation, squash commits, and update the pull request.

Also, I found a bug with thin snapshots while testing some stuff last night. I'm using the plugin to create thin snapshot volumes for a mysql database, so I can run lots of mysql containers without taking up much extra space on disk. If I try to mount more than one volume with the same origin, I get a "duplicate filesystem UUID error". After some googling, I learned that XFS only allows one mount per UUID. For snapshots, we'd have to do mount -o nouuid. Might have to hold off on merging this one until after the changes I mentioned above.

shishir-a412ed commented 8 years ago

@cschauer Right, since these volumes are snapshot volumes (regular or thin) they will have the same filesystem UUID of the original volume XFS filesystem. We can either solve this by:

a) Generating a new UUID for the snapshot's XFS filesystem using xfs_admin.
b) Using the mount -o nouuid option.

I would go with what you suggested: (b), as it doesn't involve changing the filesystem. Just keep in mind that this problem affects both regular and thin volume snapshots.
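Option (b) boils down to adding the nouuid mount option whenever the volume being mounted is a snapshot. A minimal sketch, assuming a hypothetical helper (not the plugin's actual code):

```go
package main

import "fmt"

// buildMountArgs sketches option (b): when the volume being mounted is a
// snapshot (regular or thin), pass -o nouuid so XFS accepts a second mount
// of a filesystem whose UUID duplicates the origin's. Hypothetical helper.
func buildMountArgs(device, mountpoint string, isSnapshot bool) []string {
	args := []string{}
	if isSnapshot {
		args = append(args, "-o", "nouuid")
	}
	return append(args, device, mountpoint)
}

func main() {
	// The result would be used as e.g. exec.Command("mount", args...).Run()
	fmt.Println(buildMountArgs("/dev/volume_group_one/foobar_snapshot",
		"/mnt/foobar_snapshot", true))
}
```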

Shishir

shishir-a412ed commented 8 years ago

Looping in other maintainers.

@rhatdan @runcom Feel free to chime in.

shishir-a412ed commented 8 years ago

@cschauer Any updates on this one? You might need a rebase as well.

Shishir

cschauer commented 8 years ago

@shishir-a412ed I'll get to this over the weekend. I'll rebase and squash commits before updating the PR.

shishir-a412ed commented 8 years ago

@cschauer Cool. Feel free to ping me on IRC (/nick shishir-a412ed) if you need any help. I hang out in #docker-dev and #go-nuts on freenode (Monday to Friday, 9 AM to 5 PM EST).

Shishir

cschauer commented 8 years ago

@shishir-a412ed I think I got this finished up. I found another edge case while testing that involves two volumes where one is a snapshot of the other, say foobar and foobar_snapshot. If you delete foobar, then lvm will also delete foobar_snapshot. Now you're left with a docker volume foobar_snapshot that can never be removed, since lvremove will fail because the logical volume no longer exists.

I fixed this by checking if a volume is a snapshot origin when removing and returning an error if it is. The user will need to delete all snapshot volumes before removing the origin. Note, this doesn't happen with thin snapshots.
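The guard described above could look roughly like the following sketch; the in-memory model and function names are assumptions, not the driver's actual types. Thin snapshots survive origin removal, so only regular snapshots block it:

```go
package main

import "fmt"

// vol records a snapshot's origin (empty for non-snapshots) and whether the
// snapshot is thin. Hypothetical in-memory model for illustration.
type vol struct {
	origin string
	thin   bool
}

// checkRemove returns an error if name is the origin of a regular snapshot,
// mirroring the removal guard described above.
func checkRemove(name string, volumes map[string]vol) error {
	for snapName, v := range volumes {
		if v.origin == name && !v.thin {
			return fmt.Errorf("%s is the origin of snapshot %s; remove snapshots first", name, snapName)
		}
	}
	return nil
}

func main() {
	vols := map[string]vol{
		"foobar":          {},
		"foobar_snapshot": {origin: "foobar"},
	}
	// Removing foobar is refused while foobar_snapshot exists.
	fmt.Println(checkRemove("foobar", vols))
}
```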

shishir-a412ed commented 8 years ago

@cschauer There is a bug in lvdisplayGrep. This is not something you have introduced :) This is more related to Unix. I am still trying to find a fix.

The way grep works on Unix: if grep finds a matching pattern, the exit code is 0; if there is no matching pattern, the exit code is 1. A simple example:

a) echo "hello world"|grep "hello"
b) echo $? (Outputs 0)
c) echo "hello world"|grep "noHello"
d) echo $? (Outputs 1)

The way I implemented isThinlyProvisioned() is: if the grep is successful, i.e. there is a matching pattern, the output of grep is written to the bytes buffer (b2). And since the buffer length != 0, this returns TRUE.

However, when there is no matching pattern (like point (d) above), https://github.com/projectatomic/docker-lvm-plugin/blob/master/driver.go#L320 will fail and return an error.

This is affecting all the cases where grep doesn't have a matching pattern. Still looking into how to fix this. Thought I should just point it out.
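One possible fix (a sketch, not the plugin's actual lvdisplayGrep) is to treat grep's exit status 1 as a normal "no match" result rather than an error, by inspecting exec.ExitError:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// grepMatches runs grep over input and distinguishes "no match" (exit 1)
// from genuine failures (any other nonzero exit). Exit 0 means a match.
// Sketch of one possible fix for the issue described above.
func grepMatches(input, pattern string) (bool, error) {
	cmd := exec.Command("grep", pattern)
	cmd.Stdin = strings.NewReader(input)
	if err := cmd.Run(); err != nil {
		if ee, ok := err.(*exec.ExitError); ok && ee.ExitCode() == 1 {
			return false, nil // no matching pattern, not an error
		}
		return false, err // grep itself failed
	}
	return true, nil
}

func main() {
	ok, _ := grepMatches("hello world\n", "hello")
	noMatch, _ := grepMatches("hello world\n", "noHello")
	fmt.Println(ok, noMatch)
}
```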

Shishir

shishir-a412ed commented 8 years ago

@cschauer I have fixed the above mentioned issue. Your PR might require a rebase now.

Shishir

cschauer commented 8 years ago

@shishir-a412ed Thanks!

shishir-a412ed commented 8 years ago

@cschauer What is your host Linux OS?

shishir-a412ed commented 8 years ago

@cschauer Thanks! I got your message on IRC. The reason I was asking about your host Linux OS is that I have a feeling lvdisplay has somewhat different output on Ubuntu and Red Hat distributions.

Below is the lvdisplay output for 4 volumes on my system (Fedora 23): http://pastebin.com/Smreiwum I have created:

1) A regular volume
2) A thin volume
3) A regular volume snapshot
4) A thin volume snapshot.

In the output if you check:

a) The lvdisplay of a regular snapshot, foobar_rv_snapshot, doesn't have a "destination of" keyword; it has "destination for". This would result in isSnapshot() returning FALSE instead of TRUE.
b) Similarly, the lvdisplay of a thin volume, foobar_tv, doesn't have the "source of" keyword; it only appears in the regular volume output. This affects the isSnapshotOrigin() function.

I feel it would be hard to make a generic solution for this. The best we can do is to support at least Ubuntu and the Red Hat (RHEL, Fedora, and CentOS) distributions. I think the Red Hat distributions should have more or less the same output, so we just need to find the right keywords that support our grep functions on both.

Shishir

shishir-a412ed commented 8 years ago

@cschauer Did you get a chance to look at this ?

Shishir

cschauer commented 8 years ago

@shishir-a412ed We could use 'destination\s' and 'source\s' instead. Since volume names can't have whitespace, adding the whitespace check after the keyword avoids false positives with volumes that have destination or source in their name. If this sounds ok to you I can make the change and update the PR today.
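The whitespace-anchored patterns suggested above could be expressed in Go's regexp package like this (a sketch; the variable names and sample lines are assumptions):

```go
package main

import (
	"fmt"
	"regexp"
)

// Matching `destination\s` / `source\s` instead of the bare words avoids
// false positives on volume names that contain "destination" or "source",
// since volume names cannot contain whitespace. Sketch of the suggestion
// above, not the driver's actual patterns.
var (
	destRe   = regexp.MustCompile(`destination\s`)
	sourceRe = regexp.MustCompile(`source\s`)
)

func main() {
	// Keyword followed by whitespace: matches.
	fmt.Println(destRe.MatchString("LV snapshot status     active destination for foobar"))
	// Keyword embedded in a volume name: no match.
	fmt.Println(destRe.MatchString("LV Name                my_destination_volume"))
}
```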

However, I'd like to get your thoughts on reading in the volume group lvm metadata backup from /etc/lvm/backup/vgname. I think this would be a safer option since it doesn't rely on specific implementations of grep or lvdisplay. I also think using the vg metadata would make it easier to add features going forward. For example, creating a docker volume from an existing logical volume or putting information about the volume in the response to volume inspect.

One potential problem is that the metadata format for LVM2 is different from that of LVM1. Is LVM1 something the plugin needs to support? If so, we'd need to detect the LVM version on the system to determine the proper format. We would also need to make it clear in the documentation that metadata backups must not be disabled in lvm.conf.

Edit: The backup thing could be a separate PR, it doesn't necessarily have to block this one. I'm not sure when I would have the time to do that and I know I've been a little slow updating this one at times. Thanks for the reminders and encouragement.

shishir-a412ed commented 8 years ago

@cschauer

1) I mulled it over yesterday; unfortunately I was not able to find a clean solution :) The problem is that LVM2 command output (e.g. lvdisplay) changes with every run. Sometimes it has the source or destination keywords, sometimes it doesn't. If you look at one of the lvdisplay outputs in the link above, http://pastebin.com/Smreiwum:

In the output, the left column defines the FIELDS and the right column defines the VALUES. The FIELD names are consistent across runs, so we can rely on them for grep purposes if we want. However, we cannot rely on the VALUES. That means keywords like {"LV Pool", "LV Thin Origin"} should be okay, while keywords like {"source of", "destination of"} may or may not be there in the output of each run.
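Relying on the stable FIELD column could be as simple as scanning for a line that starts with the field name. A sketch under that assumption (helper name and sample output are hypothetical):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasField scans lvdisplay-style output for a line whose field column starts
// with the given field name (e.g. "LV Pool"), relying only on the consistent
// left-hand FIELD names rather than the variable right-hand VALUES.
// Hypothetical sketch based on the observation above.
func hasField(lvdisplayOutput, field string) bool {
	sc := bufio.NewScanner(strings.NewReader(lvdisplayOutput))
	for sc.Scan() {
		if strings.HasPrefix(strings.TrimSpace(sc.Text()), field) {
			return true
		}
	}
	return false
}

func main() {
	out := `  LV Name                foobar_tv
  LV Pool name           mythinpool`
	// A thin volume's output carries an "LV Pool" field.
	fmt.Println(hasField(out, "LV Pool"))
}
```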

2) Are you talking about parsing the file /etc/lvm/backup/vgname to extract whether a particular volume is a thin volume, a snapshot, or a snapshot origin?

See sample file attached on my system. http://pastebin.com/r3ZAZyMM

What we are talking about here:
a) Write a parser to parse this file into golang structs or some kind of data structure.
b) Read information from those data structures to make the decisions (which we are currently making using lvdisplay and grep).

IIUC, yes, you can write a parser to do that; I'm not sure how easy it would be, as the file looks pretty complex. We can open a separate PR for that.

3) This plugin only supports LVM2.

Let me discuss this with my tech lead @rhatdan and see if we can merge this PR and take over the work. That way once I am done, you can just rebase.

Shishir

shishir-a412ed commented 8 years ago

@cschauer I slept on it. Now I think I was overcomplicating it :) I already have this information (whether a volume is a snapshot or snapshotOrigin) at Create time.

When the user takes the volume snapshot:

docker volume create -d lvm --name foobar_snapshot --opt snapshot=foobar --opt size=0.25G

I can store this information in my in-memory store, and then look it up at Mount and Remove time. That way I won't need the isSnapshot() and isSnapshotOrigin() functions.
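The create-time bookkeeping described above might look like this sketch; the types and method names are assumptions, not the driver's actual store:

```go
package main

import "fmt"

// volumeMeta records, at Create time, whether a volume is a snapshot and of
// which origin, so Mount (to add -o nouuid) and Remove (to protect origins)
// can consult the store instead of parsing lvdisplay. Hypothetical types
// sketching the approach described above.
type volumeMeta struct {
	IsSnapshot bool
	Origin     string
}

type store map[string]volumeMeta

// create records metadata from the --opt values the user passed.
func (s store) create(name string, opts map[string]string) {
	origin, isSnap := opts["snapshot"]
	s[name] = volumeMeta{IsSnapshot: isSnap, Origin: origin}
}

// isSnapshotOrigin reports whether any stored volume names vol as its origin.
func (s store) isSnapshotOrigin(vol string) bool {
	for _, m := range s {
		if m.Origin == vol {
			return true
		}
	}
	return false
}

func main() {
	s := store{}
	s.create("foobar", map[string]string{"size": "1G"})
	s.create("foobar_snapshot", map[string]string{"snapshot": "foobar", "size": "0.25G"})
	fmt.Println(s["foobar_snapshot"].IsSnapshot, s.isSnapshotOrigin("foobar"))
}
```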

Let me discuss this with my tech lead on Monday, and see if we can take over this PR and finish it off quickly.

Shishir

shishir-a412ed commented 8 years ago

@cschauer I discussed this with my tech lead @rhatdan , and we are going to merge this :) I will open another PR after this, which will have the rest of the feature.

Thanks a lot for your contribution (and follow-ups)! This is a really good feature to have.

Shishir