containerd / zfs

ZFS snapshotter plugin for containerd
Apache License 2.0
67 stars 29 forks source link

implement Usage #17

Closed AkihiroSuda closed 3 years ago

AkihiroSuda commented 6 years ago

Unlike btrfs, just returning USED field should be ok for zfs, I guess

https://linux.die.net/man/8/zfs

used
   The amount of space consumed by this dataset and all its descendents

related: https://github.com/containerd/containerd/issues/1801 (btrfs)

AkihiroSuda commented 6 years ago

@stevvooe @dmcgowan

Do we need to implement this before v1.1 GA?

If so, I'll port over btrfs implementation for consistency, rather than adopting the zfs-native USED value mentioned above.

https://github.com/containerd/containerd/pull/1871

dmcgowan commented 6 years ago

I didn't realize we didn't have usage implemented yet. I probably would have suggested we either implement it or not have it enabled by default. I think we are too late for 1.1.0 but we can fix for 1.1.1.

AkihiroSuda commented 4 years ago

Still not sure whether we should use naive differ as we did for btrfs (https://github.com/containerd/containerd/pull/1871), or use filesystem-native usage value, which we didn't adopt for btrfs (https://github.com/containerd/containerd/pull/1836).

For consistency with btrfs/overlayfs, probably we should use the former one, but for performance and future potential support of zfs quota, we should use ZFS-native usage value.

Maybe we should support both and make them configurable?

dalbani commented 4 years ago

Hey, I found this bug when trying to understand why I had hundreds of message in the logs of containerd in a MicroK8s environment with a ZFS snapshotter. For example: microk8s.daemon-containerd[19500]: time="2020-01-17T21:56:56.725325357Z" level=error msg="Failed to get usage for snapshot "sha256:xxxxxx" error="zfs does not implement Usage() yet" I suppose it doesn't affect the functionality / reliability of the setup, but it would be nice not to have all those errors in the logs ;-)

Note to self: I see an entry stats_collect_period = 10 in containerd-template.toml, so I suppose that's the reason why the error messages are logged every 10 seconds.

fuweid commented 4 years ago

@dalbani you are right. cri-containerd will collect usage about container snapshotters very stats_collect_period seconds. For the overlay/native/... snapshotter, Usage call will be act like du which means that it is time-consuming job. cri-containerd uses goroutine to collect data background. When kubelet wants to know the container status, cri-containerd will return the cache updated by that goroutine.

If not to have all those errors, zfs needs Usage implementation.

AkihiroSuda commented 4 years ago

@fuweid @dmcgowan WDYT: https://github.com/containerd/zfs/issues/17#issuecomment-571442576

fuweid commented 4 years ago

@AkihiroSuda @dmcgowan I prefer to use ZFS-native model as default instead of native-diff. In production, some apps like nodejs/python has a lot of small files and native-diff will consumes cpu resources in containerd side. It is cool to have ZFS-native model and native-diff both, and the switch is up to user's configuration.

apnar commented 4 years ago

Just ran into this issue as well. I expect it to be more frequent with Ubuntu 20.04 offering root on ZFS support now. The default microk8s install still uses native snapshotter but there is a how-to here https://microk8s.io/docs/install-alternatives on switching to ZFS snapshotter which triggers tons of error messages about Usage not being implemented unless you tweak the logging way down.

I imagine this would be a quick add for someone familiar with the code now that the decision to use ZFS native output was made above, but if no one has the time I may try diving into it.

ukd1 commented 3 years ago

Are there any plans to fix this? I'm still getting these errors...

sigxcpu76 commented 3 years ago

Until there is a final and good solution for this, can we get a quick patch instead? This generates tons of useless logging.

fuweid commented 3 years ago

close by #38