btrfs / btrfs-todo

An issues only repo to organize our TODO items

btrfs search ioctl library/drgn integration/thing to make debugging easier #20

Open josefbacik opened 3 years ago

josefbacik commented 3 years ago

This is very open-ended, but generally what I want is an easy way to look up any metadata on a live file system, preferably using the SEARCH_V2 ioctl. drgn is my inspiration for how this would work, so my thought was something akin to that, so I can just

# python
>>> import btrfs
>>> fs = btrfs.btrfs_open("/some/path")
>>> key = btrfs.Key(tree=btrfs.BTRFS_EXTENT_TREE_OBJECTID, min_type=btrfs.BTRFS_BLOCK_GROUP_ITEM)
>>> values = btrfs.search(key)
>>> for i in values:
...     print(i.objectid, i.block_group.offset)
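For reference, here is a rough sketch of what such a library would have to do underneath, using only the Python standard library. The struct layouts and constants are taken from the kernel UAPI headers (linux/btrfs.h, linux/btrfs_tree.h); the helper names `pack_search_key`, `parse_items` and `tree_search` are made up for illustration and are not part of any existing binding. The ioctl number is computed with the common `_IOWR` encoding (x86, arm); some architectures encode the direction bits differently.

```python
import struct

# _IOWR(BTRFS_IOCTL_MAGIC=0x94, 17, struct btrfs_ioctl_search_args_v2 [112 bytes])
# using the common ioctl encoding (x86/arm). Works out to 0xC0709411.
BTRFS_IOC_TREE_SEARCH_V2 = (3 << 30) | (112 << 16) | (0x94 << 8) | 17
BTRFS_EXTENT_TREE_OBJECTID = 2
BTRFS_BLOCK_GROUP_ITEM_KEY = 192

SEARCH_KEY = struct.Struct("=7Q4I4Q")   # struct btrfs_ioctl_search_key
SEARCH_HEADER = struct.Struct("=3Q2I")  # struct btrfs_ioctl_search_header
U64_MAX = 2**64 - 1


def pack_search_key(tree_id, min_type, max_type, nr_items=4096):
    """Build a search key covering every objectid/offset/transid."""
    return SEARCH_KEY.pack(
        tree_id,
        0, U64_MAX,          # min/max objectid
        0, U64_MAX,          # min/max offset
        0, U64_MAX,          # min/max transid
        min_type, max_type,
        nr_items,
        0,                   # unused
        0, 0, 0, 0,          # unused1..unused4
    )


def parse_items(buf, nr_items):
    """Yield (objectid, type, offset, data) from a SEARCH_V2 result buffer."""
    pos = 0
    for _ in range(nr_items):
        _transid, objectid, offset, item_type, length = SEARCH_HEADER.unpack_from(buf, pos)
        pos += SEARCH_HEADER.size
        yield objectid, item_type, offset, bytes(buf[pos:pos + length])
        pos += length


def tree_search(path, tree_id, min_type, max_type, buf_size=64 * 1024):
    """Hypothetical helper: a single SEARCH_V2 call (needs root and a btrfs mount)."""
    import fcntl
    import os

    args = bytearray(
        pack_search_key(tree_id, min_type, max_type)
        + struct.pack("=Q", buf_size)   # buf_size field
        + bytes(buf_size)               # result buffer
    )
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, BTRFS_IOC_TREE_SEARCH_V2, args)  # mutates args in place
    finally:
        os.close(fd)
    found = SEARCH_KEY.unpack_from(args)[9]  # kernel rewrites nr_items
    items = parse_items(memoryview(args)[SEARCH_KEY.size + 8:], found)
    # The kernel matches on the compound (objectid, type, offset) range, so
    # other item types can appear in the results; filter them here.
    return [it for it in items if min_type <= it[1] <= max_type]
```

Something like `tree_search("/some/path", BTRFS_EXTENT_TREE_OBJECTID, BTRFS_BLOCK_GROUP_ITEM_KEY, BTRFS_BLOCK_GROUP_ITEM_KEY)` would then return the raw block group items from one call. A real tool would loop, advancing the minimum key past the last item returned, and would decode the item payloads into typed objects, which is the part drgn-style type information would help with.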

The key things are

danobi commented 3 years ago

First thought is that we could reuse a lot of drgn infra (to get kernel types, C types -> python types, C binding plumbing, etc).

If drgn integration is an option it'd make a lot of sense to go with that.

josefbacik commented 3 years ago

Yeah I'm fine with it being integrated into drgn, it would make the arbitrary data structure problem less annoying, and I'd rather not re-invent that wheel just for this. I suppose it would be cool to abstract that portion of drgn out so we could both use it independently, but I'm not married to that idea.

danobi commented 3 years ago

Another option is to use BTF for type information. I've already written BTF parsing code in libbpf-rs ( https://github.com/libbpf/libbpf-rs/blob/master/libbpf-cargo/src/btf/btf.rs ) that could be reused. Although if I went that route I'd be inclined to write a custom REPL with a small DSL. Something like a mini-gdb where there's some variable, expression, and printing support. I don't believe that'd be too hard. Then I wouldn't have to figure out and deal with all the python API nuances.

josefbacik commented 3 years ago

That's a totally reasonable alternative. I suggested the python route to save you the trouble of needing to implement fancy things like for loops and such. If you want to go as far as implementing your own DSL, that would be completely fine.

If you are going to go that route then there's something else I would love, and that would be the ability to modify any metadata on an unmounted file system. This would be helpful when crafting test cases for fsck and such. If I had a tool where I could say

# cat blah.btrfscript
item = search(some objectid, some type, some offset)
item.disk_bytenr.write(garbage)
# btrfs-metadata-tool < blah.btrfscript

that could be a big win for ease of building test cases for fsck.

But that's a pretty ambitious distraction ;). For now I think live file system debugging is the more pressing need, but keeping this use case in mind if it makes sense would be cool as well.

danobi commented 3 years ago

Prototype is up here: https://github.com/danobi/btrd . Happy to move it into btrfs github org one day if desired.

danobi commented 3 years ago

Checked in a script that prints out block group usage stats: https://github.com/danobi/btrd/blob/master/scripts/block_group_dist.btrd
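To give a feel for what block group usage stats involve, here is a small Python sketch of the same idea. It is not the btrd script and does not use btrd's language. It relies on the on-disk layout: a block group item's key is (start bytenr, BLOCK_GROUP_ITEM, length), and its payload is three little-endian u64 fields (used, chunk_objectid, flags). The function names and the 10% bucket size are assumptions made for illustration.

```python
import struct
from collections import Counter

# struct btrfs_block_group_item: used, chunk_objectid, flags (little-endian on disk)
BLOCK_GROUP_ITEM = struct.Struct("<3Q")


def usage_percent(length, item_data):
    """Percent of a block group in use; length is the item key's offset field."""
    used, _chunk_objectid, _flags = BLOCK_GROUP_ITEM.unpack_from(item_data)
    return 100.0 * used / length


def usage_histogram(block_groups, bucket=10):
    """Count block groups per usage bucket.

    block_groups is an iterable of (length, item_data) pairs. The returned
    dict maps the lower bound of each bucket (0, 10, ... 90) to a count;
    fully used block groups land in the top bucket.
    """
    hist = Counter()
    for length, data in block_groups:
        pct = usage_percent(length, data)
        low = min(int(pct // bucket) * bucket, 100 - bucket)
        hist[low] += 1
    return dict(sorted(hist.items()))
```

Fed with the (key offset, payload) pairs of every block group item on a file system, this gives a quick picture of how many block groups are nearly empty versus nearly full.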