awslabs / damo

DAMON user-space tool
https://damonitor.github.io/
GNU General Public License v2.0

No such file or directory: '/sys/devices/system/memory/block_size_bytes' #83

Closed honggyukim closed 8 months ago

honggyukim commented 8 months ago

Hi SeongJae,

I have a NUMA system running on QEMU, as follows.

$ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 1972 MB
node 0 free: 1779 MB
node 1 cpus:
node 1 size: 5983 MB
node 1 free: 5969 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 

On this machine, I get the following error when using the --numa_node option.

$ ./damo fmt_json --numa_node 0
Traceback (most recent call last):
  File "/home/root/damo/./damo", line 116, in <module>
    main()
  File "/home/root/damo/./damo", line 113, in main
    subcmd.execute(args)
  File "/home/root/damo/_damo_subcmds.py", line 34, in execute
    self.module.main(args)
  File "/home/root/damo/damo_fmt_json.py", line 27, in main
    kdamonds, err = _damon_args.kdamonds_for(args)
  File "/home/root/damo/_damon_args.py", line 335, in kdamonds_for
    ctx, err = damon_ctx_for(args)
  File "/home/root/damo/_damon_args.py", line 230, in damon_ctx_for
    init_regions, err = init_regions_for(args)
  File "/home/root/damo/_damon_args.py", line 36, in init_regions_for
    init_regions = _damo_paddr_layout.paddr_region_of(args.numa_node)
  File "/home/root/damo/_damo_paddr_layout.py", line 163, in paddr_region_of
    paddr_ranges_ = paddr_ranges()
  File "/home/root/damo/_damo_paddr_layout.py", line 130, in paddr_ranges
    return integrate(memblock_ranges(), iomem_ranges())
  File "/home/root/damo/_damo_paddr_layout.py", line 70, in memblock_ranges
    sz_block = int(readfile('/sys/devices/system/memory/block_size_bytes'), 16)
  File "/home/root/damo/_damo_paddr_layout.py", line 43, in readfile
    with open(file_path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/sys/devices/system/memory/block_size_bytes'

There is no memory directory inside /sys/devices/system/ in the system.

$ ls /sys/devices/system/
clockevents  clocksource  container  cpu  machinecheck  node

So this edge case should also be handled properly.

The tested damo version is as follows.

$ git log --oneline --decorate=no -1
01474c2 Update the version

The kernel version is as follows.

$ uname -r
6.6.0-rc4

honggyukim commented 8 months ago

The qemu was started with the following command.

$ sudo qemu-system-x86_64 \
        -enable-kvm -nographic \
        -kernel ./arch/x86_64/boot/bzImage \
        -hda core-image-sato-sdk-qemux86-64.ext4 \
        -append "nokaslr root=/dev/sda console=ttyS0" \
        -device e1000,netdev=net0 \
        -netdev user,id=net0,hostfwd=tcp::5002-:22 \
        -cpu host -smp cpus=4 \
        -m 8G \
        -object memory-backend-ram,size=2G,id=ram0 \
        -object memory-backend-ram,size=6G,id=ram1 \
        -numa node,nodeid=0,memdev=ram0,cpus=0-3 \
        -numa node,nodeid=1,memdev=ram1

sj-aws commented 8 months ago

Thank you for reporting. Will take a look soon.

honggyukim commented 8 months ago

> There is no memory directory inside /sys/devices/system/ in the system.

The /sys/devices/system/memory directory is created by memory_dev_init() at https://github.com/torvalds/linux/blob/v6.6-rc4/drivers/base/memory.c#L940, as follows.

/*
 * Initialize the sysfs support for memory devices. At the time this function
 * is called, we cannot have concurrent creation/deletion of memory block
 * devices, the device_hotplug_lock is not needed.
 */
void __init memory_dev_init(void)
{
        ...
        ret = subsys_system_register(&memory_subsys, memory_root_attr_groups);
        ...
}

However, it's only available when the CONFIG_MEMORY_HOTPLUG build option is enabled. I'm not sure if we really have to support this case.

I've just confirmed that after enabling CONFIG_MEMORY_HOTPLUG in the kernel I run on qemu, the /sys/devices/system/memory/block_size_bytes file appears.
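
For anyone who wants to check the same thing on their own machine, here is a minimal sketch, not part of damo. It assumes the kernel config is readable from /proc/config.gz (which itself requires CONFIG_IKCONFIG_PROC) or from /boot/config-<release>. The helper names kconfig_enabled and read_running_kconfig are made up for illustration.

```python
import gzip
import os


def kconfig_enabled(config_text, option):
    """Return True if `option` is set to 'y' in the given kernel config text."""
    for line in config_text.splitlines():
        # Disabled options appear as '# CONFIG_FOO is not set', so an exact
        # match on 'CONFIG_FOO=y' is enough for a bool option.
        if line.strip() == option + '=y':
            return True
    return False


def read_running_kconfig():
    """Return the running kernel's config text, or None if it is unavailable."""
    # Both locations are assumptions; neither is guaranteed to exist.
    candidates = ['/proc/config.gz', '/boot/config-' + os.uname().release]
    for path in candidates:
        try:
            if path.endswith('.gz'):
                with gzip.open(path, 'rt') as f:
                    return f.read()
            with open(path, 'r') as f:
                return f.read()
        except OSError:
            continue
    return None


if __name__ == '__main__':
    text = read_running_kconfig()
    if text is None:
        print('kernel config not found')
    else:
        print('CONFIG_MEMORY_HOTPLUG enabled:',
              kconfig_enabled(text, 'CONFIG_MEMORY_HOTPLUG'))
```

If neither config file is available, checking whether /sys/devices/system/memory exists is the more direct test.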

sj-aws commented 8 months ago

Thank you for the nice finding. I'll make the code handle this case with a proper error message.
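
As a rough illustration of that kind of handling (not the actual fix that went into damo), the read could return an error string instead of raising. This follows the `value, err` return style visible in the traceback above. The function name read_block_size and the message wording are hypothetical.

```python
import os

MEMORY_SYSFS_DIR = '/sys/devices/system/memory'


def read_block_size(sysfs_dir=MEMORY_SYSFS_DIR):
    """Return (block size in bytes, None) on success, or (None, error message)."""
    path = os.path.join(sysfs_dir, 'block_size_bytes')
    try:
        with open(path, 'r') as f:
            # The value is parsed as hexadecimal, as in the traceback's int(..., 16).
            return int(f.read().strip(), 16), None
    except FileNotFoundError:
        return None, ('%s not found; the kernel may be built without '
                      'CONFIG_MEMORY_HOTPLUG' % path)
    except (OSError, ValueError) as e:
        return None, 'cannot read %s: %s' % (path, e)


if __name__ == '__main__':
    sz_block, err = read_block_size()
    if err is not None:
        print('error: %s' % err)
    else:
        print('memory block size: %d bytes' % sz_block)
```

A caller such as a hypothetical memblock_ranges() wrapper could then pass the error up to the command line instead of showing a Python traceback.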

sj-aws commented 8 months ago

Uploaded the fix. Please comment here or open another issue if it doesn't fix the case.