awslabs / damo

DAMON user-space tool
https://damonitor.github.io/
GNU General Public License v2.0
155 stars 26 forks source link

struct.error: unpack requires a buffer of 8 bytes #38

Closed honggyukim closed 1 year ago

honggyukim commented 1 year ago

Hi @sjp38,

Thanks for building this useful tool. I would like to test running damo but found some errors as follows.

$ sudo ./damo record 3257
Press Ctrl+C to stop
^C
signal 2 received
[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 1.752 MB damon.data ]
Traceback (most recent call last):
  File "/home/honggyu/work/damo/./damo", line 104, in <module>
    main()
  File "/home/honggyu/work/damo/./damo", line 101, in main
    subcmd.execute(args)
  File "/home/honggyu/work/damo/_damo_subcmds.py", line 31, in execute
    self.module.main(args)
  File "/home/honggyu/work/damo/damo_record.py", line 145, in main
    _damon.wait_current_kdamonds_turned_off()
  File "/home/honggyu/work/damo/_damon.py", line 918, in wait_current_kdamonds_turned_off
    time.sleep(1)
  File "/home/honggyu/work/damo/damo_record.py", line 57, in sighandler
    cleanup_exit(signum)
  File "/home/honggyu/work/damo/damo_record.py", line 46, in cleanup_exit
    err = _damon_result.update_result_file(data_for_cleanup.rfile_path,
  File "/home/honggyu/work/damo/_damon_result.py", line 293, in update_result_file
    result, err = parse_damon_result(file_path)
  File "/home/honggyu/work/damo/_damon_result.py", line 216, in parse_damon_result
    result, err = record_to_damon_result(result_file)
  File "/home/honggyu/work/damo/_damon_result.py", line 133, in record_to_damon_result
    end_addr = struct.unpack('L', f.read(8))[0]
struct.error: unpack requires a buffer of 8 bytes

This error shows in different places when I run it again. This error happens in start_addr or nr_accesses as well. Could you please let me know if I did something wrong?

I have manually enabled DAMON config in 6.3.0-rc1 as follows.

$ cat /boot/config-$(uname -r) | grep DAMON
CONFIG_DAMON=y
CONFIG_DAMON_VADDR=y
CONFIG_DAMON_PADDR=y
CONFIG_DAMON_SYSFS=y
CONFIG_DAMON_DBGFS=y
CONFIG_DAMON_RECLAIM=y
CONFIG_DAMON_LRU_SORT=y
sj-aws commented 1 year ago

Hi Honggyu, thank you for report! Could you let me know the version of DAMO and more detailed reproduction steps of this issue?

honggyukim commented 1 year ago

I think the damo version is 40f09b5 and I put the pid of chrome in my system. I’m not sure if more reproducible steps are needed.

I will try with another simpler program later again. Thanks.

honggyukim commented 1 year ago

It looks like the workload is not related to this problem. The same error shows when I test it with masim.

I'm not sure if this is the right way to test but directly running perf record and run the output with damo also shows the same error as follows.

$ sudo perf record -a -e damon:damon_aggregated -o damon.data
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.492 MB damon.data ]

$ sudo ./damo report raw -i damon.data
Traceback (most recent call last):
  File "/home/honggyu/work/damo/./damo", line 104, in <module>
    main()
  File "/home/honggyu/work/damo/./damo", line 101, in main
    subcmd.execute(args)
  File "/home/honggyu/work/damo/_damo_subcmds.py", line 31, in execute
    self.module.main(args)
  File "/home/honggyu/work/damo/damo_report.py", line 39, in main
    subcmd.execute(args)
  File "/home/honggyu/work/damo/_damo_subcmds.py", line 31, in execute
    self.module.main(args)
  File "/home/honggyu/work/damo/damo_report_raw.py", line 31, in main
    result, err = _damon_result.parse_damon_result(file_path)
  File "/home/honggyu/work/damo/_damon_result.py", line 216, in parse_damon_result
    result, err = record_to_damon_result(result_file)
  File "/home/honggyu/work/damo/_damon_result.py", line 132, in record_to_damon_result
    start_addr = struct.unpack('L', f.read(8))[0]
struct.error: unpack requires a buffer of 8 bytes

I have tested it on another qemu environment and it worked fine, but not sure why it doesn't work in my main Linux machine. I will come back when things get clear.

sj-aws commented 1 year ago

Hi Honggyu, thank you very much for such detailed information.

I cannot reproduce the issue on my machine, unfortunately. There were some more changes on next branch after the commit you mentioned (40f09b53568d). Could you please check if the latest version of DAMO also shows same issue?

Also, did this issue happen on earlier versions? If not, could you please let me know the latest good version, and do bisection if possible, to find the first bad commit?

And, seems the above stacktrace is not for 40f09b53568d, as the version's line 216 of _damon_result.py is write_damon_record(), not parse_damon_result(). Could you please let me know the commit of the stacktrace captured version?

honggyukim commented 1 year ago

Thanks very much for your help.

I cannot reproduce the issue on my machine, unfortunately.

It seems like the problem comes from my environment. I had to make another change as follows.

diff --git a/_damon_result.py b/_damon_result.py
index a52a1a9..48cbbb7 100644
--- a/_damon_result.py
+++ b/_damon_result.py
@@ -340,7 +340,7 @@ def start_monitoring_record(file_path, file_format, file_permission):
     except:
         return None, 'perf is not installed'
     pipe = subprocess.Popen(
-            ['perf', 'record', '-a', '-e', 'damon:damon_aggregated', '-o',
+            ['/home/honggyu/usr/bin/perf', 'record', '-a', '-e', 'damon:damon_aggregated', '-o',
                 file_path])
     record_requests[pipe] = [file_path, file_format, file_permission]
     return pipe, None0

Without this change in my environment, it shows the following error. I think it's just perf issue not from damo.

$ sudo ./damo record 2398
Press Ctrl+C to stop
WARNING: perf not found for kernel 6.3.0

  You may need to install the following packages for this specific kernel:
    linux-tools-6.3.0-rc1
    linux-cloud-tools-6.3.0-rc1

  You may also want to install one of the following packages to keep up to date:
    linux-tools-rc1
    linux-cloud-tools-rc1

I just changed damo to use perf ELF binary instead of shell script. The perf I used is this, which is same as the kernel version.

$ perf --version
perf version 6.3.rc1.gfe15c26ee26e

And, seems the above stacktrace is not for https://github.com/awslabs/damo/commit/40f09b53568df33a343d62940100b73a0fb0e1dc, as the version's line 216 of _damon_result.py is write_damon_record(), not parse_damon_result().

Sorry, it was be891a8a6e1040b598e125f107f78d795e1a8ce0, not 40f09b5. I can't test it on company so just chose the remembered one.

Could you please let me know the commit of the stacktrace captured version?

I just tested it after rebase again onto 9cddbe4f7378ceec46afe4b0535df4d9078b6b5d but it shows the same error.

$ sudo ./damo record 2398
Press Ctrl+C to stop
^C
signal 2 received
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.401 MB damon.data (224 samples) ]
Traceback (most recent call last):
  File "/home/honggyu/work/damo/./damo", line 96, in <module>
    main()
  File "/home/honggyu/work/damo/./damo", line 93, in main
    subcmd.execute(args)
  File "/home/honggyu/work/damo/_damo_subcmds.py", line 31, in execute
    self.module.main(args)
  File "/home/honggyu/work/damo/damo_record.py", line 143, in main
    _damon.wait_current_kdamonds_turned_off()
  File "/home/honggyu/work/damo/_damon.py", line 929, in wait_current_kdamonds_turned_off
    time.sleep(1)
  File "/home/honggyu/work/damo/damo_record.py", line 54, in sighandler
    cleanup_exit(signum)
  File "/home/honggyu/work/damo/damo_record.py", line 39, in cleanup_exit
    _damon_result.stop_monitoring_record(data_for_cleanup.perf_pipe)
  File "/home/honggyu/work/damo/_damon_result.py", line 357, in stop_monitoring_record
    err = update_result_file(file_path, file_format)
  File "/home/honggyu/work/damo/_damon_result.py", line 289, in update_result_file
    result, err = parse_damon_result(file_path)
  File "/home/honggyu/work/damo/_damon_result.py", line 215, in parse_damon_result
    result, err = record_to_damon_result(result_file)
  File "/home/honggyu/work/damo/_damon_result.py", line 130, in record_to_damon_result
    snapshot = read_snapshot_from_record_file(f,
  File "/home/honggyu/work/damo/_damon_result.py", line 78, in read_snapshot_from_record_file
    start_addr = struct.unpack('L', f.read(8))[0]
struct.error: unpack requires a buffer of 8 bytes

This might not be a damo issue and I might ask you some stupid question. As mentioned, damo works fine in my another qemu environment, but I will setup another real machine within a few days and test again. Thanks again.

sj-aws commented 1 year ago

Hi Honggyu, thank you for sharing more findings and the updated error messages!

For WARNING: perf not found for kernel 6.3.0, you may be able to add some symlinks or tweak your sudo's PATH environment variable.

When recording is requested, in this case, DAMO turns DAMON on and records DAMON tracepoint events using perf. When you pressed Ctrl+C, DAMO sends SIGINT to the perf process, make it finished and create the record file. After that, DAMO checks if user wanted the perf output file format for the record. By default, it is not perf output file format but DAMON's optimized binary file format, namely DAMON record file. To transform the perf output file to DAMON record file, DAMO parses the perf output file and write the record information in DAMON record file format. This issue comes from the parsing failure.

I suspect this might be due to a race between perf termination and the perf output file writing. That is, perf output file is not written completely before DAMO parses the file, so it fails. Could you check if adding --output_type perfdata option to the damo record command works around this problem? Please also check following perf report raw works. The option will skip the transformation. You could also check it as you did perf record on your own on above reply (https://github.com/awslabs/damo/issues/38#issuecomment-1476187427).

honggyukim commented 1 year ago

Hi SeongJae, thank you very much for your explanation.

For WARNING: perf not found for kernel 6.3.0, you may be able to add some symlinks or tweak your sudo's PATH environment variable.

I found the root cause, the ubuntu system provides /usr/bin/perf in shell script to check whether the perf binary in PATH matches with the kernel version in the system. However, it's unhelpful when kernel is manually compiled and rebooted with the new image.

I just replaced /usr/bin/perf with newly compiled perf image in the same kernel source, then damo record works fine now.

$ sudo ./damo record $$
Press Ctrl+C to stop
^C
signal 2 received
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.567 MB damon.data (231 samples) ]

The result also looks good now.

$ sudo ./damo report raw                                                                                                                                                                                                                                                                                             
base_time_absolute: 34 m 22.851 s                         

monitoring_start:                0 ns                                        
monitoring_end:            100.898 ms                     
monitoring_duration:       100.898 ms                     
target_id: 0                                              
nr_regions: 13                                            
# start_addr     end_addr        length  nr_accesses   age                   
564a1796c000-564a17bff000 (   2.574 MiB)           0    -1
564a17bff000-564a17e30000 (   2.191 MiB)           0    -1
564a17e30000-564a18130000 (   3.000 MiB)           0    -1                   
564a18130000-564a183fb000 (   2.793 MiB)           0    -1
564a183fb000-564a186b7000 (   2.734 MiB)           0    -1
564a186b7000-564a188c6000 (   2.059 MiB)           0    -1
7f28b9c00000-7f28b9e43000 (   2.262 MiB)           0    -1
7f28b9e43000-7f28ba0f4000 (   2.691 MiB)           0    -1                   
7f28ba0f4000-7f28ba399000 (   2.645 MiB)           0    -1
7f28ba399000-7f28ba58a000 (   1.941 MiB)           0    -1
7f28ba58a000-7f28ba88c000 (   3.008 MiB)           0    -1                   
7f28ba88c000-7f28baa2b000 (   1.621 MiB)           0    -1
7ffe9a89c000-7ffe9a990000 ( 976.000 KiB)           0    -1
    ...

It's just for more info about my problem, but the problem comes from the fact that perf is executed for both perf record in start_monitoring_record and perf script in parse_damon_result but I manually modified the path for perf only for record as follows.

diff --git a/_damon_result.py b/_damon_result.py
index a52a1a9..d02a0a0 100644
--- a/_damon_result.py
+++ b/_damon_result.py
@@ -340,7 +340,7 @@ def start_monitoring_record(file_path, file_format, file_permission):
     except:
         return None, 'perf is not installed'
     pipe = subprocess.Popen(
-            ['perf', 'record', '-a', '-e', 'damon:damon_aggregated', '-o',
+            ['/home/honggyu/usr/bin/perf', 'record', '-a', '-e', 'damon:damon_aggregated', '-o',
                 file_path])
     record_requests[pipe] = [file_path, file_format, file_permission]
     return pipe, None

That causes running two different perf versions for record and script so it resulted the parsing error above. The better modification might be as follows.

diff --git a/_damon_result.py b/_damon_result.py
index a52a1a9..d02a0a0 100644
--- a/_damon_result.py
+++ b/_damon_result.py
@@ -205,7 +205,7 @@ def parse_damon_result(result_file):
         try:
             with open(os.devnull, 'w') as fnull:
                 script_output = subprocess.check_output(
-                        ['perf', 'script', '-i', result_file],
+                        ['/home/honggyu/usr/bin/perf', 'script', '-i', result_file],
                         stderr=fnull).decode()
         except:
             pass
@@ -340,7 +340,7 @@ def start_monitoring_record(file_path, file_format, file_permission):
     except:
         return None, 'perf is not installed'
     pipe = subprocess.Popen(
-            ['perf', 'record', '-a', '-e', 'damon:damon_aggregated', '-o',
+            ['/home/honggyu/usr/bin/perf', 'record', '-a', '-e', 'damon:damon_aggregated', '-o',
                 file_path])
     record_requests[pipe] = [file_path, file_format, file_permission]
     return pipe, None

I've confirmed that this modification works fine. But I've concluded to remove unhelpful /usr/bin/perf shell script, which is provided in Ubuntu, then install my custom perf binary instead.

I appreciate your patience and thanks for your help again.

sj-aws commented 1 year ago

Hi Honggyu, thank you for your investigation and sharing the results! I'm glad to hear that your given issue has resolved.

I think we can make it better with some options, though.

  1. Add an option for specifying the path to the real perf binary,
  2. Check the warning message from DAMO and handle it, or
  3. At least mention this problematic case in README

May I ask your opinion about those options?

honggyukim commented 1 year ago

Hi SeongJae, thanks for your suggestions. I also thought about it but wasn't sure if you wanted to handle this case.

I think the reasonable change would be to use a variable, maybe perf, instead of using 'perf' string directly, then provide a way to change the path with an option. The option could be something like --perf_path. Please let me know if you prefer any other option name.

If you're okay with this, I can prepare for a patch. Thanks.

sj-aws commented 1 year ago

Hi Honggyu, that makes sense! Looking forward to your patch!

honggyukim commented 1 year ago

Hi SeongJae,

I think the reasonable change would be to use a variable, maybe perf, instead of using 'perf' string directly

This part is really simply done.

then provide a way to change the path with an option. The option could be something like --perf_path.

I've looked into this, but passing args.perf_path by adding perf_path argument for all the way through parse_damon_result <- update_result_file <- stop_monitoring_record <- cleanup_exit in damo_record.py doesn't look good.

Otherwise, we might need a setter and getter to perf_path global variable to avoid argument passing, but this smells like a broken design.

Could you please let me know if you have any better idea? If not, I will just simply apply https://github.com/awslabs/damo/commit/b5b7488ee7e4a8712a253cf151b07861fdc1b1cb without introducing --perf_path option.

sj-aws commented 1 year ago

Hi Honggyu,

I think setting perf_path as a global variable inside _damon_result is not a problem, as the use of perf is aimed to be confined in the module. We already have a few such module-global variables.

honggyukim commented 1 year ago

Hi SeongJae, thanks for the confirmation. I've just made a PR at #39 so please have a look!