Closed EricCousineau-TRI closed 2 years ago
Using gdbserver to debug (hard to see the stack trace when curses and segfaults are involved ;), I get the following stack trace:
Program received signal SIGFPE, Arithmetic exception.
0x000055555560c810 in compute_sizes_from_layout (devices_count=8, device_header_rows=3, device_header_cols=78, rows=26, cols=189, to_draw=0x603000000280, process_displayed=1951,
device_positions=0x7fffffffdc00, num_plots=0x611000000200, plot_positions=0x7fffffffddd0, map_device_to_plot=0x7fffffffdb60, process_position=0x7fffffffdd90,
setup_position=0x7fffffffddb0) at /home/ubuntu/nvtop/src/interface_layout_selection.c:380
380 (max_plot_cols - cols_needed_box_drawing) % num_info_per_plot[j];
(gdb) bt
#0 0x000055555560c810 in compute_sizes_from_layout (devices_count=8, device_header_rows=3, device_header_cols=78, rows=26, cols=189, to_draw=0x603000000280, process_displayed=1951,
device_positions=0x7fffffffdc00, num_plots=0x611000000200, plot_positions=0x7fffffffddd0, map_device_to_plot=0x7fffffffdb60, process_position=0x7fffffffdd90,
setup_position=0x7fffffffddb0) at /home/ubuntu/nvtop/src/interface_layout_selection.c:380
#1 0x00005555555ef505 in initialize_all_windows (dwin=0x611000000180) at /home/ubuntu/nvtop/src/interface.c:295
#2 0x0000555555606e85 in update_window_size_to_terminal_size (inter=0x611000000180) at /home/ubuntu/nvtop/src/interface.c:1805
#3 0x00005555555ead25 in main (argc=1, argv=0x7fffffffe468) at /home/ubuntu/nvtop/src/nvtop.c:327
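The faulting line is a % expression, and on x86-64 Linux an integer division or remainder by zero is reported as SIGFPE, so one plausible cause (an inference from the trace, not confirmed here) is that num_info_per_plot[j] is 0 for some plot. A minimal sketch of guarding such a remainder follows. The function leftover_cols and its parameters are hypothetical names for illustration; this is not nvtop's code and not necessarily how any fix was implemented.

```c
/* Hypothetical helper: columns left over after splitting `available_cols`
 * evenly among `num_info` data series.
 * When there is nothing to split between (num_info == 0), return 0 instead
 * of evaluating `available_cols % 0`, which is undefined behavior in C and
 * typically raises SIGFPE on x86-64 Linux. */
unsigned leftover_cols(unsigned available_cols, unsigned num_info) {
  if (num_info == 0)
    return 0;
  return available_cols % num_info;
}
```

With this guard, a call such as leftover_cols(187, 0) returns 0 rather than trapping, and the caller can decide how to lay out a plot that has no data series.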
Tried some naive fixes, but got some other errors. Help would be appreciated :sweat_smile:
Hello @EricCousineau-TRI,
Thanks for the bug report.
I will create proper unit tests. This will make our lives easier while debugging this issue. ncurses is wonderful when you don't have to debug :wink:
Could you test the branch fix_147 please?
Works wonderfully! Tested with view size per above, using a1bdc96 and -DBUILD_TESTING=OFF.
Confirmed that the old master build still segfaults.
Great, thanks for your help. I'll do some more testing and merge it shortly.
Merged into master. I will do a minor release with all that has been fixed. I am just waiting for some feedback/bugs that usually manifest within a week or two.
I am using current master (09bead48), but I get "Floating point exception (core dumped)" in my terminal for certain terminal sizes.
Environment
I am using Ubuntu 20.04 on an AWS EC2 instance (p3.16xlarge), which has 8 GPUs. I build + install with the following commands:
I run this over an SSH session, using tmux (3.0a-2ubuntu0.3).
Reproduction
For certain terminal sizes, seemingly only on the EC2 instance, I get a segfault. For example, using (cols x lines):
On my local machine (2 GPUs), I cannot reproduce this error w/ the same screen size.
Extra
Example of measuring terminal size: https://stackoverflow.com/questions/263890/how-do-i-find-the-width-height-of-a-terminal-window
echo "$(tput cols) x $(tput lines)"
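For reference, the same measurement can be taken from C with the TIOCGWINSZ ioctl. This is only a sketch for reproducing a given terminal size in a test program; terminal_size is a hypothetical helper name, and nothing here is claimed about how nvtop itself reads the terminal size.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: read the size of the terminal attached to `fd`.
 * Returns 0 and fills *cols / *lines on success.
 * Returns -1 (outputs untouched) if `fd` is invalid or not a terminal,
 * e.g. when output is redirected to a file or a pipe. */
int terminal_size(int fd, unsigned *cols, unsigned *lines) {
  struct winsize ws;
  if (ioctl(fd, TIOCGWINSZ, &ws) == -1)
    return -1;
  *cols = ws.ws_col;
  *lines = ws.ws_row;
  return 0;
}
```

Calling terminal_size(STDOUT_FILENO, &cols, &lines) from an interactive shell should report the same numbers as the tput command above. Inside tmux it reports the size of the current pane.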