criblio / appscope

Gain observability into any Linux command or application with no code modification
https://appscope.dev
Apache License 2.0
266 stars 33 forks source link

tail -f breaks after implementing console events #728

Closed PrymalInstynct closed 2 years ago

PrymalInstynct commented 2 years ago

Steps To Reproduce

Followed article https://cribl.io/blog/bash-history-logging-the-easy-way-with-appscope/ implementing AppScope to enable logging of user commands.

I found that when using tail -f to watch the output of a log file I would get an error to stdout within about 5 seconds and tail would exit

tail: error waiting for inotify and output events: Interrupted system call

I also found the below error in the AppScope application log output

Scope: tail(pid:1950525): [2022-01-27 05:20:15.671] ERROR: Could not get function address of uv_fileno.

Environment

- AppScope: v0.8.1
- OS: Red Hat Enterprise Linux 8.5 (Ootpa)
- Architecture: x86_64
- Kernel: 4.18.0-348.2.1.el8_5.x86_64

Requested priority

Medium

Relevant log output

See above
iapaddler commented 2 years ago

Thanks for finding this. We will reproduce this and post updates here.

seanvaleo commented 2 years ago

I tried to reproduce this on 0.8.1 and latest master with the following steps:

Scope tail -f

docker run --rm -it cribl/scope:<VERSION_ID>
scope run -- tail -f /var/log/dpkg.log
# In another terminal:
docker exec -it <above container> /bin/bash
apt update && apt install -y python3 # to generate dpkg logs
# Wait 20 seconds to observe failures

Scope bash

docker run --rm -it cribl/scope:<VERSION_ID>
scope bash
tail -f /var/log/dpkg.log
# In another terminal:
docker exec -it <above container> /bin/bash
apt update && apt install -y python3 # to generate dpkg logs
# Wait 20 seconds to observe failures

Scope bash with cribl tls destination (per blog post):

docker run --rm -it cribl/scope:<VERSION_ID>
scope run -c tls://logstream.strange-cannon.cribl.cloud:10090 -- bash
tail -f /var/log/dpkg.log
# In another terminal:
docker exec -it <above container> /bin/bash
apt update && apt install -y python3 # to generate dpkg logs
# Wait 20 seconds to observe failures

I could not reproduce the issue.

Regarding this

I also found the below error in the AppScope application log output
Scope: tail(pid:1950525): [2022-01-27 05:20:15.671] ERROR: Could not get function address of uv_fileno.

This is known. We changed the level of this log recently but it will still appear with a more verbose log level set.

seanvaleo commented 2 years ago

Closing for now. Please provide further information if you want us to re-open and investigate - we will be happy to.

PrymalInstynct commented 2 years ago

This is still an issue, that I am experiencing on a fresh install of RHEL 8.5 (Not a container). Below are the configs I am using.

/etc/profile.d/appscope.sh

SCOPE_HOME=/opt/scope
LD_PRELOAD=/opt/scope/libscope.so
PATH=$PATH:/opt/scope
SCOPE_CRIBL_NO_BREAKER=true

export SCOPE_HOME LD_PRELOAD PATH SCOPE_CRIBL_NO_BREAKER

/etc/scope/scope.yml

metric:
  enable: false
event:
  enable: true
  format:
    type: ndjson
    maxeventpersec: 10000
    enhancefs: true
  watch:
    - type: console
      name: (stdout)|(stderr) # matches the output stream
      value: .*               # matches data written
  transport:
    type: file
    path: '/var/log/appscope/events.log'
    buffer: line
libscope:
  configevent: true
  summaryperiod : 10
  commanddir : '/tmp'
  log:
    level: warning
    transport:
      type: file
      path: '/var/log/appscope/appscope.log'
      buffer: line
cribl:
  enable: false
tags:
  user: $USER
  cwd: $PWD
protocol:
custom:
iapaddler commented 2 years ago

what is the command line you are using with the above config?

PrymalInstynct commented 2 years ago

Since appscope process is spawned as a part of the bash session starting I am just running tail -f /var/log/

As Root

[root@smoke log]# tail -f /var/log/messages
Feb  3 14:32:29 smoke systemd[2347017]: Starting D-Bus User Message Bus Socket.
Feb  3 14:32:29 smoke systemd[2347017]: Reached target Timers.
Feb  3 14:32:29 smoke systemd[2347017]: Reached target Paths.
Feb  3 14:32:29 smoke systemd[2347017]: Listening on D-Bus User Message Bus Socket.
Feb  3 14:32:29 smoke systemd[2347017]: Reached target Sockets.
Feb  3 14:32:29 smoke systemd[2347017]: Reached target Basic System.
Feb  3 14:32:29 smoke systemd[2347017]: Reached target Default.
Feb  3 14:32:29 smoke systemd[2347017]: Startup finished in 188ms.
Feb  3 14:32:29 smoke systemd[1]: Started User Manager for UID 1000.
Feb  3 14:32:29 smoke systemd[1]: Started Session 53 of user prymalinstynct.
tail: error waiting for inotify and output events: Interrupted system call

As a user

[prymalinstynct@smoke log]$ tail -f dnf.log
2022-02-03T13:03:08-0700 DEBUG DNF version: 4.7.0
2022-02-03T13:03:08-0700 DDEBUG Command: dnf makecache --timer
2022-02-03T13:03:08-0700 DDEBUG Installroot: /
2022-02-03T13:03:08-0700 DDEBUG Releasever: 8
2022-02-03T13:03:08-0700 DEBUG cachedir: /var/cache/dnf
2022-02-03T13:03:08-0700 DDEBUG Base command: makecache
2022-02-03T13:03:08-0700 DDEBUG Extra commands: ['makecache', '--timer']
2022-02-03T13:03:08-0700 DEBUG Making cache files for all metadata files.
2022-02-03T13:03:08-0700 INFO Metadata cache refreshed recently.
2022-02-03T13:03:08-0700 DDEBUG Cleaning up.
tail: error waiting for inotify and output events: Interrupted system call
iapaddler commented 2 years ago

hey. we sort of recreated what you are seeing. we could not repro directly. we created a RHEL 8.4 EC2 VM. taking the same steps you defined we did not see the issue. however, theorizing about the cause and the staring at the error message, it seems as though the timer we use should be stopped and is not getting stopped in your case. so, if we force the timer not to stop in test code we see the same error message from tail that you are seeing.

we know what is happening. we have a theory as to why it's happening. given that we are not able to repro directly, would you be willing to run a test build that we provide? it should allow us to verify that we have this fixed.

PrymalInstynct commented 2 years ago

Thanks for looking into it further. I am happy to help however I can.

iapaddler commented 2 years ago

Will you please try this test version? It's built f4rom the draft PR 757.

$ curl -O https://cdn.cribl.io/dl/scope/bug/728-tail-f/linux/scope
$ chmod +x scope
$ ./scope version
Version: d1cf379
Build Date: 2022-02-07T16:21:47Z
PrymalInstynct commented 2 years ago

Sorry to report this still appears to be an issue

Download test build

prymalinstynct@wks01-windows:~/Projects/ (main)$ ssh 10.10.1.10

Last login: Tue Feb  8 06:09:23 2022 from 10.10.100.10
[prymalinstynct@smoke scope]$ cd /opt/scope/
[prymalinstynct@smoke scope]$ sudo curl -O https://cdn.cribl.io/dl/scope/bug/728-tail-f/linux/scope
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27.6M  100 27.6M    0     0  3936k      0  0:00:07  0:00:07 --:--:-- 4017k
[prymalinstynct@smoke scope]$ sudo chmod +x scope
[prymalinstynct@smoke scope]$ ./scope version
Version: d1cf379
Build Date: 2022-02-07T16:21:47Z

Logout and log back in to reset bash session

[prymalinstynct@smoke scope]$ exit
logout
Connection to 10.10.1.10 closed.
prymalinstynct@wks01-windows:~/Projects/ (main)$ ssh 10.10.1.10

Last login: Tue Feb  8 06:11:49 2022 from 10.10.100.10

Run Tail command

[prymalinstynct@smoke ~]$ tail -f /var/log/dnf.log
2022-02-08T05:29:54-0700 DEBUG reviving: 'rhel-8-for-x86_64-appstream-rpms' can be revived - repomd matches.
2022-02-08T05:29:55-0700 DEBUG rhel-8-for-x86_64-appstream-rpms: using metadata from Mon 07 Feb 2022 07:14:16 AM MST.
2022-02-08T05:29:55-0700 DEBUG reviving: 'rpmfusion-free-updates' can be revived - repomd matches.
2022-02-08T05:29:55-0700 DEBUG rpmfusion-free-updates: using metadata from Sat 05 Feb 2022 10:51:10 AM MST.
2022-02-08T05:29:55-0700 DEBUG reviving: 'rpmfusion-nonfree-updates' can be revived - repomd matches.
2022-02-08T05:29:55-0700 DEBUG rpmfusion-nonfree-updates: using metadata from Sat 05 Feb 2022 10:57:56 AM MST.
2022-02-08T05:29:55-0700 DEBUG User-Agent: constructed: 'libdnf (Red Hat Enterprise Linux 8.5; generic; Linux.x86_64)'
2022-02-08T05:29:58-0700 DDEBUG timer: sack setup: 6152 ms
2022-02-08T05:29:58-0700 INFO Metadata cache created.
2022-02-08T05:29:58-0700 DDEBUG Cleaning up.
tail: error waiting for inotify and output events: Interrupted system call
iapaddler commented 2 years ago

Ugh! I think this may be my bad. In the instructions I provided, I failed to suggest that you extract the content of the scope you downloaded. Assuming that you did exactly what you posted, you were using the old libscope and no change would be seen. Which, of course, is what appears to have happened.

The steps become:

$ curl -O https://cdn.cribl.io/dl/scope/bug/728-tail-f/linux/scope
$ chmod +x scope
$ ./scope version
Version: d1cf379
Build Date: 2022-02-07T16:21:47Z
$ ./scope extract /opt/scope

Using /opt/scope in the extract command because in /etc/profile.d/appscope.sh you are using LD_PRELOAD=/opt/scope/libscope.so. The extract command will place the updated libscope in the dir you are referencing.

Note that the extract command will place a scope.ymlin /opt/scope. Given that SCOPE_HOME=/opt/scope, libscope will use the scope.yml that is extracted into /opt/scope. Assuming that the intent is to use the scope.yml in /etc/scope, you will want to remove or rename /opt/scope/scope.yml.

Thanks for your patience. Will you give this a try...again?

PrymalInstynct commented 2 years ago

ok, I ran ./scope extract /opt/scope and am still getting the same error.

here is the output of the appscope.log file I am writing to which errored out the same when i used tail -f

[root@smoke appscope]# tail -f /var/log/appscope/appscope.log 
Scope: id(pid:2589593): [2022-02-08 06:13:21.832] ERROR: Could not get function address of uv_fileno.
Scope: cat(pid:2589595): [2022-02-08 06:13:21.842] ERROR: Could not get function address of uv_fileno.
Scope: tail(pid:2589605): [2022-02-08 06:13:33.214] ERROR: Could not get function address of uv_fileno.
Scope: ls(pid:2636249): [2022-02-09 04:19:57.765] ERROR: Could not get function address of uv_fileno.
Scope: mv(pid:2636283): [2022-02-09 04:20:44.052] ERROR: Could not get function address of uv_fileno.
tail: error waiting for inotify and output events: Interrupted system call
[root@smoke appscope]# 
michalbiesek commented 2 years ago

@PrymalInstynct Thanks for your message. What confuses me is that you are still able to see in the /var/log/appscope/appscope.log the following message (assuming it comes after you perform a scope extract operation):

Scope: mv(pid:2636283): [2022-02-09 04:20:44.052] ERROR: Could not get function address of uv_fileno.

This type of message should not appear in the case of libscope.so library version (mentioned in https://github.com/criblio/appscope/issues/728#issuecomment-1032871183), which comes from scope extract operation.

In the case of using libscope.so library coming from scope extract after executing the 'strings' command against libscope.so:

strings libscope.so | grep "Scope Version"

You should see

Constructor (Scope Version: d1cf379bcf1a)

Can you verify the command above with /opt/scope/libscope.so and possibly check if that is the only libscope.so you have it?

PrymalInstynct commented 2 years ago

Here is the output of the 2 commands I ran to find all possible instances of libscope.so and then pull the version out of the strings

[root@smoke ~]# find / -name 'libscope.so'
/opt/scope/libscope.so
[root@smoke ~]# strings /opt/scope/libscope.so | grep "Scope Version"
Constructor (Scope Version: d1cf379bcf1a)
Scope Version: %s   Dump From: %s
[root@smoke ~]# 

For the sake of sharing in case it matters in some way, here is how I am installing and configuring appscope. AppScope Ansible Role

As a side note I would love to see the version included in the package file name so I could dynamically pull the appscope package for github by version number a little easier.

michalbiesek commented 2 years ago

@PrymalInstynct Thanks for your message and valuable input.

The libscope.so version looks OK.

Thanks for the link AppScope Ansible Role - just FYI here is branch where the tgz file for the Appscope version (Scope Version: d1cf379bcf1a) is placed in /files/ path.

We will investigate the issue further.

Just to ensure:

PrymalInstynct commented 2 years ago

Yes, I am using the ansible role linked to install appscope in my environment, and yes I get the same behavior as both root and a standard user.

iapaddler commented 2 years ago

we have found issues with the Ansible play book. however, nothing that would explain the receipt of a signal by tail. can we discuss this synchronously? would you be willing to join our Slack channel? it would be easier to discuss and screen share as needed.

https://cribl-community.slack.com/ channel #appscope

PrymalInstynct commented 2 years ago

I'll join up on Monday.

iapaddler commented 2 years ago

That's great. We are looking forward to it.

On Fri, Feb 11, 2022 at 7:06 AM PrymalInstynct @.***> wrote:

I'll join up on Monday.

— Reply to this email directly, view it on GitHub https://github.com/criblio/appscope/issues/728#issuecomment-1036195786, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHIMXB5ZEXNTAHXTRGDFELU2UCV7ANCNFSM5M5YKKJA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were assigned.Message ID: @.***>

PrymalInstynct commented 2 years ago

@iapaddler the link is not letting me sign-in it is telling me I need a cribl.io address to join the workspace.

iapaddler commented 2 years ago

the community channel should not require a specific email address. send me an email address and i'll send a join email. that should resolve this.

iapaddler commented 2 years ago

@PrymalInstynct better approach, the AppScope web site has a link to the community channel: https://appscope.dev/

If you can sign up there, we can explore this in detail. Let us know if this works for you.

PrymalInstynct commented 2 years ago

Yeah, that works. I'll should be able to jump on tomorrow morning.

jrcheli commented 2 years ago

The root cause of this is that we're missing a function that we need to interpose. Specifically __poll_chk. My understanding is that this function is used by tail due to it being compiled with a gcc toolchain with the _FORTIFY_SOURCE=2 option. We had problems reproducing it because the versions of tail that we tried don't use the __poll_chk symbol. On my machine, for example, tail doesn't use this symbol. Heck, it doesn't even use poll:

ubuntu@ip-10-8-107-159:~/jrc/appscope3$ which tail
/usr/bin/tail
ubuntu@ip-10-8-107-159:~/jrc/appscope3$ tail --version
tail (GNU coreutils) 8.28
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Paul Rubin, David MacKenzie, Ian Lance Taylor,
and Jim Meyering.
ubuntu@ip-10-8-107-159:~/jrc/appscope3$ nm -D /usr/bin/tail | egrep "(poll)|(select)"
                 U select

I've done some analysis of what we functions need to add to ensure that this doesn't happen in other scenarios. This command finds glibc functions that are provided that end in "_chk".

nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep _chk\$ | grep " T " | gawk '{print $3}' | cut -d_ -f3 > glibc_func_list.txt

This command finds functions that libscope.so provides

nm -D ./lib/linux/x86_64/libscope.so | grep " T " | gawk '{print $3}' > appscope.txt

This command compares the two files we just created above:

for func in $(cat glibc_func_list.txt); do grep "^$func\$" appscope.txt; done | sort | uniq > candidates.txt

The contents of candidates.txt suggests that we need ensure that we have a "_chk version" of the following functions to libscope.so:

fgets
fgetws
fread
poll
ppoll
pread
pread64
read
recv
recvfrom

This then tells us which candidates of concern we already have in libscope.so:

for func in $(cat candidates.txt); do grep "__${func}_chk" appscope.txt; done

Which returned that we already have _chk functions for these:

__fgets_chk
__fgetws_chk
__fread_chk
__pread_chk

Which means by omission that these are the _chk functions that we need to add to libscope.so: poll ppoll pread64 read recv recvfrom

seanvaleo commented 2 years ago

I verified that we are interposing these new functions. In gdb you can see that we are hitting breakpoints in our wrap.c.

Thread 2.1 "a.out" hit Breakpoint 1, __ppoll_chk (fds=0x0, nfds=1, tmo_p=0x1, sigmask=0x0, fdslen=1) at src/wrap.c:1907
Thread 2.1 "a.out" hit Breakpoint 1, __poll_chk (fds=0x0, nfds=1, timeout=1, fdslen=1) at src/wrap.c:1854
Thread 2.1 "a.out" hit Breakpoint 1, __recvfrom_chk (sockfd=3, buf=0x555555558040 <buf>, len=512, buflen=512, flags=-1, src_addr=0x55555556f2b0, addrlen=0x555555555270 <__libc_csu_init>) at src/wrap.c:4796
Thread 2.1 "a.out" hit Breakpoint 1, __recv_chk (sockfd=3, buf=0x555555558040 <buf>, len=512, buflen=512, flags=-1) at src/wrap.c:4741
Thread 2.1 "a.out" hit Breakpoint 2, __pread64_chk (fd=3, buf=0x555555558040 <buf>, count=512, offset=512, bufsize=10) at src/wrap.c:2133
Thread 2.1 "a.out" hit Breakpoint 1, __read_chk (fd=3, buf=0x555555558040 <buf>, nbytes=512, buflen=512) at src/wrap.c:2200

I compiled a test app that called these _chk functions directly, with gcc -g -D_FORTIFY_SOURCE=2 -O2 my_app.c

myapp.c:

#include <string.h>
#include <unistd.h>
#include <fcntl.h>

int main() {
    __read_chk(fd, NULL, 1, 1, 1);
//    __pread64_chk(fd, buf, 1, 1);
//    __poll_chk(NULL, 1, 1, 1);
//    __ppoll_chk(NULL, 1, NULL, NULL, 1);
//    __recv_chk(1, NULL, 1, 1, -1);
//    __recvfrom_chk(1, NULL, 1, 1, -1, NULL, NULL);

    return 0;
}