appsignal / probes-rs

Rust library to read out system stats from a machine running Unix
MIT License
15 stars 2 forks source link

Process IO probe #14

Closed thijsc closed 8 years ago

thijsc commented 9 years ago

io (in/out in bytes/ops)

thijsc commented 9 years ago

There was a comment here right? :-)

For our use case it's important that the probes don't need root permissions. It's not a design constraint to only use /proc, I'd say we use the most convenient route for every metric we want.

timonv commented 9 years ago

Ah yes, deleted it for a better one and forgot the latter. I'll look into it.

timonv commented 9 years ago

Ok, pidstat version. The integration test still fails even though I'm on stable now. Same issue as with other application, can't read stdout from child in tests.

So I've been playing a little with trying to pretify the code and develop some style. Some things I came up with:

Some stylistic points on the project itself:

Okthxbye, back to phoenix and elixir :-P

timonv commented 9 years ago

Appreciate to do a review first, then I'll clean it up

timonv commented 9 years ago

Hmm, another thing to play with is a ProbeError::Error<T> which just wraps any type of error. To be used for uncommon errors that don't need explicit handling (i.e. parse error on a byteslice that happens once in the app). Thoughts?

thijsc commented 9 years ago

I like the map_err and and_then based style. :+1: on using that instead of a lot of try's.

Responses to your other points:

What's the point of the os module? I think this works just as well, saves a lot of lines and removes a layer of indirection.

This allows us to put all the implementation code for a certain os in one clear separated scope. This will be handy when we add support for more os'es.

Also this crate will be a dependency for the agent which runs on multiple platforms. So it's vital that the code does compile on other platforms, even though it doesn't offer any functionality. The easiest way to do that is use an os mod so you can disable all functions with one attribute.

I think errors should have &'static str instead of String. This allows just using the const and I think is more readable. You can also use &str to match against a pattern.

I did that before, the disadvantage is that you can never have dynamic information in your error messages. See https://github.com/appsignal/probes/blob/master/src/memory.rs#L87 for an example.

Maybe there's a smarter solution for this possible?

If we want to opensource we should rustdoc :-)

Indeed

Hmm, another thing to play with is a ProbeError::Error which just wraps any type of error. To be used for uncommon errors that don't need explicit handling (i.e. parse error on a byteslice that happens once in the app). Thoughts?

I kind of like the enum based approach because it makes it very clear which types of errors are expected. Also you can use pattern matching to make sure you cover all cases. Of course at some number of expected types it becomes impractical.

I'd say we leave it as is and if it turns out we end up with more than 10ish types we take another look at it?

timonv commented 9 years ago

This allows us to put all the implementation code for a certain os in one clear separated scope. This will be handy when we add support for more os'es.

Also this crate will be a dependency for the agent which runs on multiple platforms. So it's vital that the code does compile on other platforms, even though it doesn't offer any functionality. The easiest way to do that is use an os mod so you can disable all functions with one attribute.

Makes sense!

Maybe there's a smarter solution for this possible?

Can't you do:

const ERROR_MESSAGE: &'static str = "I did a booboo at {}";

...
Error(&format(ERROR_MESSAGE, "this one time at bandcamp"))

I kind of like the enum based approach because it makes it very clear which types of errors are expected. Also you can use pattern matching to make sure you cover all cases. Of course at some number of expected types it becomes impractical.

I'd say we leave it as is and if it turns out we end up with more than 10ish types we take another look at it?

Love the enum approach and absolutely want to keep it. There are some errors (like with the map_err here) that are very spare / happen only at one point. So you could choose to just use the generic wrapper in the enum and not have to use the map_err to get the type chain correctly. I feel that both leave room for playing dirty/lazy, but the first is the better approach as you force preservation of the error. It doesn't feel right to add every little error to the enum, only the ones you want to give more context and/or are common to the application.

timonv commented 9 years ago

Also, pidstat gives different names than /proc/io. Should we rename the values to something more human?

thijsc commented 9 years ago

Can't you do:

That ends up having the same result as just inlining the string no?

Love the enum approach and absolutely want to keep it. There are some errors (like with the map_err here) that are very spare / happen only at one point. So you could choose to just use the generic wrapper in the enum and not have to use the map_err to get the type chain correctly. I feel that both leave room for playing dirty/lazy, but the first is the better approach as you force preservation of the error. It doesn't feel right to add every little error to the enum, only the ones you want to give more context and/or are common to the application.

Makes sense. Not against this if we find more of the less-important errors. I think at the moment we're not there yet.

thijsc commented 9 years ago

Also, pidstat gives different names than /proc/io. Should we rename the values to something more human?

:+1:

timonv commented 9 years ago

:+1:

Suggestions?

Updated the code. Integration test running pidstat still fails as the child's stdout still isn't captured in tests. Any ideas to fix that? I'd like to have this part under tests as well. I could use manual pipes instead (which I think would work as master Stdout is captured), but it's a lot more code :(

timonv commented 9 years ago

How about

read_kbs write_kbs canceled_kbs

iodelay isn't in proc/io btw

Block I/O delay of the task being monitored, measured in clock ticks. This metric includes the delays spent waiting for sync block I/O completion and for swapin block I/O completion.

Bit worried it's a (super small) blocking call. Seems instant though locally (with time)

thijsc commented 9 years ago

:+1: on the names.

thijsc commented 9 years ago

Where did you find that information about stdout capturing not working in test @timonv?

timonv commented 9 years ago

In the test suite. It's not working (at least for a child process)

timonv commented 9 years ago

I have the same problem with github.com/timonv/benv, but the binary works fine

timonv commented 9 years ago

Updated. Child process stdout still an issue

thijsc commented 9 years ago

A just noticed another small thing: The pid variables should be of type libc::pid_t. That's an alias to whichever integer type the OS happens to use for pids. This ensures maximum compatibility.

thijsc commented 9 years ago

On my default Ubuntu 14.04 installation pidstat is not installed:

The program 'pidstat' is currently not installed.  You can install it by typing:
sudo apt-get install sysstat

Is there no other way to get at this information?

timonv commented 9 years ago

Is there no other way to get at this information?

Hot damnit it used to be standard. I'll take a look.

timonv commented 9 years ago

All our 14.04 servers have pidstat. Maybe we should make sysstat a dependency? We can't read from file for non user processes, and I think that if we user other tools, it's just waiting for the first distro that doesn't use it.

thijsc commented 8 years ago

pidstat is missing on the Vagrant image we're using in this project. Is there no alternative that does not require root access?

timonv commented 8 years ago

It has to get the data from somewhere. Need to look into this. How far do we want to go in parsing hard to get linux data in favor of (marginal) performance?

timonv commented 8 years ago

Funny, had some possibly more helpful errors with 1.6 for getting that integration test to work. Will look at it tonight.

timonv commented 8 years ago

Ok, false positive. No child stdout is still an issue. Added early return and a link to the issue in the tests. I think we need to figure something out as most probes call a child process. Apart from that, I think I'm done here.

timonv commented 8 years ago

cc @thijsc

thijsc commented 8 years ago

We have a very similar test in another project:

    #[test]
    fn test_hostname() {
        let hostname = match Command::new("hostname").output() {
            Ok(p)  => {
                match from_utf8(&p.stdout) {
                    Ok(s)  => Some(s.trim().to_owned()),
                    Err(_) => None
                }
            },
            Err(e) => panic!("Getting hostname failed {}", e)
        }.unwrap();
        assert_eq!(Ok(hostname), super::hostname());
    }

Not sure why this would be failing here. It looks like the issue described in https://github.com/rust-lang/rust/issues/12309 is about reading stdout from the test process itself. Not stdout from Command?

timonv commented 8 years ago

Alright cleaned up. Integration test still failing. The issue mentions tests not capturing stdout of subprocesses.

thijsc commented 8 years ago

What's the latest status here? Is this one still stuck on the test issue?

timonv commented 8 years ago

Yep, it is

thijsc commented 8 years ago

@timonv Do you still want to finish this pull or are your priorities elsewhere at the moment? :-)

timonv commented 8 years ago

Elsewhere!

On Mon, 25 Apr 2016 at 16:50, Thijs Cadier notifications@github.com wrote:

@timonv https://github.com/timonv Do you still want to finish this pull or are your priorities elsewhere at the moment? :-)

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/appsignal/probes/pull/14#issuecomment-214371084

thijsc commented 8 years ago

In that case I will close this pull, we're not using this feature at the moment.