Ongy / netlink-hs

Netlink communication for Haskell
BSD 3-Clause "New" or "Revised" License
2 stars 7 forks source link

too few bytes ... From: demandInput #5

Closed teto closed 5 years ago

teto commented 5 years ago

I use this library to communicate with a non upstream linux kernel module.

At some point I've seen this kind of error

An error in parsing happeneduser error (too few bytes
From:   demandInput

)

which after some tracing and IRC help we tracked down to myPack <- trace "recvOne" (recvOne simpleSock), i.e., https://github.com/Ongy/netlink-hs/blob/master/System/Linux/Netlink.hs#L368

I noticed it happens when I send some netlink request that calls this kernel function.

mptcp_nl_genl_conn_exists(struct sk_buff *skb, struct genl_info *info)
{
    struct sock *meta_sk;
    u32     token;

    if (!info->attrs[MPTCP_ATTR_TOKEN])
        return -EINVAL;

    token = nla_get_u32(info->attrs[MPTCP_ATTR_TOKEN]);

    meta_sk = mptcp_hash_find(genl_info_net(info), token);
    if (!meta_sk)
        return -ENOTCONN;

    sock_put(meta_sk);
    return 0;
}

I believe 0/ENOTCONN/EINVAL returns a short message that triggers the error.

Ongy commented 5 years ago

Most (all?) of this library asumes the kernel is well behaved and trusts kernel input to some degree. If we can't trust the kernel, we have more issues^^

'Non upstream' makes me wonder who's at fault here. mptcp? I guess that's unter scrutiny, I've heard of it. I'll have to have a look myself, I've been a bit away from my projects and Haskell for a while, so I can't tell on top of my head.

teto commented 5 years ago

Thanks for the answer. The code I am referring to is at https://github.com/multipath-tcp/mptcp/blob/mptcp_trunk/net/mptcp/mptcp_netlink.c. I might be wrong but I believe my netlink call to mptcp_nl_genl_conn_exists triggers the bug since when I don't call it I have no error. I am not sure how netlink works but I believe the different return values of mptcp_nl_genl_conn_exists are converted into netlink messages that netlink-hs should handle but might not ? There shoud be a way to trigger the same errors with other upstream modules.

Ongy commented 5 years ago

Can you provide a (minimal) working example of how to provoke the error? I don't mind installing mptcp here to get it to run, but if it requires some setup as well, I'll need some guidance.

I have an idea what's going wrong, but unless I still need to verify it.

teto commented 5 years ago

I would feel bad making you install a full kernel for that, also because the setup is complex. Maybe you can send me a patch or share your idea that i could check locally. Or i could write a simple module that triggers it. If you insist on trying you should install this kernel https://github.com/teto/mptcp/tree/trunk_v8?files=1 with a kernel config as described at http://multipath-tcp.org/pmwiki.php/Users/DoItYourself. You can find my haskell daemon at https://github.com/teto/netlink_pm/tree/v5?files=1. After setting the mptcp path manager ro netlink via a sysctl and starting my daemon i start a local iperf seszion and the crash occurs

teto commented 5 years ago

If you use nixos i have a module that can setup most of it.

Ongy commented 5 years ago

I can set up nix in a vm to test it, yea. Setting it up manually seems a bit troublesome :+1: For the time being, I added recvOne' that makes the error recoverable and should have been the default from the get go.

Ongy commented 5 years ago

I added query' as well, to have the high level function recoverable as well.

teto commented 5 years ago

Are you familiar with nix ? as we deal with kernels, it will require nixos. Is there anyway I can capture (from haskell) the packet that triggers the issue and send it to you ? I know how to capture netlink packets from wireshark but I am not sure I could identify the faulty one.

Ongy commented 5 years ago

netlink capture sounds fine to me. I know what I'm looking for.

Eh, I tried nixOS before a couple of times and gave up, because it didn't quite fit my expectations.

teto commented 5 years ago

I beleive the error should appear in the following .pcapng (rename it to netlink.pcapng and open it with wireshark): netlink_con.txt

NixOS is really awesome but it's hard to advise unless people are really ready to invest time to learn nix.

teto commented 5 years ago

or rather how could I modify https://github.com/teto/netlink_pm/blob/48342b4c674d9492a78a9add324782e1d3217970/hs/daemon.hs#L474 to capture the netlink packet content upon the exception ? (with your latest changes, it should be possible right ?)

teto commented 5 years ago

I updated my code https://github.com/teto/netlink_pm/blob/v5/hs/daemon.hs and I have the same error:

MPTCP_CMD_EXIST = 15
daemon.hs: user error (Too short input for last message =.=)

I am still a noob in haskell but it seems to me that https://github.com/teto/netlink_pm/blob/v5/hs/daemon.hs#L502 assumes getting valid MPTCP packets but it won't catch smaller packets like ErrorMsg or DoneMsg. How can I get all packets and then reroute them depending on their type ?

teto commented 5 years ago

I managed to find an easily reproducible error: Try launching the minimal example https://github.com/teto/netlink_pm/blob/v5/hs/test.hs on a computer without the "mptcp" family. The call is supposed to be safe since I use getFamilyIdS and I am just trying to query the different netlink families, yet it fails with

sudo runghc test.hs
making socket
test.hs: user error (Too short input for last message =.=)

Here I join some comments from a #haskell help channel before I came up with the previous minimal example:

23:49 <Cale> have a look there
23:50 <Cale> There are three possible results of parsing: the parse can fail completely with Fail, or succeed with Done, but it can also produce Partial f
23:50 <Cale> where f is a function ByteString -> Result r
23:50 <Cale> that you can supply more input to in order to try to continue parsing
23:51 <Cale> The error you're getting suggests that it's hitting the Partial case
23:52 <Cale> There are two possibilities when you get a Partial: you didn't supply enough input, or the parser can't tell for certain that you must be at the end at the end of input
23:52 <Cale> dminuoso: Well, it mostly existed in response to binary lacking some features, which it later got.
23:57 <Cale> teto: Do you suppose it's possible that a packet is longer than 4096 bytes?
23:57 <teto> Cale: thanks for the help will try to see what I can come up with. I suppose it won't be fun to develop on nix 2 packages at the same time. 
23:57 <teto> Cale: I think so, I will try to bump the bufferSize constant
23:58 <Cale> teto: One thing you can try is just rewriting recvOne' for yourself
23:58 <Cale> and getPackets'
23:58 <dminuoso> Cale: Ah you mean by manually using pushChunk?
23:58 <Cale> dminuoso: Well, yeah, it has a funny interface for supplying the input, but those are all strict bytestrings there
23:59 <dminuoso> Cale: Yeah I saw it in the source. This is interesting, I refactored some of my packages to use cereal because of this issue.
00:00 <teto> Cale: that's an idea but I am not sure how I would change than, I mean even if the parsing returns a Partial, I am not sure How I would wait for more input to retrigger the call
00:01 <Cale> hmm, I'm actually not sure either about the details of how recvmsg is meant to work in cases like this
00:02 <Cale> But I also just suspect that all that needs to happen is to apply the function it gives back to an empty bytestring so as to signal the end of input, and see if *that* is Fail or Done
00:03 <Cale> But yeah, if increasing the buffer size helps, then the correct solution likely involves making another call to recvmsg somehow
00:05 <Cale> oh
00:06 <Cale> Is this a datagram or stream protocol?
00:10 <Cale> https://linux.die.net/man/3/recvmsg seems to indicate that for a message-based socket, the entire message must be read in a single operation, and if the buffer isn't large enough, it's just truncated and the remainder discarded
00:10 <Cale> So that kinda sucks
00:11 <Cale> For a stream-based socket, you can just call recvmsg again though.
00:15 <Cale> But I'm thinking that's not really the problem and that all that really needs to happen is something like  Partial f -> case f BS.empty of Fail msg _ -> Left msg : []; Done r bs -> Right r : getPackets' bs; Partial _ -> [Left "Message was truncated"]
00:15 <Cale> i.e. just do the same stuff in those cases, except now if you still get Partial, it's really an error
00:17 <teto> Cale: ok will try that . Currently updating (haskell-ide-engine) hie-nix 
00:20 <Cale> You could also switch it to using  runGetChunk getGenPacket (Just 0) bytes
00:21 <Cale> which tells the parser explicitly that there are 0 bytes remaining, so hopefully it won't give a Partial in that case :P
00:21 <Cale> oh wait, no
00:21 <Cale> hmm
00:21 <Cale> oh, just runGet :D
00:22 <Cale> runGet getGenPacket bytes
00:22 <Cale> and then it'll give you an Either String a
00:23 <Cale> though... it's really strange
00:23 <Cale> this thing
00:23 <Cale> it gets 4096 bytes
00:23 <Cale> or at most that many bytes anyway
00:23 <Cale> and then tries to read packets from it
00:24 <Cale> and it keeps passing the remaining bytes onward to the parser to consume some more
00:24 <Cale> so it looks like it's expecting to read multiple packets
00:25 <Cale> but realistically, if it's reading many packets, it's not so surprising that it'll eventually get cut off?
00:25 <Cale> when you run it, is the error happening right on the first packet, or somewhere farther in?
00:26 <Cale> This seems like it's got to be wrong to me
00:26 <Cale> one way or another
00:27 <Cale> Like, the only way that this won't fail is if the data returned by recvmsg can cleanly be cut into packets
00:29 <Cale> But it seems pretty much guaranteed that if your buffer was over 4096 bytes and you're not incredibly lucky, it's going to chop a packet in half
00:29 <Cale> and either that just loses data or it's harmless, depending on what sort of socket this is
00:30 <Cale> but provided recvmsg doesn't sabotage you, to continue onward from the packet that's chopped in half, you'd want to call recvmsg again, and pass the next string you get to the function that got returned in the Partial constructor
00:32 <teto> it used to work ok until a certain point. Then I upgraded with the maintainer's change and now it fails pretty much straightaway (it can subscribe to the multicast neltink family but then on notification it fails)
00:32 <Cale> The name "recvOne" makes it sound like it's supposed to get one packet though
00:32 <Cale> which it... doesn't look like it's attempting to do
Ongy commented 5 years ago

Cale isn't wrong in general, but wrong in the specific case (afaik). Netlink sockets do return datagrams, which are always exactly one packet.

And yes, recvOne is supposed to receive a single packet, but got redesigned to receive multiple, due to some specifics. Honestly, looking at some of my decisions when I took over this library, I want to turn back time :/

teto commented 5 years ago

thanks for your comment. Concerning my minimal example https://github.com/teto/netlink_pm/blob/v5/hs/test.hs what do you think is the cause (it should run on any kernel, no specific setup) ?

I am trying to add some logging to the library via fast-logger (both to learn and for easier debug), dunno if that can be interesting. Also I think it could be useful to export the default bufferSize.

Ongy commented 5 years ago

That confirms a suspicion I had. I wanted to take care of this in the morning but had less time than I expected

I should have a test version ready later today (GMT+1), I guess in 7-8 hours

Ongy commented 5 years ago

97a2bf5 testing time :)

teto commented 5 years ago

I haven't tested advanced scenarios but seems to work ok for connection. Thanks a lot ! I will do some more testing in 2 weeks (on holiday), I'll close the issue till then xD