F-Stack / f-stack

F-Stack is a high-performance user-space network development kit based on DPDK, the FreeBSD TCP/IP stack, and a coroutine API.
http://www.f-stack.org

How to guarantee that packets are sent to a specific process #418

Open kwjjyn opened 5 years ago

kwjjyn commented 5 years ago

Hi F-Stack team,

I have ported an application to F-Stack. It is (1) a single process with one thread (the thread only collects data and makes no ff_ API calls) and (2) a client app that opens TCP connections to an nginx server.

If I run this app on one CPU core, there is no problem. The command is:

./myapp  --tcp=192.168.0.112:10000 --conf=../../config.ini --proc-type=primary --proc-id=0 

and lcore_mask in config.ini is set to 1:

[dpdk]
# Hexadecimal bitmask of cores to run on.
lcore_mask=1

However, when I run my app as multiple processes on multiple cores, some connections fail. I want to run the first instance of myapp on core 0 and the second on core 1. The commands are:

./myapp --tcp=192.168.0.112:10000 --conf=../../config.ini --proc-type=primary --proc-id=0 
./myapp --tcp=192.168.0.112:10000 --conf=../../config.ini --proc-type=secondary --proc-id=1

And config.ini:

[dpdk]
# Hexadecimal bitmask of cores to run on.
lcore_mask=3

Both processes run, and top shows us (user CPU) at 100%. However, about half of the TCP connections fail with a timeout, while the other half succeed. I captured packets on the server and found that after the server sends SYN+ACK packets to the client, myapp receives only about half of them and replies with an ACK to establish those connections. The other SYN+ACK packets sent by the server are never received by the client, so myapp keeps retransmitting the initial SYN and those connections are never established.

AFAIK, when a server app is ported to F-Stack, RSS hashes the five-tuple of packets arriving from the client to select a process, so packets from the same TCP flow are always delivered to the same process. But when a client app is ported to F-Stack, the client initiates the TCP connections, and RSS may not hash the server's reply packets to the process that opened the connection. For example, with process 1 on core 1 and process 2 on core 2, both running on F-Stack as clients:

  1. Process 1 sends a SYN to the server.
  2. The server sends a SYN+ACK to the client.
  3. RSS hashes the SYN+ACK packet to process 2. But processes 1 and 2 are different processes with separate stacks that share nothing, so process 2 does nothing with this packet.
  4. Process 1 never receives the SYN+ACK, so it sends the SYN again and again. The TCP connection is never established.
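
The scenario in the list above can be reproduced in a small model. This is only a sketch under stated assumptions: `toy_rss_hash` is a hypothetical FNV-1a stand-in for the NIC's real RSS function (NICs typically use a keyed Toeplitz hash), and `count_misrouted` is an illustrative helper, not part of F-Stack. It counts how many SYN+ACK replies would be steered to a queue other than the one owned by the process that sent the SYN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the NIC's RSS hash (FNV-1a over the 4-tuple bytes).
 * Assumption for illustration only; real hardware uses its own function. */
static uint32_t toy_rss_hash(uint32_t src_ip, uint32_t dst_ip,
                             uint16_t src_port, uint16_t dst_port)
{
    const uint32_t words[3] = { src_ip, dst_ip,
                                ((uint32_t)src_port << 16) | dst_port };
    uint32_t h = 2166136261u;
    for (size_t w = 0; w < 3; w++) {
        for (int shift = 24; shift >= 0; shift -= 8) {
            h ^= (words[w] >> shift) & 0xffu;
            h *= 16777619u;
        }
    }
    return h;
}

/* Count replies (server -> client) that land on a queue other than
 * owner_queue, for nb_ports consecutive client ports from first_port. */
static unsigned count_misrouted(uint32_t server_ip, uint16_t server_port,
                                uint32_t client_ip,
                                uint16_t first_port, uint16_t nb_ports,
                                uint16_t owner_queue, uint16_t nb_queues)
{
    assert(nb_queues > 0);
    unsigned misrouted = 0;
    for (uint16_t i = 0; i < nb_ports; i++) {
        uint16_t client_port = (uint16_t)(first_port + i);
        /* In the SYN+ACK the server is the source, the client the destination. */
        uint32_t h = toy_rss_hash(server_ip, client_ip, server_port, client_port);
        if (h % nb_queues != owner_queue)
            misrouted++;
    }
    return misrouted;
}
```

With two queues, some but not all of the replies miss the owning queue in this model, which matches the partial failure pattern described in the report.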

I don't know whether my understanding is right, but I guess this is the reason. So how can I guarantee that packets from the server are delivered to a specific process? I think RSS alone may not be able to do this. Is there another way? How does the nginx proxy handle this?

I would appreciate it if you could offer any help.

kwjjyn commented 5 years ago

It seems that F-Stack runs ff_rss_check when picking a local port to connect to a remote server. But I can't see when this function is called. Is it called after ff_connect()?

More info: my NIC uses the virtio driver. Will ff_rss_check work correctly in this situation? I really don't know whether there is another reason the TCP connections fail.
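
The idea of checking the RSS result when choosing a local port can be sketched as follows. This is not F-Stack's `ff_rss_check`; it is a minimal model of the concept, with a hypothetical `toy_rss_hash` (FNV-1a) standing in for the NIC's hash and a hypothetical helper `pick_source_port`. It only works in practice if the software hash matches what the NIC actually computes, which is exactly the open question for a virtio NIC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the NIC's RSS hash; an assumption for illustration. */
static uint32_t toy_rss_hash(uint32_t src_ip, uint32_t dst_ip,
                             uint16_t src_port, uint16_t dst_port)
{
    const uint32_t words[3] = { src_ip, dst_ip,
                                ((uint32_t)src_port << 16) | dst_port };
    uint32_t h = 2166136261u;
    for (size_t w = 0; w < 3; w++) {
        for (int shift = 24; shift >= 0; shift -= 8) {
            h ^= (words[w] >> shift) & 0xffu;
            h *= 16777619u;
        }
    }
    return h;
}

/* Return a local port in [lo, hi] whose reply packets would be steered
 * to my_queue, or 0 if no port in the range qualifies. */
static uint16_t pick_source_port(uint32_t local_ip, uint32_t remote_ip,
                                 uint16_t remote_port,
                                 uint16_t lo, uint16_t hi,
                                 uint16_t my_queue, uint16_t nb_queues)
{
    assert(nb_queues > 0 && my_queue < nb_queues && lo != 0 && lo <= hi);
    for (uint32_t p = lo; p <= hi; p++) {
        /* Hash the tuple as it appears in the reply: remote is the source. */
        uint32_t h = toy_rss_hash(remote_ip, local_ip, remote_port, (uint16_t)p);
        if (h % nb_queues == my_queue)
            return (uint16_t)p;
    }
    return 0;
}
```

Each process would call this with its own queue id before connecting, so that replies arrive on the queue it polls.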

kwjjyn commented 5 years ago

Actually, I found another ff_ API called ff_regist_packet_dispatcher. My idea is: with process 1 on core 1 and process 2 on core 2, I bind the source port before ff_connect, for example ports 10000-20000 on core 1 and 20000-30000 on core 2. Then I write the dispatch callback:

if dstport in 10000-20000:
      sendto core1 
if dstport in 20000-30000:
      sendto core2

to implement something like Flow Director. Do you think this is feasible? If so, how do I use this function? The queue id seems a little difficult to get. How do I specify the parameter of ff_regist_packet_dispatcher? Do you have any examples? Do I need to call ff_regist_packet_dispatcher before ff_run() to initialize it?
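
A minimal sketch of such a port-range callback, assuming plain (untunneled, untagged) Ethernet/IPv4/TCP frames and a hypothetical port plan in which queue q owns local ports from PORT_BASE + q * PORT_SPAN up to the next boundary. The callback signature follows the one used in the reply later in this thread; the constants, the `read_be16` helper, and the byte-offset parsing are illustrative choices, not F-Stack definitions.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_HDR_LEN   14       /* untagged Ethernet header */
#define ETH_TYPE_IPV4 0x0800
#define PROTO_TCP     6

/* Hypothetical port plan: queue 0 owns 10000-19999, queue 1 owns 20000-29999. */
#define PORT_BASE 10000
#define PORT_SPAN 10000

/* Read a 16-bit big-endian (network order) field. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

int port_range_dispatch_cb(void *data, uint16_t *len,
                           uint16_t queue_id, uint16_t nb_queues)
{
    const uint8_t *pkt = (const uint8_t *)data;
    assert(nb_queues > 0);

    /* Anything that is not IPv4 stays on the queue it arrived on. */
    if (*len < ETH_HDR_LEN + 20 || read_be16(pkt + 12) != ETH_TYPE_IPV4)
        return queue_id;

    const uint8_t *ip = pkt + ETH_HDR_LEN;
    unsigned ihl = (ip[0] & 0x0fu) * 4u;          /* IPv4 header length */
    if ((ip[0] >> 4) != 4 || ihl < 20 || ip[9] != PROTO_TCP ||
        *len < ETH_HDR_LEN + ihl + 4)
        return queue_id;

    /* The destination port of a reply is the client's bound source port. */
    uint16_t dst_port = read_be16(ip + ihl + 2);
    if (dst_port < PORT_BASE)
        return queue_id;

    unsigned q = (unsigned)(dst_port - PORT_BASE) / PORT_SPAN;
    return q < nb_queues ? (int)q : (int)queue_id;
}
```

For this to work, each process has to bind its sockets to a source port inside its own range before connecting, as proposed above. IP fragments are not handled in this sketch.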

Another solution is to check the result of ff_rss_check, but changing that code seems a little difficult.

Could you give me some advice? Thanks so much.

whl739 commented 5 years ago

See https://github.com/F-Stack/f-stack/issues/231#issuecomment-396201683.

kwjjyn commented 5 years ago

Thanks for your reply. Yes, I have seen this issue before. However, I still don't know how to use this function in C. Could you give me an example of how to use it, for instance in /example/main_epoll.c?

Thanks very much!

jfb8856606 commented 5 years ago

Like this:

/* get_hash() is not defined in this snippet; supply your own flow hash. */
int pipeline_dispatch_cb (void *data, uint16_t *len,
    uint16_t queue_id, uint16_t nb_queues)
{
    struct ipv4_hdr *iph;
    int iph_len;
    uint32_t hash;

    /* The outer IPv4 header follows the Ethernet header. */
    iph = (struct ipv4_hdr *)((char *)data + ETHER_HDR_LEN);
    iph_len = (iph->version_ihl & 0x0f) << 2;

    /* This demo handles IP-in-IP tunnel packets; leave anything else on this queue. */
    if (iph->next_proto_id != IPPROTO_IP) {
        return queue_id;
    }

    /* Step over the outer header to the inner IPv4 header. */
    iph = (struct ipv4_hdr *)((char *)iph + iph_len);
    iph_len = (iph->version_ihl & 0x0f) << 2;

    if (iph->next_proto_id == IPPROTO_TCP) {
        struct tcp_hdr *tcph = (struct tcp_hdr *)((char *)iph + iph_len);
        hash = get_hash(iph->src_addr, iph->dst_addr, tcph->src_port, tcph->dst_port);
    } else if (iph->next_proto_id == IPPROTO_UDP) {
        struct udp_hdr *udph = (struct udp_hdr *)((char *)iph + iph_len);
        hash = get_hash(iph->src_addr, iph->dst_addr, udph->src_port, udph->dst_port);
    } else {
        return queue_id;
    }

    /* The return value is the queue (process) that should handle the packet. */
    return hash % nb_queues;
}

int main(int argc, char * argv[])
{
    ff_init(argc, argv);

    /* register a packet dispatch function */
    ff_regist_packet_dispatcher(pipeline_dispatch_cb);
    .
    .
    .
}
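
The callback above calls get_hash(), which is not defined anywhere in this thread. One possible implementation is sketched below as an assumption, not as the function the author used: it is a symmetric hash, so both directions of a flow produce the same value, followed by an integer mixing step to spread the bits.

```c
#include <stdint.h>

/* Integer mixer: spreads input bits across the whole 32-bit result. */
static uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

/* Hypothetical get_hash(): XOR-ing source with destination makes the
 * result independent of packet direction. */
uint32_t get_hash(uint32_t src_addr, uint32_t dst_addr,
                  uint16_t src_port, uint16_t dst_port)
{
    uint32_t addrs = src_addr ^ dst_addr;
    uint32_t ports = (uint32_t)(src_port ^ dst_port);
    return mix32(addrs ^ (ports << 16) ^ ports);
}
```

Because the hash is symmetric, it does not matter whether the ports and addresses are passed in network or host byte order, as long as the same convention is used for every packet.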

kwjjyn commented 5 years ago

@jfb8856606 Thank you so much. It works for me. Another question: is this function called after DPDK receives packets, and does it put them on the ring buffers associated with the different processes?

jfb8856606 commented 5 years ago

Yes, when the return value != queue_id, it puts the mbuf on another ring buffer that belongs to a different process.
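
The behavior described in this answer can be modeled roughly as below. This is a simplified sketch, not F-Stack's implementation: `toy_ring`, `rx_one`, and `send_to_last_queue` are hypothetical names, the rings are plain arrays rather than DPDK rings, and dropping packets with an out-of-range target is an assumption of this model.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_QUEUES 8
#define RING_SIZE  64

typedef int (*dispatch_fn)(void *data, uint16_t *len,
                           uint16_t queue_id, uint16_t nb_queues);

/* Toy per-process ring: a fixed array standing in for a real ring buffer. */
struct toy_ring {
    void *slots[RING_SIZE];
    unsigned count;
};

/* Example callback: always steer packets to the last queue. */
static int send_to_last_queue(void *data, uint16_t *len,
                              uint16_t queue_id, uint16_t nb_queues)
{
    (void)data; (void)len; (void)queue_id;
    return nb_queues - 1;
}

/* Model of one receive step on queue_id.
 * Returns 1 if the packet is handled locally, 0 if it was handed to
 * another process's ring, -1 if it was dropped. */
static int rx_one(dispatch_fn cb, struct toy_ring rings[], void *pkt,
                  uint16_t len, uint16_t queue_id, uint16_t nb_queues)
{
    assert(nb_queues > 0 && nb_queues <= MAX_QUEUES && queue_id < nb_queues);

    int target = cb ? cb(pkt, &len, queue_id, nb_queues) : (int)queue_id;
    if (target < 0 || target >= nb_queues)
        return -1;                  /* model assumption: invalid target drops */
    if (target == queue_id)
        return 1;                   /* same queue: process in this stack */

    struct toy_ring *r = &rings[target];
    if (r->count == RING_SIZE)
        return -1;                  /* target ring full */
    r->slots[r->count++] = pkt;     /* hand over to the other process */
    return 0;
}
```

In this model the receiving process only pays for a callback and an enqueue; the protocol processing happens in the process that owns the target ring.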

Ye-Tian-Zero commented 5 years ago

@jfb8856606 Hi, I tried your code with a small modification:

int port_dispatcher(void *data, uint16_t *len, uint16_t queue_id, uint16_t nb_queues) {
    struct ipv4_hdr *iph;
    int iph_len;

    iph = (struct ipv4_hdr *)((char*)data + ETHER_HDR_LEN);
    iph_len = (iph->version_ihl & 0x0f) << 2;

    if (iph->next_proto_id != IPPROTO_IP) {
        fprintf(stderr, "not ip \n");
        return queue_id;
    }

    iph = (struct ipv4_hdr *)((char *)iph + iph_len);
    iph_len = (iph->version_ihl & 0x0f) << 2;

    if (iph->next_proto_id == IPPROTO_TCP) {
        struct tcp_hdr *tcph = (struct tcp_hdr *)((char *)iph + iph_len);
        uint64_t port = tcph->src_port;
        fprintf(stderr, "port %lu, nb_queue %u\n", port, nb_queues);
        //hash = get_hash(iph->src_addr, iph->dst_addr, tcph->src_port, tcph->dst_port);
        return (port - 8000) % nb_queues;
    }
    fprintf(stderr, "not tcp \n");
    return queue_id;
}

However, it keeps printing 'not ip'. Can you please help me?

jfb8856606 commented 5 years ago

My demo processes IP-in-IP tunnel packets.

You need to modify it according to your network environment, for example by commenting out this code:

if (iph->next_proto_id != IPPROTO_IP) {
    fprintf(stderr, "not ip \n");
    return queue_id;
}

iph = (struct ipv4_hdr *)((char *)iph + iph_len);
iph_len = (iph->version_ihl & 0x0f) << 2;

Ye-Tian-Zero commented 5 years ago

Yes, I use standard TCP/IP. The client uses the ordinary Linux network stack with a TCP connection. I thought your code could be used directly in my case, but it doesn't work.

Ye-Tian-Zero commented 5 years ago

Oh, I see what you mean. Thanks a lot :D