polcak commented 7 years ago

Hello,

I have found a behaviour that seems to be a bug.

I start the mac_learning application on the controller and create some traffic, the switch learns the MAC addresses and creates state rules, e.g.:

{table="0", key={eth_dst="00:00:00:00:11:02"}, state="1", dur_s="1", dur_ns="085494000", idle_to="0", idle_rb="0", hard_to="10000000", hard_rb="0"},

{table="0", key={eth_dst="00:00:00:00:11:24"}, state="1", dur_s="1", dur_ns="050078000", idle_to="0", idle_rb="0", hard_to="10000000", hard_rb="0"},

{table="0", key={eth_dst="00:00:00:00:05:71"}, state="1", dur_s="2", dur_ns="366394000", idle_to="0", idle_rb="0", hard_to="10000000", hard_rb="0"}

Once the hard timeout fires, the rule shifts to the state 0, e.g.:

{table="0", key={eth_dst="00:00:00:00:73:38"}, state="0", dur_s="0", dur_ns="000000000", idle_to="0", idle_rb="0", hard_to="0", hard_rb="0"},

{table="0", key={eth_dst="00:00:00:00:64:37"}, state="0", dur_s="0", dur_ns="000000000", idle_to="0", idle_rb="0", hard_to="0", hard_rb="0"},

{table="0", key={eth_dst="00:00:00:00:75:67"}, state="0", dur_s="0", dur_ns="000000000", idle_to="0", idle_rb="0", hard_to="0", hard_rb="0"}

The problem is that these rules are never freed by the switch which means that they occupy memory while the switch is running. This might not be a problem in the MAC learning scenario but it is a problem in other use cases. For example in the DDoS mitigation, we create rules for every TCP connection observed. The states are never freed.

Looking at the config.h, I see:

define BEBA_STATE_FLUSH_INTERVAL 60

Are the entries supposed to be deleted after 60 seconds?

ccascone commented 7 years ago

Yes, the state entries should be freed every BEBA_STATE_FLUSH_INTERVAL seconds. Have you found that this is not the case?

polcak commented 7 years ago

They are not freed.

ccascone commented 7 years ago

Hi @polcak, we did some test using the mac_learning.py app on the latest master of the switch and verified that every BEBA_STATE_FLUSH_INTERVAL seconds (60 in our case) the state entries are correctly freed.

One thing must be said, the following state flush condition applies: state entries are freed only when both state and timeouts are 0 (as in ofl-exp-beba.c)

if (entry->state == STATE_DEFAULT && entry->stats->hard_timeout == 0 && entry->stats->idle_timeout == 0)
{
            hmap_remove(&table->state_entries, &entry->hmap_node);
            free(entry->stats);
            free(entry);
}

If timeouts are not 0 it means that the state value could be soon different than 0.

Can you confirm us that even if you waited for more than 60 seconds, and the flush condition is met, there is still a memory leak?

polcak commented 7 years ago

That is strange. I downloaded a fresh clone of the git repository, as root I did:

$ ./boot.sh $ ./configure $ make $ make install

I run the MAC learning application, replied some traffic using tcpreplay and after a while I get entries like:

{table="0", key={eth_dst="00:00:00:00:31:95"}, state="0", dur_s="289", dur_ns="172395000", idle_to="0", idle_rb="0", hard_to="0", hard_rb="0"},

This should mean that the entry is there for 289 (this is correct), both idle and hard time outs are 0. So the entry should be freed 229 seconds ago.

ccascone commented 7 years ago

Yes, that is very strange. You are correct, that entry should have been freed 229 seconds ago.

Given that I can't see any difference between our and your environment (we both run beba-switch master with the same controller application), can you please debug this problem in your environment?

The state table flush is issued here: https://github.com/beba-eu/beba-switch/blob/master/udatapath/datapath.c#L297

polcak commented 7 years ago

So I tried with a single entry. Once it got freed after 45 seconds and once after 40 seconds.

My test created 10000 rules. I will send the PCAP to the mailing list. Please, try to debug the switch I don't have time to do it myself and we agreed in Brno and again in Paris that CNIT will work on the memory issues.

benycze commented 7 years ago

Hi, we (Pavel and Libor) were thinking about the mentioned code: https://github.com/beba-eu/beba-switch/blob/master/udatapath/datapath.c#L297

There is a comparison on equality between two times (current and flush). Shouldn't there be if(now >= dp->next_state_table_flush) ... ?

Anyway, we found out that it is not true that the rules are never freed. They are freed but very rarely and in a small amount. Based on our (imprecise) observations like 8 rules every 30 seconds. So we tweaked BEBA_STATE_FLUSH_INTERVAL to the smaller value (0 in our case). After that, the rules were being flushed more quickly.

So, the current state is very confusing for the user because the state information stays much longer than expected.

polcak commented 7 years ago

12 fixes the issue at line 297, this should be integrated to the code as it improves the memory footprint a lot.

13 removes the rules more aggressively. This helps with the rule removal but I think that you guys at CNIT will probably have a better solution for this issue.

Anyway, further tests are required. I am working on something but it might take a couple of days before I will be able to report back.

polcak commented 7 years ago

Some observations from live trials of the DDoS use case:

12 and #13 reduces the memory requirements dramatically (but not completely),
so far, valgrind testing does not look like there are memory leaks,
there are still flows in the default state which are not deleted immediately.

'dpctl unix:/var/run/sock stats-state "table=0"' takes a lot of time and reports back an error of 'Dec 07 13:33:03|00001|ofl_msg_u|WARN|Received message seemed to be valid, but it contained unused data (16).'

@ccascone, is there a way how to quickly determine the number of entries in the state table?

ccascone commented 7 years ago

Hi @polcak, @benycze,

I merged #12. Thank you. In this case, we followed the same approach of flow rules expiration (which uses the same comparison on equality) because in our understanding the parent function dp_run, should be called more times during the same second. Independently if there are packets to process. Obviously, your patch makes it less ambiguous.

Regarding #13, we need some time to investigate the issue. If I understand well the problem now is that not all the state entries that are expired are freed during the state table flush. Clearly, this is not the intended behavior, a.k.a. a bug. Setting BEBA_STATE_FLUSH_INTERVAL to 0 seems to me a workaround. Also, it is not good at all, as it means that the switch is supposed to do a linear scan of the state table everytime dp_run is called, literally pausing forwarding during that operation! We'll try to solve the bug instead.

@polcak: not that I know. The only way I know is dpctl stats-state. We'll try to fix this problem too.

polcak commented 7 years ago

@ccascone: I also see #13 as a workaround which helps but according to my observations is not completely effective. Let's wait for the results of your investigations into the issue.

As I see it now, the problem in live trials in the DDoS use case is that the ofdatapath is creating more rules than it is able to flush (even though they are expired). Which means that over time the memory consumption rise. As a result, the switch consumes so much memory that it starts to swap. Then the processing gets really slow until the process is killed because the machine has no memory left.

DavideSanvito commented 7 years ago

@polcak: We provided a fix in the latest master. I tried your pcap file with 10000 state entries and also with 100K state entries (with the script you provided in D6.3) and now expired entries are periodically freed as expected (dpctl stats-state now shows just the default state entry and no expired entry with a duration > BEBA_STATE_FLUSH_INTERVAL are present).

polcak commented 7 years ago

Thank you, it looks very good now.

beba-eu / beba-switch

Switch does not remove state entries in the state 0 #11

define BEBA_STATE_FLUSH_INTERVAL 60

12 fixes the issue at line 297, this should be integrated to the code as it improves the memory footprint a lot.

13 removes the rules more aggressively. This helps with the rule removal but I think that you guys at CNIT will probably have a better solution for this issue.

12 and #13 reduces the memory requirements dramatically (but not completely),