firnsy / barnyard2

Barnyard2 is a dedicated spooler for Snort's unified2 binary output format.
GNU General Public License v2.0
344 stars 190 forks source link

This patch speeds up startup and adds the --unique-map option #181

Open notnyt opened 8 years ago

notnyt commented 8 years ago

Barnyard2 takes a long time to start due to poorly optimized list code. For every SID added to the list, the entire list is iterated to get to the tail of the list. Adding and maintaining a tail pointer prevents this, since we can just add entries directly at the end.

The --unique-map option is also added, this prevents each SID from the files being checked against what's in memory. Most tools generate unique files, making this check unnecessary and wasteful.

binf commented 8 years ago

The code should not assume certain tools usage if input is corrupted output will also be.

On Mon, Feb 8, 2016 at 9:01 AM, Rob Mosher notifications@github.com wrote:

Barnyard2 takes a long time to start due to poorly optimized list code. For every SID added to the list, the entire list is iterated to get to the tail of the list. Adding and maintaining a tail pointer prevents this, since we can just add entries directly at the end.

The --unique-map option is also added, this prevents each SID from the files being checked against what's in memory. Most tools generate unique

files, making this check unnecessary and wasteful.

You can view, comment on, or merge this pull request online at:

https://github.com/firnsy/barnyard2/pull/181 Commit Summary

  • This patch speeds up startup and adds the --unique-map option

File Changes

Patch Links:

— Reply to this email directly or view it on GitHub https://github.com/firnsy/barnyard2/pull/181.

notnyt commented 8 years ago

Well, it's added as an option if you have known good input. The linked list optimization is unrelated, you can merge whichever parts you want.

binf commented 8 years ago

I agree that using tail for that can speed up startup but its not the in memory linked list parsed from sid-msg.map or gen-msg.map file that are slow. Its the sanity check of what is in the database.

On Tue, Mar 22, 2016 at 8:57 PM, Rob Mosher notifications@github.com wrote:

Well, it's added as an option if you have known good input. The linked list optimization is unrelated, you can merge whichever parts you want.

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/firnsy/barnyard2/pull/181#issuecomment-200098587

notnyt commented 8 years ago

It still shaved a few minutes of time off during startup using a long list.

binf commented 8 years ago

define long_list

On Tue, Mar 22, 2016 at 9:13 PM, Rob Mosher notifications@github.com wrote:

It still shaved a few minutes of time off during startup using a long list.

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/firnsy/barnyard2/pull/181#issuecomment-200103771

notnyt commented 8 years ago

10-20k

binf commented 8 years ago

And on what type of hardware?

Minutes you see are from here: if( BcUniqueMap() || (cacheSignatureLookup(&lookupNode,iMasterCache->cacheSignatureHead) == 0) ) Comment that line in your patch and you shouldn't see minutes anywhere.

On Tue, Mar 22, 2016 at 9:19 PM, Rob Mosher notifications@github.com wrote:

10-20k

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/firnsy/barnyard2/pull/181#issuecomment-200108363

notnyt commented 8 years ago

hardware is AMD GX-412TC SOC, quad core 1ghz

With the --unique-map option, BcUniqueMap will return 1, and prevent cacheSignatureLookup from running on the line you referenced.

It still takes quite some time to load the rules, even with that in place.

binf commented 8 years ago

Sorry i replied quickly with something that should be read as follow:

Initialization of 10-20k if it takes minutes to read the sid-msg.map is probably more hardware setup dependant.

Where is agree is tail adding should be good enough, on the other hand using head rather than using tail would have the same effect. Where i do not agree is on integrity bypass. Will queue that in a form or an other in the pipe.

On Tue, Mar 22, 2016 at 9:22 PM, beenph beenph@gmail.com wrote:

And on what type of hardware?

Minutes you see are from here: if( BcUniqueMap() || (cacheSignatureLookup(&lookupNode,iMasterCache->cacheSignatureHead) == 0) ) Comment that line in your patch and you shouldn't see minutes anywhere.

On Tue, Mar 22, 2016 at 9:19 PM, Rob Mosher notifications@github.com wrote:

10-20k

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/firnsy/barnyard2/pull/181#issuecomment-200108363

binf commented 8 years ago

If you use a database, do a select count(*) from signature; im curious.

On Tue, Mar 22, 2016 at 9:35 PM, beenph beenph@gmail.com wrote:

Sorry i replied quickly with something that should be read as follow:

Initialization of 10-20k if it takes minutes to read the sid-msg.map is probably more hardware setup dependant.

Where is agree is tail adding should be good enough, on the other hand using head rather than using tail would have the same effect. Where i do not agree is on integrity bypass. Will queue that in a form or an other in the pipe.

On Tue, Mar 22, 2016 at 9:22 PM, beenph beenph@gmail.com wrote:

And on what type of hardware?

Minutes you see are from here: if( BcUniqueMap() || (cacheSignatureLookup(&lookupNode,iMasterCache->cacheSignatureHead) == 0) ) Comment that line in your patch and you shouldn't see minutes anywhere.

On Tue, Mar 22, 2016 at 9:19 PM, Rob Mosher notifications@github.com wrote:

10-20k

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/firnsy/barnyard2/pull/181#issuecomment-200108363

notnyt commented 8 years ago

50k in there at the moment. it's not really bypassing integrity, worst case is the same rule can be added multiple times. If other tools are known to produce unique output, it helps somewhat.