fox-it / flow.record

Recordization library
GNU Affero General Public License v3.0

Cache the creation of RecordDescriptors in `extend_record()` #48

Closed: yunzheng closed this issue 1 year ago

yunzheng commented 1 year ago

The method `extend_record()` generates a new RecordDescriptor on every call, which is expensive. The creation of these RecordDescriptors should be cached.

This will directly improve the speed where extend_record() is used, such as the --multi-timestamp option in rdump.
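To illustrate the idea, here is a minimal sketch of caching descriptor creation with `functools.lru_cache`. It does not use flow.record itself and is not necessarily how the fix was implemented: the `Descriptor` class and the `make_descriptor()` and `extend_descriptor()` helpers are hypothetical stand-ins for an expensive-to-build descriptor, and the record and field names are only examples.

```python
from functools import lru_cache


class Descriptor:
    """Hypothetical stand-in for a descriptor that is expensive to build."""

    build_count = 0  # counts how many descriptors were actually constructed

    def __init__(self, name, fields):
        Descriptor.build_count += 1
        self.name = name
        self.fields = fields  # tuple of (type_name, field_name) pairs


@lru_cache(maxsize=4096)
def make_descriptor(name, fields):
    # Arguments must be hashable, so fields is passed as a tuple of tuples.
    return Descriptor(name, fields)


def extend_descriptor(base, extra_fields):
    # Same base name and same extra fields give the same cache key,
    # so repeated calls reuse the descriptor instead of rebuilding it.
    return make_descriptor(base.name, base.fields + tuple(extra_fields))


base = make_descriptor("filesystem/entry", (("string", "path"),))
extra = [("datetime", "ts"), ("string", "ts_description")]

first = extend_descriptor(base, extra)
second = extend_descriptor(base, extra)  # cache hit, nothing is rebuilt
```

In this sketch the second call returns the very same object as the first, so the construction cost is paid once per distinct combination of base descriptor and extra fields rather than once per record.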

Zawadidone commented 1 year ago

@yunzheng wow this makes rdump --multi-timestamp very fast!

Before this issue was fixed it took roughly 17 minutes (https://github.com/fox-it/flow.record/issues/46#issuecomment-1407072165); now it takes about 3 minutes.

time find export/plugins -type f -print0 | xargs -r0I {} -P 14 sh -c 'rdump {} --multi-timestamp -w jsonfile://export/$(basename {} .jsonl).jsonl?descriptors=True'

real    2m55.554s
[...]
yunzheng commented 1 year ago

@Zawadidone wow, nice gainz! :)

Thanks for benchmarking!