Clever / optimus

Concurrently extract, transform, and load tables of data in Go
Apache License 2.0
34 stars 7 forks source link

create a more memory efficient sort #60

Closed natebrennand closed 7 years ago

natebrennand commented 7 years ago

ticket: https://clever.atlassian.net/browse/DIS2-1038

Adds a StableCompressedSort for more memory-efficient sorting. This is done by finding the "sort key" for each row as they're ingested by the sorter. While all rows are gathered in memory, they're converted to JSON blobs since []byte takes up far less space than their map[string]interface{} form. The sorting is then done based on the "sort key" and the blobs are conveted back to Optimus rows on their way out.

azylman commented 7 years ago

@natebrennand Is there a reason for this to be a new method, or should it just be a new, more efficient implementation of the old method?

natebrennand commented 7 years ago

It is a bit slower if you aren't paging on memory, so I wasn't entirely sure which to do.

Also, w/o taking a sort function that takes in interface{}, it wasn't obvious how to do the sorting. This implementation forces your sort key to be a string type.

natebrennand commented 7 years ago

I guess we could add a trimmed down optimus.Row w/ only 1 key:value pair to call the sort function on, but I'd rather keep this more efficient w/o it.

azylman commented 7 years ago

@natebrennand That makes sense (keeping it separate so callers can choose if they want memory-optimized or speed-optimized), thanks for the explanation