Thread-per-core combines three big ideas: (1) concurrency should be handled in userspace instead of with expensive kernel threads, (2) I/O should be asynchronous so that it never blocks the per-core threads, and (3) data should be partitioned between CPU cores to eliminate synchronization costs and data movement between CPU caches. It’s hard to build high-throughput systems without (1) and (2), but (3) is probably only needed on really large multicore machines.
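
To make idea (3) concrete, here is a minimal sketch of partitioned ownership. It is not how any particular runtime does it. Every name in it (`owner_of`, `Shard`, `ShardedStore`, `NUM_SHARDS`) is made up for illustration, and it assumes a stable hash (CRC32) is a good enough way to assign keys to shards. Each shard thread is the only code that ever touches its own dictionary, so the data needs no lock; the only synchronized pieces are the queues used to pass messages. A real thread-per-core system would also pin each thread to a core and use asynchronous replies instead of the blocking wait used here, and Python's GIL means this shows the routing pattern rather than real parallelism.

```python
import queue
import threading
import zlib

NUM_SHARDS = 4  # assumption: one shard per CPU core


def owner_of(key, num_shards=NUM_SHARDS):
    """Map a key to the shard that owns it, using a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


class Shard:
    """One worker thread that exclusively owns its slice of the data."""

    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.data = {}              # written only by this shard's thread, so no lock
        self.inbox = queue.Queue()  # requests from other threads arrive here
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            msg = self.inbox.get()
            if msg is None:         # shutdown signal
                break
            op, key, value, reply = msg
            if op == "put":
                self.data[key] = value
                reply.put(None)
            else:                   # "get"
                reply.put(self.data.get(key))

    def stop(self):
        self.inbox.put(None)
        self.thread.join()


class ShardedStore:
    """Routes every request to the shard that owns the key."""

    def __init__(self, num_shards=NUM_SHARDS):
        self.shards = [Shard(i) for i in range(num_shards)]

    def _send(self, op, key, value=None):
        reply = queue.Queue(maxsize=1)
        shard = self.shards[owner_of(key, len(self.shards))]
        shard.inbox.put((op, key, value, reply))
        return reply.get()          # blocking wait, kept only for simplicity

    def put(self, key, value):
        self._send("put", key, value)

    def get(self, key):
        return self._send("get", key)

    def close(self):
        for shard in self.shards:
            shard.stop()
```

The design choice worth noticing is that callers never reach into a shard's dictionary directly: they send a message to the owner and wait for the answer. That is what removes the need for locks around the data itself, at the price of a hop whenever a request lands on a thread that does not own the key.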