Adding a tip on how to set the number of threadpool tokens might be useful.
If the task is CPU-bound, using as many tokens as there are CPU cores is probably the best way to go, since it avoids unnecessary context switching. If the task is IO-bound, more tokens help, because the threads spend most of their time waiting on IO rather than using the CPU. I don't know the sweet spot for optimal performance, but what I usually do is 50-100 tokens for 1-2 cores, 100-200 for 4-8 cores, and 200-300 for 8+ cores. It would be useful to actually test this.
(I believe 40 is the default, but you know this better than I do.)
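A rough sketch of the rule of thumb above, plus a tiny harness for actually testing it. It uses the standard library's `ThreadPoolExecutor` as a stand-in threadpool, with `max_workers` playing the role of the token count. `suggest_tokens` and `time_io_workload` are hypothetical helpers, and the bucket boundaries are an assumption: 3 cores is treated like the 4-8 bucket, and exactly 8 cores stays in that bucket, since the ranges above don't say.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple


def suggest_tokens(cores: int, io_bound: bool) -> Tuple[int, int]:
    """Return a (low, high) token range following the rule of thumb above.

    Hypothetical helper. Assumption: 3 cores maps to the 4-8 bucket and
    exactly 8 cores stays in that bucket.
    """
    if cores < 1:
        raise ValueError("cores must be >= 1")
    if not io_bound:
        # CPU-bound: one token per core to avoid extra context switching.
        return (cores, cores)
    if cores <= 2:
        return (50, 100)
    if cores <= 8:
        return (100, 200)
    return (200, 300)


def time_io_workload(workers: int, tasks: int = 200, latency: float = 0.01) -> float:
    """Wall-clock seconds to run `tasks` simulated blocking calls on `workers` threads."""

    def fake_io(_: int) -> None:
        time.sleep(latency)  # stand-in for a blocking network or disk call

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_io, range(tasks)))  # consume results so all tasks finish
    return time.perf_counter() - start


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    low, high = suggest_tokens(cores, io_bound=True)
    # Compare the default of 40 against both ends of the suggested range.
    for workers in sorted({40, low, high}):
        print(f"{workers:>4} workers: {time_io_workload(workers):.3f}s")
```

A `time.sleep` workload only models pure waiting, so it will always favor more threads; swapping `fake_io` for a real request is what would reveal the actual sweet spot. If the threadpool in question is AnyIO's (its default thread limiter also has 40 tokens, which is an assumption about the context here), the chosen value could be applied with `anyio.to_thread.current_default_thread_limiter().total_tokens = high`.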