This is probably one of the last things we'd want to implement, but when cache lines are shared between cores, reading is fundamentally cheaper than writing, since a read doesn't require invalidating the copies held in other cores/processors.
This would only be worthwhile if we can show a measurable difference between the cost of a read miss (that is, a cache line pull) and a write miss (which also has to invalidate the other caches).
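A minimal sketch of how that difference could be measured, assuming C++ and a 64-byte cache line; the names (`Line`, `owner_loop`, `measure`, the iteration count) are made up for illustration, not part of any existing code. An "owner" thread keeps dirtying a shared line, while the measured thread either only reads it (a cache line pull) or writes it (which must first invalidate the owner's copy).

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

struct alignas(64) Line {            // assume one 64-byte cache line
    std::atomic<long> value{0};
};

static Line shared_line;
static std::atomic<bool> stop{false};

// The "owner" keeps writing, pulling the line back into its own cache
// in Modified state and invalidating everyone else's copy.
static void owner_loop() {
    long n = 0;
    while (!stop.load(std::memory_order_relaxed))
        shared_line.value.store(++n, std::memory_order_relaxed);
}

template <bool kWrite>
static double measure(long iters) {
    long sink = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < iters; ++i) {
        if constexpr (kWrite)
            // write miss: must invalidate the owner's copy first
            shared_line.value.store(i, std::memory_order_relaxed);
        else
            // read miss: just pull the line into this core's cache
            sink += shared_line.value.load(std::memory_order_relaxed);
    }
    auto t1 = std::chrono::steady_clock::now();
    if (sink == 42) std::puts("");   // keep the loads from being optimized away
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / iters;
}

int main() {
    const long iters = 10'000'000;   // arbitrary count, long enough to average out noise
    std::thread owner(owner_loop);

    double read_ns  = measure<false>(iters);
    double write_ns = measure<true>(iters);

    stop.store(true);
    owner.join();

    std::printf("avg read  : %.1f ns/op\n", read_ns);
    std::printf("avg write : %.1f ns/op\n", write_ns);
}
```

If the gap between the two averages turns out to be negligible on the hardware we care about, this optimization is probably not worth the added complexity.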