Use `slice_dim` instead of `crop_dim` if possible

In large, real-world pipelines, there are many crops of pointwise dimensions, which correspond to "batch dimensions". For example:

```
loop(d4, serial, [0, buffer_max(v2, 4)], 1) {
 crop_dim(v2, 4, [d4, d4]) {
  crop_dim(v1, 4, [d4, d4]) {
   loop(d3, serial, [0, buffer_max(v2, 3)], 1) {
    crop_dim(v2, 3, [d3, d3]) {
     crop_dim(v1, 3, [d3, d3]) {
      loop(d2, serial, [0, buffer_max(v2, 2)], 1) {
       crop_dim(v2, 2, [d2, d2]) {
        crop_dim(v1, 2, [d2, d2]) {
         // Other non-pointwise stuff here
```

This leaves the pointwise dimensions in place, each cropped to a single element, so the callbacks receive the full rank of the inputs and then have to do extra work to skip over the batch dimensions.
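To make the difference concrete, here is a minimal sketch of crop versus slice on a toy buffer. The types and functions (`toy_dim`, `toy_buffer`, `toy_crop`, `toy_slice`) are hypothetical and only for illustration; they are not slinky's actual API. Both operations end up pointing at the same data, but the crop keeps a dimension of extent 1 while the slice drops it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for illustration only; these are NOT slinky's real types.
struct toy_dim {
  std::ptrdiff_t min;
  std::ptrdiff_t extent;
  std::ptrdiff_t stride;  // in elements
};

struct toy_buffer {
  const float* base;  // address of the element at (min_0, min_1, ...)
  std::vector<toy_dim> dims;
};

// Crop dimension d to the single index i. The dimension stays in the
// buffer with extent 1, so the rank is unchanged.
toy_buffer toy_crop(toy_buffer b, std::size_t d, std::ptrdiff_t i) {
  assert(d < b.dims.size());
  b.base += (i - b.dims[d].min) * b.dims[d].stride;
  b.dims[d].min = i;
  b.dims[d].extent = 1;
  return b;
}

// Slice dimension d at index i. The dimension is removed entirely, so the
// rank drops by one and the consumer never sees it.
toy_buffer toy_slice(toy_buffer b, std::size_t d, std::ptrdiff_t i) {
  assert(d < b.dims.size());
  b.base += (i - b.dims[d].min) * b.dims[d].stride;
  b.dims.erase(b.dims.begin() + static_cast<std::ptrdiff_t>(d));
  return b;
}
```

With three nested batch loops as in the example above, a callback given cropped buffers sees three extra extent-1 dimensions per buffer, while a callback given sliced buffers would see none.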

If we could generate slices instead of crops, we would avoid this inefficiency. To do this, we'd need some indication that the callback is going to treat a given dimension as a batch dimension, and slicing would only be valid if every use of that dimension is treated as a batch dimension by the callback.
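The "all uses" condition could be sketched roughly as follows. This assumes a hypothetical `dim_use` record that says, for each place a loop variable indexes a buffer, whether the consuming callback treats that dimension as a batch dimension. Nothing like this exists in slinky today as far as this issue states; it is only meant to show the shape of the check.

```cpp
#include <vector>

// Hypothetical description of one use of a loop dimension by a callback.
struct dim_use {
  int buffer_id;                  // which buffer is indexed (e.g. v1, v2)
  int dim;                        // which dimension of that buffer
  bool callback_treats_as_batch;  // the "indication" the callback would provide
};

// A crop can become a slice only if every use of the dimension is a batch
// use. A single non-batch use means the callback still needs the dimension.
bool can_slice_instead_of_crop(const std::vector<dim_use>& uses) {
  if (uses.empty()) return false;  // nothing to rewrite
  for (const dim_use& u : uses) {
    if (!u.callback_treats_as_batch) return false;
  }
  return true;
}
```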

In the meantime, I think we can at least optimize our buffer traversal helpers for the common case of many trailing dimensions that have been cropped to a single element.
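One possible shape for that optimization, assuming the trailing dimensions in question are the ones the crops above have reduced to extent 1: a traversal helper can ignore them up front instead of looping over each one. The `toy_dim` type and `effective_rank` function below are hypothetical names for illustration, not existing slinky helpers.

```cpp
#include <cstddef>
#include <vector>

// Toy dimension type for illustration only; not slinky's real type.
struct toy_dim {
  std::ptrdiff_t min;
  std::ptrdiff_t extent;
  std::ptrdiff_t stride;
};

// Returns the number of leading dimensions that still need to be traversed
// after dropping every trailing dimension whose extent is exactly 1.
std::size_t effective_rank(const std::vector<toy_dim>& dims) {
  std::size_t rank = dims.size();
  while (rank > 0 && dims[rank - 1].extent == 1) {
    --rank;
  }
  return rank;
}
```

A traversal helper could call something like this once and then run its loops over only the first `effective_rank(dims)` dimensions. The base pointer is assumed to already account for the cropped indices, as it would after a crop.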

dsharlet / slinky

Use `slice_dim` instead of `crop_dim` if possible #275