arogozhnikov / einops

Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)
https://einops.rocks
MIT License

How to ignore NaN in reduce? #166

Open randolf-scholz opened 2 years ago

randolf-scholz commented 2 years ago

NumPy and many other libraries have introduced additional aggregation functions that ignore NaN values, for instance:

  1. Use-cases. This would mostly be a convenience. It avoids aggregating over NaN values when working with data that has missing values, or when padding. (Padding with NaNs instead of 0s has the advantage that any computation that accidentally uses the padding values produces NaN again, which makes such bugs easier to notice.)
  2. Implementation. Either avoid iterating over NaN values altogether, or choose a masking value appropriate for the chosen reduction, e.g.
    • nansum → replace NaN with 0
    • nanprod → replace NaN with 1
    • nanmax → replace NaN with -Inf
  3. Integrity - does it interplay well with existing operations and notation in einops? It is a simple additional boolean flag ignore_nan for reduce.
  4. Readability. Alternatively, one could have a nanreduce that does the same thing but is visually more striking.
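
The masking idea in item 2 can be sketched in plain NumPy. This is only an illustration: `masked_reduce` and the `NEUTRAL` table are hypothetical names, not part of einops or NumPy, and the sketch assumes a floating-point input.

```python
import numpy as np

# Hypothetical lookup table: the identity element of each reduction,
# used as the fill value for NaN entries.
NEUTRAL = {"sum": 0.0, "prod": 1.0, "max": -np.inf, "min": np.inf}

def masked_reduce(x, reduction, axis):
    """Replace NaN with the identity element of `reduction`, then reduce."""
    filled = np.where(np.isnan(x), NEUTRAL[reduction], x)
    return getattr(np, reduction)(filled, axis=axis)

x = np.array([[1.0, np.nan, 3.0],
              [np.nan, 2.0, 4.0]])

print(masked_reduce(x, "sum", axis=1))   # [4. 6.]
print(masked_reduce(x, "prod", axis=1))  # [3. 8.]
print(masked_reduce(x, "max", axis=1))   # [3. 4.]
```

One caveat of this approach: a slice that is entirely NaN reduces to the fill value itself (0, 1, or -Inf) rather than to NaN, so for max and min it can differ from what np.nanmax and np.nanmin return.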

Similarly, one could consider an additional ignore_infinite flag.

arogozhnikov commented 2 years ago

It is supported by providing callables for reductions in einops.reduce. Some examples:

# numpy
einops.reduce(array, 'i j k -> (i j)', np.nanmean)
# torch 
einops.reduce(array, 'i j k -> (i j)', torch.nanmean)