feiwang3311 / Lantern

BSD 3-Clause "New" or "Revised" License

Finish GPU elementwise broadcasting op implementation. #30

Closed. dan-zheng closed this 5 years ago.

dan-zheng commented 5 years ago

Elementwise tensor-tensor and tensor-scalar ops all work. TODO: implement broadcasting between tensors of different ranks.
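
For reference, broadcasting between tensors of different ranks can be lowered to ordinary elementwise indexing by treating each broadcast dimension as if it had stride zero, so the lower-rank operand repeats along it. A minimal CUDA sketch of the rank-2 plus rank-1 case (assumed names, not Lantern's generated kernels):

```cuda
// Sketch only: add a rank-1 bias to a rank-2 matrix. Indexing bias with just
// the column coordinate is equivalent to giving it stride 0 along rows.
__global__ void addBroadcastRank1(const float* x,     // rows x cols
                                  const float* bias,  // cols
                                  float* out, int rows, int cols) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < rows * cols) {
    int col = i % cols;         // coordinate in the shared trailing dimension
    out[i] = x[i] + bias[col];  // bias value repeats for every row
  }
}
```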

dan-zheng commented 5 years ago

Things may get tricky if it's necessary to generate non-__host__ __device__ lambdas.

I wonder if it's somehow possible to annotate Lambda instances as "GPU" lambdas.

One non-ideal solution is to create "GPU" versions of Lambda and fun. However, that would involve an unacceptable amount of code duplication.
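
For context on why this matters: a lambda invoked from device code must carry a __device__ (or __host__ __device__) annotation, and nvcc accepts such lambdas only with the --extended-lambda flag; a plain host lambda compiles on the host side but cannot be called inside a __global__ kernel. A hedged illustration with hypothetical names:

```cuda
// Illustration only: a generic elementwise kernel taking a functor.
// The functor must be device-callable; compile with: nvcc --extended-lambda
template <typename F>
__global__ void mapKernel(float* data, int n, F f) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] = f(data[i]);
}

void scaleOnGpu(float* d_data, int n) {
  // Annotated lambda: callable from both host and device code.
  auto f = [] __host__ __device__ (float x) { return x * 2.0f; };
  mapKernel<<<(n + 255) / 256, 256>>>(d_data, n, f);
}
```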

dan-zheng commented 5 years ago

I discussed this with @feiwang3311 and we agreed that changing Lambda codegen is too dangerous. I'll reimplement things (in a more ad hoc way) to avoid changing Lambda codegen.

dan-zheng commented 5 years ago

Reimplemented ops without changing Lambda codegen in https://github.com/feiwang3311/Lantern/pull/30/commits/71876be7cb5206177b522dbb6e03a80d761878a7.

There's hacky logic for propagating Rep[Float] arguments to an unchecked call: namely, the op argument to launchUnaryKernel has type String => Seq[Any].
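
To illustrate the constraint (hypothetical example, not the actual generated code): if the elementwise expression mentions a runtime Rep[Float] such as a scalar multiplier, that value cannot be inlined as a literal into the kernel string; it has to travel through the unchecked call as a kernel argument, which is why the op builder produces a Seq[Any] mixing code fragments and staged values. The generated CUDA might resemble:

```cuda
// Hypothetical generated code: the scalar c is a runtime Rep[Float] on the
// Scala side, so it arrives as a kernel parameter and the op string "x * c"
// refers to it by name instead of embedding a constant.
__global__ void unaryKernel(const float* in, float* out, int n, float c) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[i] * c;
}
```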

Otherwise, elementwise ops work as intended. Ready for review.