Description
1. Motivation and Context
We found that ChunkedCudaProjector does not work as expected for large models. For dict input, it is hard to project without dividing the gradients, so we now always convert dict input to a single tensor and, if that tensor is too large, split it into chunks for projection.
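To make the idea concrete, here is a minimal, dependency-free sketch of "flatten the dict, then split for projection". It is not the code in this PR: plain Python lists stand in for tensors, and the helper names (`flatten_grads`, `chunk_bounds`, `make_matrix`, `chunked_project`), the random ±1 projection matrix, and the choice to sum the per-chunk projections are all assumptions made for illustration.

```python
import random


def flatten_grads(grads):
    """Concatenate a dict of per-parameter gradients into one flat list.

    Hypothetical stand-in for converting dict input to a single tensor.
    """
    flat = []
    for name in grads:  # dicts preserve insertion order
        flat.extend(grads[name])
    return flat


def chunk_bounds(total, max_chunk_size):
    """Return (start, end) pairs covering range(total) in pieces of at most max_chunk_size."""
    return [(s, min(s + max_chunk_size, total)) for s in range(0, total, max_chunk_size)]


def make_matrix(proj_dim, n_cols, seed):
    """Build a proj_dim x n_cols matrix of random +1/-1 entries (assumed projection type)."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(n_cols)] for _ in range(proj_dim)]


def project(matrix, vec):
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def chunked_project(flat, proj_dim, max_chunk_size, seed=0):
    """Project each chunk with its own matrix block and sum the partial results.

    Summing is equivalent to projecting the whole vector with the blocks
    concatenated column-wise, but only one block is materialized at a time.
    """
    out = [0.0] * proj_dim
    for i, (start, end) in enumerate(chunk_bounds(len(flat), max_chunk_size)):
        block = make_matrix(proj_dim, end - start, seed + i)
        part = project(block, flat[start:end])
        out = [o + p for o, p in zip(out, part)]
    return out


# Usage: dict input is flattened first, then projected chunk by chunk.
grads = {"w": [1.0, 2.0, 3.0, 4.0, 5.0], "b": [6.0, 7.0]}
flat = flatten_grads(grads)
projected = chunked_project(flat, proj_dim=4, max_chunk_size=3)
print(len(projected))
```

Working on one flat vector means chunk boundaries can fall anywhere, including in the middle of a parameter, which is what makes splitting a dict directly awkward.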
2. Summary of the change
- Fix the ChunkedCudaProjector bug by forcing tensor input.
- Add a ChunkedCudaProjector test case for tensor input.
Note that I preserved most of the original code, so we can come back later and make it support dict input directly (without forcing it to be a tensor).
3. What tests have been added/updated for the change?
[x] Unit test: Typically, this should be included if you implemented a new function/fixed a bug.