[dattri.func] Fix `ChunkedCudaProjector` bug

Description

1. Motivation and Context

We found that `ChunkedCudaProjector` does not work as expected for large models. For dict input, it is hard to project without dividing the gradients, so we now force dict input to be converted to a tensor and split that tensor for projection if it is too large.
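The flatten-then-split idea can be sketched in plain Python. This is an illustration only, not the code in this PR: `flatten_grads`, `rademacher_column`, and `project_chunked` are hypothetical names, gradients are plain lists of floats rather than CUDA tensors, the projection matrix is assumed to be random ±1, and the usual scaling factor is omitted.

```python
import random


def flatten_grads(grad_dict):
    """Concatenate per-parameter gradients (lists of floats) into one flat list.

    Keys are visited in insertion order, so the layout is deterministic.
    """
    flat = []
    for name in grad_dict:
        flat.extend(grad_dict[name])
    return flat


def rademacher_column(seed, col, proj_dim):
    """Column `col` of a +/-1 projection matrix, derived only from (seed, col)."""
    rng = random.Random(seed * 1_000_003 + col)
    return [rng.choice((-1.0, 1.0)) for _ in range(proj_dim)]


def project_chunked(flat, proj_dim, max_chunk_size, seed=0):
    """Project `flat` to `proj_dim` dimensions, one chunk of features at a time.

    Each chunk contributes a partial projection; the partial results are
    accumulated into the same output vector.
    """
    out = [0.0] * proj_dim
    for start in range(0, len(flat), max_chunk_size):
        chunk = flat[start:start + max_chunk_size]
        for offset, value in enumerate(chunk):
            column = rademacher_column(seed, start + offset, proj_dim)
            for i in range(proj_dim):
                out[i] += column[i] * value
    return out


# Hypothetical dict input, keyed by parameter name.
grads = {
    "layer1.weight": [0.5, -1.0, 2.0],
    "layer1.bias": [0.25],
    "layer2.weight": [1.5, -0.5],
}

flat = flatten_grads(grads)
projected_small_chunks = project_chunked(flat, proj_dim=4, max_chunk_size=2)
projected_one_chunk = project_chunked(flat, proj_dim=4, max_chunk_size=len(flat))
```

Because each matrix column in this sketch depends only on the seed and the global feature index, the result does not depend on where the chunk boundaries fall, which is the property a chunked projector needs to preserve.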

2. Summary of the change

- Fix the `ChunkedCudaProjector` bug by forcing tensor input.
- Add a `ChunkedCudaProjector` test case for tensor input.

Note that I preserved most of the original code, so we can possibly come back later and make it support dict input directly (without forcing it to be a tensor).

3. What tests have been added/updated for the change?

[x] Unit test: Typically, this should be included if you implemented a new function/fixed a bug.

TRAIS-Lab / dattri

[dattri.func] Fix `ChunkedCudaProjector` bug #113
