connor-qingxia / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Debugging Failed Test Case for SlidingWindowPipeline after Rebase #2

Open wigwit opened 1 year ago

wigwit commented 1 year ago

After our branch was rebased, the test case failed because the updated TokenClassificationPipeline added stride as an optional argument, which conflicts with how we define stride.

Currently we define the stride here, where the following relation holds if stride is None. The pipeline then processes the task using a sliding window built from the defined stride and window_length (https://github.com/connor-qingxia/transformers/blob/a8fc5aa6ab7b941d961d512ae3e9bdf3b5f99e8c/src/transformers/pipelines/token_classification.py#L578). This conflicts with how the stride argument is processed here in the updated pipeline: there, if stride is None, the pipeline simply processes the task as a usual token classification task.
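To make the conflict concrete, here is a minimal, self-contained sketch. It does not use the real pipeline code: `sliding_windows`, `fork_behaviour`, and `upstream_behaviour` are hypothetical names, stride is treated as the step between window starts (an assumption), and the default used when stride is None in `fork_behaviour` is only a placeholder, not the actual relation from our branch.

```python
from typing import List, Optional, Sequence


def sliding_windows(tokens: Sequence[str], window_length: int, stride: int) -> List[List[str]]:
    """Split tokens into windows of at most window_length, advancing by stride tokens."""
    if window_length <= 0 or stride <= 0:
        raise ValueError("window_length and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + window_length]))
        if start + window_length >= len(tokens):
            break  # the last window reached the end of the input
        start += stride
    return windows


def fork_behaviour(tokens: Sequence[str], window_length: int, stride: Optional[int] = None) -> List[List[str]]:
    """Hypothetical model of our branch: stride=None still means 'use a sliding window'."""
    if stride is None:
        # ASSUMPTION: placeholder default, not the real relation used in the branch.
        stride = window_length
    return sliding_windows(tokens, window_length, stride)


def upstream_behaviour(tokens: Sequence[str], window_length: int, stride: Optional[int] = None) -> List[List[str]]:
    """Hypothetical model of the updated pipeline: stride=None means 'no chunking at all'."""
    if stride is None:
        return [list(tokens)]  # plain token classification over the whole input
    return sliding_windows(tokens, window_length, stride)


tokens = list("abcdefg")
print(len(fork_behaviour(tokens, window_length=4)))      # several windows
print(len(upstream_behaviour(tokens, window_length=4)))  # a single pass
```

With the same call and stride left as None, the two interpretations return a different number of chunks, which is the kind of mismatch the rebased test runs into.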

wigwit commented 1 year ago

This inconsistency produces the following test case error message:

FAILED tests/pipelines/test_pipelines_token_classification.py::TokenClassificationPipelineTests::test_sliding_window - TypeError: 'BatchEncoding' object is not an iterator

TokenClassificationPipeline has been modified so that the preprocess function can handle stride. If stride is passed in, the tokens are aggregated again in the aggregation_overlapping_entities function here.
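One possible reading of the `'BatchEncoding' object is not an iterator` message, not confirmed in this thread, is that the caller now invokes `next()` on whatever preprocess produces, so a preprocess that returns a single encoding instead of yielding chunks would fail. The sketch below reproduces that failure mode in plain Python. `FakeBatchEncoding`, `preprocess_returning`, and `preprocess_yielding` are hypothetical stand-ins, not the library's classes or methods.

```python
class FakeBatchEncoding(dict):
    """Hypothetical stand-in for a dict-like encoding object; not the real class."""


def preprocess_returning(sentence: str) -> FakeBatchEncoding:
    # Returns one encoding object directly (not an iterator).
    return FakeBatchEncoding(input_ids=[ord(c) for c in sentence])


def preprocess_yielding(sentence: str):
    # Generator version: yields one encoding per chunk (a single chunk here).
    yield FakeBatchEncoding(input_ids=[ord(c) for c in sentence])


# A caller that expects an iterator fails on the returning version...
try:
    next(preprocess_returning("hi"))
    message = None
except TypeError as err:
    message = str(err)
print(message)

# ...but works on the generator version.
first_chunk = next(preprocess_yielding("hi"))
print(first_chunk["input_ids"])
```

If this is what is happening, the error points at the shape of what preprocess returns rather than at the stride value itself.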

I have tried a few things to see if this problem can be overcome. Here is what I have done:

Here is the resulting test error in the `postprocess` function. It seems that `TokenClassificationPipeline` also changes the returned model output when `stride` is passed in. Further investigation is required on this.
```python
FAILED tests/pipelines/test_pipelines_token_classification.py::TokenClassificationPipelineTests::test_sliding_window - TypeError: list indices must be integers or slices, not str