**Closed**: percevalw closed this pull request 6 months ago
Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (`9d2b58f`) 96.87% compared to head (`81b8e72`) 96.93%.
| Files | Patch % | Lines |
|---|---|---|
| edsnlp/processing/multiprocessing.py | 97.71% | 4 Missing :warning: |
| edsnlp/core/torch_component.py | 98.03% | 1 Missing :warning: |
| edsnlp/processing/simple.py | 96.96% | 1 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
## Description

### Added

- `batch_by`, `split_into_batches_after`, `sort_chunks`, `chunk_size`, and `disable_implicit_parallelism` parameters to the processing backends (`simple` and `multiprocessing`) to improve performance and memory usage. Sorting chunks can improve throughput by up to 2x in some cases.
- `max_tokens_per_device="auto"` parameter to `eds.transformer` to estimate memory usage and automatically split the input into chunks that fit into the GPU.

### Changed

- The `eds.text_cnn` pipe now runs the CNN on a non-padded version of its input: expect a speedup of up to 1.3x in real-world use cases.

### Fixed

- Various issues with the `multiprocessing` backend (e.g., no more deadlocks)

### Checklist