**Closed**: percevalw closed this pull request 6 months ago
Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (`9d2b58f`) 96.87% compared to head (`81b8e72`) 96.93%.
| Files | Patch % | Lines |
|---|---|---|
| edsnlp/processing/multiprocessing.py | 97.71% | 4 Missing :warning: |
| edsnlp/core/torch_component.py | 98.03% | 1 Missing :warning: |
| edsnlp/processing/simple.py | 96.96% | 1 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
## Description

### Added

- `batch_by`, `split_into_batches_after`, `sort_chunks`, `chunk_size`, and `disable_implicit_parallelism` parameters to the processing backends (`simple` and `multiprocessing`) to improve performance and memory usage. Sorting chunks can improve throughput by up to 2x in some cases.
- `max_tokens_per_device="auto"` parameter to `eds.transformer` to estimate memory usage and automatically split the input into chunks that fit into the GPU.

### Changed

- The `eds.text_cnn` pipe now runs the CNN on a non-padded version of its input: expect a speedup of up to 1.3x in real-world use cases.

### Fixed

- Various issues with the `multiprocessing` backend (e.g., no more deadlocks)

### Checklist