Add pin_memory=True when using a CUDA device to increase performance, as suggested by the Torch documentation.
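A minimal sketch of the pin_memory pattern this entry refers to; the dataset and loader below are illustrative placeholders, not the library's actual code. Pinned (page-locked) host memory speeds up host-to-GPU copies and enables asynchronous transfers.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Illustrative toy dataset; names here are hypothetical.
dataset = TensorDataset(torch.randn(8, 4))

# Pin host memory only when a CUDA device is available; pinning has no
# benefit (and emits a warning) on CPU-only machines.
loader = DataLoader(dataset, batch_size=4, pin_memory=torch.cuda.is_available())

for (batch,) in loader:
    if torch.cuda.is_available():
        # non_blocking=True lets the copy overlap with compute
        # when the source memory is pinned.
        batch = batch.to("cuda", non_blocking=True)
```

The loader works unchanged on CPU-only machines; the flag simply becomes a no-op there.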
Add torch.no_grad() context manager in __call__() to increase performance.
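A sketch of wrapping inference in torch.no_grad(), assuming a callable-object design like the one this entry describes; the class and model below are placeholders, not the library's actual parser.

```python
import torch

class Parser:
    """Hypothetical inference wrapper illustrating the change."""

    def __init__(self):
        self.model = torch.nn.Linear(4, 2)

    def __call__(self, x):
        # Disabling gradient tracking skips autograd bookkeeping,
        # reducing memory use and speeding up pure inference.
        with torch.no_grad():
            return self.model(x)

out = Parser()(torch.randn(3, 4))
```

Tensors produced inside the context do not require gradients, so no autograd graph is kept alive between calls.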
Reduce memory transfers between CPU and GPU by instantiating tensors directly on the GPU device.
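The idea in this entry can be sketched as follows: passing device= at creation time avoids allocating on the CPU and then copying to the GPU.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Created directly on the target device: one allocation, no copy.
t = torch.zeros(3, 4, device=device)

# The pattern it replaces: a CPU allocation followed by a
# host-to-device transfer.
# t = torch.zeros(3, 4).to(device)
```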
Improve the clarity of some warnings (i.e. their category and message).
Fix a macOS multiprocessing bug. Multiprocessing was unusable because we did not verify whether the torch multiprocessing start method was set properly. We now set it properly and raise a warning instead of an error.
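A hypothetical helper illustrating the warn-instead-of-error behaviour; torch.multiprocessing mirrors the stdlib multiprocessing API used here, and the function name and default are assumptions, not the library's actual code. On macOS the default start method is "spawn" rather than "fork".

```python
import multiprocessing
import warnings

def ensure_start_method(expected="spawn"):
    """Hypothetical sketch: set the start method if needed and warn,
    rather than raising an error when it was not set as expected."""
    current = multiprocessing.get_start_method(allow_none=True)
    if current != expected:
        # force=True allows resetting a method that was already chosen.
        multiprocessing.set_start_method(expected, force=True)
        warnings.warn(
            f"Multiprocessing start method was {current!r}; set to {expected!r}.",
            category=RuntimeWarning,
        )
```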
Improve error handling when a wrong checkpoint is loaded through the AddressParser retrain_path argument.
Add torch.compile integration with mode="reduce-overhead", as suggested in the documentation, to improve performance (Torch 1.x is still supported). It improves performance by about 1%.
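A sketch of a version-guarded torch.compile integration, assuming the fallback strategy implied by "Torch 1.x still supported"; the function below is a placeholder, not the library's actual model.

```python
import torch

def step(x):
    # Placeholder workload standing in for a model forward pass.
    return torch.relu(x) + 1.0

if hasattr(torch, "compile"):
    # Torch 2.x: "reduce-overhead" targets cases where Python and kernel
    # launch overhead dominate, which is why the end-to-end gain is
    # modest (about 1%). Compilation itself happens lazily on first call.
    step_fn = torch.compile(step, mode="reduce-overhead")
else:
    # Torch 1.x: torch.compile does not exist; fall back to eager mode.
    step_fn = step
```

Because compilation is lazy, constructing step_fn is cheap on both versions; the first real call pays the compilation cost on Torch 2.x.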