The code is working again, but it currently always runs on a single GPU.
Going forward, we should also support multi-GPU inference via FSDP (`wrapped_model` in the config) as well as CPU-only inference.
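As a rough sketch of how device selection could work (the `wrapped_model` key comes from the config mentioned above; the `prepare_model` helper and config layout are illustrative, not the actual implementation):

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def prepare_model(model: torch.nn.Module, config: dict) -> torch.nn.Module:
    """Place the model on the appropriate device(s) based on the config.

    Falls back to CPU when no GPU is available.
    """
    if config.get("wrapped_model", False) and torch.cuda.device_count() > 1:
        # Multi-GPU path: shard parameters across ranks with FSDP.
        # Assumes torch.distributed.init_process_group() was called beforehand.
        return FSDP(model.cuda())
    # Single-GPU or CPU-only path.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return model.to(device)
```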
Separated checkpointing into checkpoint loading and checkpoint saving. The reason is that for checkpoint loading at inference time, we don't need the checkpoint-saving functionality.
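A minimal sketch of that split (class names like `CheckpointLoader` and `CheckpointSaver` are hypothetical, chosen only to illustrate the separation):

```python
from pathlib import Path
import torch

class CheckpointLoader:
    """Loads model weights for inference; carries no saving logic."""

    def load(self, model: torch.nn.Module, path: Path) -> torch.nn.Module:
        state_dict = torch.load(path, map_location="cpu")
        model.load_state_dict(state_dict)
        return model

class CheckpointSaver:
    """Writes checkpoints during training; not needed at inference time."""

    def save(self, model: torch.nn.Module, path: Path) -> None:
        torch.save(model.state_dict(), path)
```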
Introduced an inference component for the text modality. Other modalities can be added later. At the moment, the model is loaded on a single GPU; later on, we can support more sophisticated serving environments.
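A sketch of what such a component could look like (the class name, the `encode`/`decode` tokenizer API, and the assumption that the model returns logits of shape `[batch, seq, vocab]` are all illustrative, not the repo's actual interfaces):

```python
import torch

class TextInferenceComponent:
    """Runs text generation on a single device (GPU if available, else CPU)."""

    def __init__(self, model: torch.nn.Module, tokenizer) -> None:
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model = model.to(self.device).eval()
        self.tokenizer = tokenizer

    @torch.no_grad()
    def generate(self, prompt: str, max_new_tokens: int = 50) -> str:
        token_ids = self.tokenizer.encode(prompt)
        input_ids = torch.tensor([token_ids], device=self.device)
        # Greedy decoding as a placeholder; sampling strategies can be added.
        for _ in range(max_new_tokens):
            logits = self.model(input_ids)
            next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
            input_ids = torch.cat([input_ids, next_id], dim=-1)
        return self.tokenizer.decode(input_ids[0].tolist())
```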
Adapted the text generation interface to work with a YAML config that specifies the tokenizer, model, etc.
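For illustration, this is roughly how such a config could be loaded (the config keys shown in the comment are a guess at the layout, not the actual schema):

```python
import yaml

def load_inference_config(path: str) -> dict:
    """Parse the YAML inference config (tokenizer, model, device settings)."""
    with open(path) as f:
        return yaml.safe_load(f)

# Hypothetical config layout:
# tokenizer:
#   path: tokenizers/tokenizer.json
# model:
#   checkpoint_path: checkpoints/model.bin
#   wrapped_model: false
config = load_inference_config("configs/text_generation.yaml")
```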
@mali-git @fromm-m