GPU memory does not get released with `max_loaded_models`

When I run the example code below and monitor the GPU with `watch -n .3 nvidia-smi`, memory usage keeps increasing and is never released.

Did I miss something here?

import gc
import time

import torch
from easynmt import EasyNMT

model = EasyNMT("opus-mt", max_loaded_models=1)  # should keep at most one model loaded

model.translate("Hallo, das ist ein Satz.", target_lang="en", source_lang="de")
model.translate("Hallo, das ist ein Satz.", target_lang="fr", source_lang="de")

time.sleep(3)
gc.collect()  # drop unreferenced Python objects
torch.cuda.empty_cache()  # return PyTorch's cached, unused blocks to the driver
time.sleep(3)

model.translate("Hallo, das ist ein Satz.", target_lang="nl", source_lang="de")
model.translate("Hallo, das ist ein Satz.", target_lang="it", source_lang="de")
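
To narrow this down, it may help to log what PyTorch itself reports between the `translate` calls, since `nvidia-smi` also counts the CUDA context and blocks held by PyTorch's caching allocator. The following is only a rough sketch: the helper name `gpu_memory_mib` and the choice of device `0` are my own assumptions and are not part of EasyNMT.

```python
try:
    import torch
except ImportError:  # lets the helper degrade gracefully where PyTorch is missing
    torch = None


def gpu_memory_mib(device=0):
    """Return PyTorch's allocated and reserved GPU memory in MiB.

    Returns None when PyTorch or CUDA is not available.
    `gpu_memory_mib` is a hypothetical helper, not an EasyNMT function.
    """
    if torch is None or not torch.cuda.is_available():
        return None
    mib = 1024 ** 2
    return {
        # memory occupied by live tensors (e.g. model weights still referenced)
        "allocated": torch.cuda.memory_allocated(device) / mib,
        # memory held by the caching allocator, including free cached blocks
        "reserved": torch.cuda.memory_reserved(device) / mib,
    }


print(gpu_memory_mib())
```

Calling `gpu_memory_mib()` after each `model.translate(...)` and again after `torch.cuda.empty_cache()` should show whether "allocated" keeps growing (the old models are still referenced somewhere) or only "reserved" does (the memory is merely cached).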

UKPLab / EasyNMT #92