UKPLab / EasyNMT

Easy to use, state-of-the-art Neural Machine Translation for 100+ languages
Apache License 2.0

Some questions about the implementation of translate and translate_sentences in EasyNMT #7

Open svjack opened 3 years ago

svjack commented 3 years ago

Hi, I reviewed the code and want to make some suggestions. As the code logic describes, if the user does not set source_lang in the translate method, the object auto-infers a likely source_lang in translate_sentences. This behaviour works well when the input sentences are short. For long inputs, because perform_sentence_splitting is used, the long input is split into small fragments, source_lang is inferred for every fragment, and a suitable model is chosen to translate each one (grouped in grouped_sentences by detected source_lang). This suits mixed-language inputs: fragments in different languages are translated by different models and joined back together. But when the sentence splitter unfortunately splits a long input in a bad way, problems arise; consider the following example:

```python
import pandas as pd
import nltk  # requires the 'punkt' tokenizer data: nltk.download('punkt')
from easynmt import EasyNMT

model = EasyNMT('opus-mt')
sentence_splitter = nltk.sent_tokenize  # stand-in for perform_sentence_splitting
input_ = 'How many times does the rebuilt data contain cannot handle non-empty timestamp argument! 1929 and scrapped data contain cannot handle non-empty timestamp argument! 1954?'

# Output: ['en', 'en', 'eo'] because the last fragment is just "1954?",
# which language_detection maps to "eo"
print(pd.Series(sentence_splitter(input_)).map(model.language_detection).tolist())
```

With the opus-mt model, translating this sentence from "eo" into "zh" then raises an error because there is no such model to load. I understand that I can avoid the error by setting source_lang to "en" in the translate method, but I think the library should also handle this case. A few ideas:

- If language_detection and the sentence splitter run fast enough, validate every detected language pair against lang_pairs in easynmt.json (in the opus-mt folder of the models dir) before translate runs.
- Because the last fragment is too short to give language_detection reliable evidence, apply an evidence filter based on fragment length.
- Use a regex (regular expression) to filter out symbols (in this example the "?" in "1954?") and other bad tokens before they reach language_detection.
- Since inputs come in different formats (someone may pass an HTML document to the translate method), provide an interface that lets the user set a token filter (e.g. dropping "?" or "<br>") before language_detection.
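To make the last two points concrete, here is a minimal sketch of the kind of pre-filter I have in mind, wrapped around model.language_detection; the helper names, regexes, and the min_chars threshold are only illustrative, not part of EasyNMT:

```python
import re

def clean_for_detection(fragment: str) -> str:
    # Drop HTML-like tags, digits and symbols before language detection;
    # "1954?" becomes an empty string and can no longer be misdetected.
    fragment = re.sub(r'<[^>]+>', ' ', fragment)
    fragment = re.sub(r'[\d\W_]+', ' ', fragment)
    return fragment.strip()

def detect_fragment_lang(model, fragment: str, document_lang: str, min_chars: int = 10) -> str:
    # Fall back to the document-level language when the cleaned fragment
    # is too short to give reliable evidence.
    cleaned = clean_for_detection(fragment)
    if len(cleaned) < min_chars:
        return document_lang
    return model.language_detection(cleaned)
```

With a filter like this, the "1954?" fragment in the example above would simply inherit the document-level "en" detection instead of being mapped to "eo".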

The example above is one sample from a dataset I was translating, so when this error occurred I lost all of the previously translated results because of a single exception. Since the actual translation runs in batches, people may want to keep the successful batches by setting a small batch_size and collecting the batches that succeed. I hope a future version will support collecting successful batches so that, for a long input of documents (whether measured as a list or as a string), not everything is lost from the final output of the translate method.
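What I mean is something like the following sketch; the helper name and error handling are mine, not part of the EasyNMT API:

```python
def translate_in_batches(model, documents, target_lang, batch_size=8):
    # Translate in small batches and keep whatever succeeded instead of
    # losing everything when a single batch raises an exception.
    translated, failed = [], []
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        try:
            translated.extend(model.translate(batch, target_lang=target_lang))
        except Exception as exc:
            failed.append((start, batch, exc))
    return translated, failed
```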

nreimers commented 3 years ago

This is a good point, I will add document-wide language detection for this.
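Roughly, the idea is to detect the language once on the full document and then pass it along explicitly; a minimal sketch, not the final implementation:

```python
# Detect the language once on the whole document, so short fragments
# such as "1954?" no longer pick the wrong translation model.
doc_lang = model.language_detection(input_)   # 'en' for the example above
translation = model.translate(input_, source_lang=doc_lang, target_lang='zh')
```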

Regarding the translation of long inputs: I recommend using the translate_stream method: https://github.com/UKPLab/EasyNMT/blob/main/examples/translation_streaming.py

It yields translated documents as soon as they are translated. These can then be written, e.g., to a text file.
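A minimal sketch along the lines of the linked streaming example; the document list, file name, and parameter values here are just illustrative:

```python
from easynmt import EasyNMT

model = EasyNMT('opus-mt')
documents = ['First document ...', 'Second document ...']

with open('translations.txt', 'w', encoding='utf8') as fOut:
    # translate_stream yields results chunk by chunk, so everything translated
    # so far is already written out even if a later document fails.
    for translation in model.translate_stream(documents, target_lang='de', chunk_size=8):
        fOut.write(translation.strip() + '\n')
```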