kdcyberdude closed this 8 months ago
Hi, IndicTrans2 was trained for sentence-level translation, so passing whole documents won't work. The best you can do is break documents into sentences based on punctuation, translate the segments, and reassemble them.
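A minimal sketch of that split/translate/assemble flow, in case it helps. This is not the IndicTrans2 API: `split_sentences`, `translate_document`, and the `translate_batch` callable are hypothetical names, and `translate_batch` stands in for whatever sentence-level translation call you already have. The regex splitter is deliberately naive (it will mis-split on abbreviations and decimals), so a proper sentence tokenizer is likely a better choice in practice.

```python
import re
from typing import Callable, List

# Split after sentence-ending punctuation: Latin . ! ? plus the
# Devanagari danda and double danda. Assumption: a naive rule like this
# is good enough for a first pass.
_SENT_END = re.compile(r"(?<=[.!?।॥])\s+")


def split_sentences(text: str) -> List[str]:
    """Break text into sentences at punctuation followed by whitespace."""
    return [s.strip() for s in _SENT_END.split(text) if s.strip()]


def translate_document(
    text: str,
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> str:
    """Translate a document sentence by sentence and reassemble it.

    translate_batch is a placeholder for your own sentence-level
    translation function (list of sentences in, list of translations out).
    """
    sentences = split_sentences(text)
    translated: List[str] = []
    for start in range(0, len(sentences), batch_size):
        translated.extend(translate_batch(sentences[start:start + batch_size]))
    # Sentences are joined with single spaces, so paragraph breaks in
    # the input are not preserved.
    return " ".join(translated)
```

To keep paragraph structure, one option is to split the document on blank lines first and call `translate_document` once per paragraph.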
Hi @kdcyberdude
IndicTrans2 currently supports only sentence-level translation, as mentioned by my colleague @prajdabre. You can find the model.translate_paragraph usage in the inference section. We will add paragraph translation support to the Hugging Face example in the coming week. Thank you!
I'm looking to translate large documents into English, but I'm encountering an issue with the maximum sequence length of 256 while translating. In some instances, even after splitting the document, some sentences are still longer than 256 tokens. This situation might potentially impact the global context. Could you provide me with any suggestions or recommendations to handle this effectively?
@jaygala24