Closed jsypkens closed 5 years ago
Thanks for the tip. I'll have a look.
@jsypkens, did you have a fix for this already? I'll try to look at it next weekend, but if you have a fix, that would save me.
I made a few other changes to the method as well, and the implementation changed a little bit, so I'm not sure if my fix would be what you're looking for.
I separated out the "maxSize" into two parameters - maxListSize and maxElementSize, then looped through each element in values, checking if it would exceed the group or the element size. I also changed the return type to List<List<<Tuple<int,string>>, for two main reasons:
`public static List<List<Tuple<int, string>>> SplitList2(IEnumerable
List<Tuple<int, string>> newList = new List<Tuple<int, string>>();
foreach (var value in values)
{
if (newList.Count > groupSize)
{
result.Add(newList);
newList = new List<Tuple<int, string>>();
currentListSize = 0;
}
if (value.Length > maxElementSize) //current item is longer than item max, split it on sentence/paragraph
{
if (value.Length > maxListSize)
{
//Split this item up! by itself, it's too long for the list. Use BreakSentence API to split on sentences.
BreakSentenceResponse[] BRR = TranslationServiceFacade.BreakSentences(new string[] { value });
List<string> sentences = new List<string>();
int currentPosition = 0;
foreach (var sentSplitLocation in BRR[0].sentLen) //BRR[0]: we're only passing in one value, we should only get one out
{
string currentSentence = value.Substring(currentPosition, sentSplitLocation);
if (sentences.Count > 0)
{
//Check to see if we will exceed the maxElementSize if we added this sentence
if (sentences.Last().Length + currentSentence.Length + 1 > maxElementSize)
sentences.Add(currentSentence); //would exceed the max, so just add it to the list separately
else
sentences[sentences.Count() - 1] += " " + currentSentence; //wouldn't exceed the max, append to the current list
}
else //first sentence, just add it
sentences.Add(currentSentence);
currentPosition += sentSplitLocation;
}
//loop through sentences, and add them to the list to be translated.
foreach (var sent in sentences) //make sure extractedSentence is under maxElementSize
{
if (currentListSize + sent.Length >= maxListSize || newList.Count >= groupSize)
{
result.Add(newList);
newList = new List<Tuple<int, string>>();
currentListSize = 0;
}
newList.Add(new Tuple<int, string>(currentLineNumber, sent));
}
}
else //the element isn't too big for the list by itself...
{
if (currentListSize + value.Length >= maxListSize) //next item will put the list over the max, start a new list
{
result.Add(newList);
newList = new List<Tuple<int, string>>();
currentListSize = 0;
}
BreakSentenceResponse[] BRR = TranslationServiceFacade.BreakSentences(new string[] { value });
List<string> sentences = new List<string>();
int currentPosition = 0;
foreach (var sentSplitLocation in BRR[0].sentLen) //we're only passing in one value, we should only get one out
{
string currentSentence = value.Substring(currentPosition, sentSplitLocation);
if (sentences.Count > 0)
{
if (sentences.Last().Length + currentSentence.Length + 1 > maxElementSize)
sentences.Add(currentSentence);
else
sentences[sentences.Count() - 1] += " " + currentSentence;
}
else
sentences.Add(currentSentence);
currentPosition += sentSplitLocation;
}
foreach (var sent in sentences) //make sure extractedSentence is under maxElementSize
{
if (currentListSize + sent.Length >= maxListSize || newList.Count >= groupSize)
{
result.Add(newList);
newList = new List<Tuple<int, string>>();
currentListSize = 0;
}
newList.Add(new Tuple<int, string>(currentLineNumber, sent));
currentListSize += sent.Length;
}
}
}
else // element does not exceed the max element size
{
if (currentListSize + value.Length >= maxListSize) //make sure it won't exceed the current list size
{
result.Add(newList);
newList = new List<Tuple<int, string>>();
currentListSize = 0;
}
newList.Add(new Tuple<int, string>(currentLineNumber, value));
currentListSize += value.Length;
}
currentLineNumber++;
}
if (newList.Count > 0)
result.Add(newList);
foreach (var list in result)
{
int totalLength = 0;
foreach (var item in list)
{
totalLength += item.Item2.Length;
}
if (totalLength > maxListSize)
{
logger.Error("list exceeds max. why?");
}
}
return result;
}
`
@jsypkens, thanks a lot for posting your code. I'll have a look over the weekend. Good use of BreakSentences. That'll be required.
Thanks, and finally fixed in the 2.3.0 release. I ended up not using your code. I left the SplitList functin mostly intact, added an exit criterion, and allowed segments that are larger than 5000 characters to be translated relardless, by inserting a splitter function that uses the server's BreakSentence method. I would love if you can give that a try and see whether it works for you.
In DocumentTranslationManager, the method "SplitList" has an infinite loop possibility. When an individual element exceeds the maxSize, it will do the following:
This is more apparent with Text Translator v3 API, as the element and list sizes are smaller. One contiguous paragraph doesn't need to be very long to cause this behavior.