Open hazzik opened 10 years ago
Not a bad idea. As you said this could get a bit tricky on long words; e.g. enthusiastically is 16 chars long!
Here is some crazy long names for the strategies: Truncator.NoMoreWordsAfterNumberOfCharacters
or Truncator.NoMoreWordsBeforeNumberOfCharacters
?
Im close to finishing this but just want to clarify.
"Hello Beautiful World".Truncate(8, "...", NoMoreWordsAfterNumberOfCharacters) ;
Will truncate to --> "Hello Beautiful..."
"Hello Beautiful World".Truncate(8, "...", NoMoreWordsBeforeNumberOfCharacters) ;
Will truncate to --> "Hello..."
Correct.
Still unsure about the names! They're a bit too long and awkward.
How about MaximumLengthWithoutWordBreak
(for truncating at the word before) and MaximumLengthCompleteCurrentWord
(for truncating after the word that the maximum lands in)? Still long but a little less awkward.
Thanks @justin-edwards. I like this better although the word maximum in the name is a bit confusing at the first sight.
I've just picked this one up. I'm thinking about changing the names, though. What do you think about DiscardIncompleteWord
and FinishIncompleteWord
?
Also, it's not clear to me if this should work with a fixed number of characters or a fixed length. I'm thinking both would make sense. That being said, I'd like to propose that truncating strategy can be combined with incomplete word handling, as such:
"Text with more characters than truncate length"
.Truncate(12, Truncator.FixedLength.DiscardingIncompleteWord());
=> "Text with…"
"Text with more characters than truncate length"
.Truncate(12, Truncator.FixedLength.FinishingIncompleteWord());
=> "Text with more…"
DiscardingIncompleteWord
and FinishingIncompleteWord
would be extension methods that could be applied to both Truncator.FixedLength
and Truncator.FixedNumberOfCharacters
(but only to these two). The idea is to return a truncator Wrapper (which is, of course, ITruncator
), that would finish or discard the word (to either left or right) based on the inner truncator result.
How does that sound like?
@alextercete Cool you're going to push this forward.
I like the names DiscardIncompleteWord
and FinishIncompleteWord
.
DiscardingIncompleteWord
andFinishingIncompleteWord
would be extension methods that could be applied to bothTruncator.FixedLength
andTruncator.FixedNumberOfCharacters
(but only to these two)
That sounds weird, how would the library user know about this limitation?
I'm currently unsure about the concrete use case for this functionality and can't conclude whether incomplete word handling is an option for a strategy or a strategy itself. @hazzik Can you elaborate the use case you had in mind?
That sounds weird, how would the library user know about this limitation?
I was thinking about adding an interface, that is a specialization of ITruncator
(something like ICharacterWiseTruncator
), which both FixedNumberOfCharactersTruncator
and FixedLengthTruncator
would implement. Then the extension methods would only apply to the specialization:
public static ITruncator DiscardingIncompleteWord(this ICharacterWiseTruncator innerTruncator)
{
// (...)
}
public static ITruncator FinishingIncompleteWord(this ICharacterWiseTruncator innerTruncator)
{
// (...)
}
By changing the return type of Truncator.FixedLength
and Truncator.FixedNumberOfCharacters
to ICharacterWiseTruncator
, the library user would only be able to use the extension methods for these truncators.
Thanks for picking this up @alextercete. I really like the proposed names.
I don't think these should be extension of existing strategies. There are currently three strategies: FixedLength
, FixedNumberOfCharacters
and FixedNumberOfWords
. This functionality obviously doesn't apply to FixedNumberOfWords
at all and the other two are more or less the same except that one includes the truncation character and one doesn't, and that doesn't matter when you want to finish or leave out the last word. So I think these should be two new strategies.
One thing I might also add here is that the implementation should be able to truncate from left and right (depending on the provided TruncateFrom
parameter). Godspeed :)
Ok, now I see the source of my confusion. I thought the value for FixedLength
was the expected size of the the final string, while the value for FixedNumberOfCharacters
was the expected number of non-whitespace characters in the final string.
Thanks for the reminder on left and right truncation. I'll keep that in mind.
I'll put together a Pull Request in the next few days to address this.
Sorry, I'm little bit late, and haven't read all the discussion yet. But here some of my thoughts:
As long as you add DiscardingIncompleteWord()
or FinishingIncompleteWord()
to Truncator.FixedLength
the strategy become not a Fixed Length, but variable length, so I'm strongly disagree with this proposal.
We ended up deciding against the extension methods. But good point, anyway.
Hi @alextercete! Are you still working on this? Anything I can do to help push this forward?
I started working on this, but eventually stopped. Looking into it again I remember why I couldn't decide on what approach to take. Here's why:
There are, currently, three truncation strategies available:
Fixed length: The resulting string is going to have exactly the number of characters that was provided as a parameter:
"abc def ghi".Truncate(8, "...", Truncator.FixedLength); // "abc d..."
Fixed number of characters: The resulting string is going to have exactly the number of non-whitespace characters that was provided as a parameter:
"ab cd ef gh ij".Truncate(8, "...", Truncator.FixedNumberOfCharacters); // "ab cd e..."
Fixed number of words: The resulting string is going to have exactly the number of words that was provided as a parameter:
"ab cd ef gh ij".Truncate(3, "...", Truncator.FixedNumberOfWords); // "ab cd ef..."
Word completion would only make sense for 1 and 2. As far as I understand it, this is the expected behavior:
"abc def ghi".Truncate(8, "...", Truncator.FixedLength); // "abc d..."
"abc def ghi".Truncate(8, "...", Truncator.FixedLengthDiscardingIncompleteWord); // "abc..."
"abc def ghi".Truncate(8, "...", Truncator.FixedLengthFinishingIncompleteWord); // "abc def..."
"ab cd ef gh ij".Truncate(8, "...", Truncator.FixedNumberOfCharacters); // "ab cd e..."
"ab cd ef gh ij".Truncate(8, "...", Truncator.FixedNumberOfCharactersDiscardingIncompleteWord); // "ab cd..."
"ab cd ef gh ij".Truncate(8, "...", Truncator.FixedNumberOfCharactersFinishingIncompleteWord); // "ab cd ef..."
Not that I like the giant method names, but I think these are more of an addition to the existing truncation strategies (those that deal with characters as opposed to words) than they are new strategies themselves.
I actually think that all of the truncation API could be modified to allow combining the truncation options. For example:
"abc def ghi".Truncate.With("...").ToLength(8); // "abc d..."
"abc def ghi".Truncate.RightToLeft.With("...").ToLength(8); // "...f ghi"
"abc def ghi".Truncate.With("...").DiscardingIncompleteWord.ToLength(8); // "abc..."
"abc def ghi".Truncate.With("...").ToNumberOfCharacters(8); // "ab cd e..."
"ab cd ef gh ij".Truncate.With("...").ToNumberOfWords(3); // "ab cd ef..."
I believe this change would lead to more reusable and easy-to-test code.
Do you think it would be simpler to do something like this?
"abc def ghi".TruncateByWords(5 /* desiredChars */, Truncator.CompleteWord); // "abc def"
"abc def ghi".TruncateByWords(5 /* desiredChars */, Truncator.CompleteWord).With("..."); // "abc def..."
"abc def ghi".TruncateByWords(5 /* desiredChars */, Truncator.DiscardIncompleteWord); // "abc"
"abc def ghi".TruncateByWords(5 /* desiredChars */, Truncator.DiscardIncompleteWord).With("..."); // "abc..."
// vs.
"abc def ghi".Truncate(5 /* maxChars */); // "abc d"
"abc def ghi".Truncate(5 /* maxChars */).With("..."); // "abc d..."
"abc def ghi".Truncate(8 /* maxChars */); // "abc def" (the whitespace is trimmed)
"abc".Truncate(5 /* maxChars */); // "abc"
"abc".Truncate(5 /* maxChars */).With("..."); // "abc"
TruncateByWords
understands the notion of words. Note that Truncate
takes in a maxChars
, implying the output could be less than that (happens when trimming white spaces). Otherwise it'd just be syntactic surgar for SubString
.
Note this interesting use case. This is how one can ensure the display
string won't overflow:
string sentence= "Pneumonoultramicroscopicsilicovolcanoconiosis is a very long word";
string display = sentence.TruncateByWords(10 /* desiredChars*/, Truncator.DiscardIncompleteWord);
// display == string.Empty;
if (string.IsNullOrWhiteSpace(display))
{
display = sentence.Truncate(maxChars: 10).With("..."); // "Pneumonoul..."
}
Thanks for reviving this thread. There's some prior (abandoned) work on this at https://github.com/MehdiK/Humanizer/pull/210. That might be an easier starting point!
I just made one myself before visiting this thread.
Maybe it's useful to someone:
public class WordsInLengthTruncator : ITruncator
{
public string Truncate(string value, int length, string truncationString, TruncateFrom truncateFrom = TruncateFrom.Right)
{
if (value == null)
return null;
if (value.Length == 0)
return value;
if (value.Length < length) return value;
if (truncationString.Length > length) throw new ArgumentOutOfRangeException("truncationString cannot be longer than the max length");
var words = value.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
int currentLength = 0;
words = truncateFrom == TruncateFrom.Right ? words : words.Reverse().ToArray();
foreach (var word in words)
{
if (currentLength + word.Length + truncationString.Length > length) break;
currentLength += (word.Length +1);// +1 to add space character for length computation.
}
if (currentLength == 0) currentLength = length; // this is a special case where first word is longer than length, it will break it on characters.
else currentLength = truncateFrom == TruncateFrom.Right
? currentLength - 1 // -1 removes the extra space character on Right direction.
: currentLength + 1; // +1 removes the extra space character on Left direction.
return truncateFrom == TruncateFrom.Right
? value.Substring(0, currentLength) + truncationString
: truncationString + value.Substring(value.Length - currentLength);
}
}
@Bartmax thanks, but your currentLength
calculation algorithm is very unintuitive and non performant.
value.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
- searches whole string for whitespace characters and then splits a string. The more optimal and easy to understand solutions is following:
//This only for `TruncateFrom.Left`
int currentLength = length;
for (; index < value.Length; index++)
{
if (char.IsWhiteSpace(value[index]))
{
break;
}
}
@hazzik well.... to be fair, I just bought that from Humanizr own truncator for consistency.
I see how then you do a foreach. I find myself the foreach more unintuitive but that may be just me. can't say anything about performant, specially without data. Based on the code I would assume you just throw non performant just because.
Any update on this?
@hazzik @mexx Is there a reason why you didn't take @Bartmax 's solution?
Do you need a better one from the community?
I'm asking because there's still a jump-in
tag on the issue, but it hangs dry for almost 3 years now.
@gdoron I think we mostly need someone to add tests and a clean PR, if you are motivated ;)
We can improve performance on @Bartmax 's solution and Humanizr own truncator in another PR.
@aloisdg I'm afraid my hands are full at the moment, we are launching a new product in the next week... and I have 0 knowledge about humanizer except the very basic usage, so I'm afraid I can not accept the task. Sorry. 😢
BTW FYI, @Bartmax, I saw Amazon.com except from truncate on whole words, they also truncate in the middle of numbers, which I think makes a lot of sense in most cases.
The idea is to leave last word unbreaked.
=>
"Text with more..."
or =>"Text with..."
I'm not sure about the name of strategy. Also I'm not sure what to do with long words.