Closed bandantonio closed 3 months ago
Replying to the comment from @absorpheus
Are these trailing spaces, newlines and tabs appearing only at the end of the highlighted text block (contextualText) or do they appear in other places too? e.g. in between highlighted paragraphs
They can be both inside the contextualText and at the end of it. My initial thought was to concentrate on the "end of a block" case, as it should be easier to catch and more relevant in terms of our initial goal. I want to play around with regular expressions, specifically, word boundaries and see whether it helps.
I pushed a commit to this branch containing tests which demonstrate handling the edge cases stated in the problem overview above using both preserveNewLineIndentation
and removeAllLastNewlines
helper functions.
Let me know if this meets the requirements in terms of functionality and if there's anything that I've missed. My thinking is to first prove we can handle these edge cases and only then begin to refactor to a more elegant solution. I look forward to seeing what you come up with in terms of regex and word boundaries.
I pushed a commit to this branch
I created a higher-order function (HOF) createTextBlockHandlerChain
, that takes in any number of TextBlockHandler
's as parameters and returns a function that takes a string (text block) as a parameter and applies all the handler functions to it in sequential order, passing the output of the previous handler invocation as input to the next handler:
TextBlockHandler
is a function type that takes in a string and returns a string:
We can use createTextBlockHandlerChain
to chain multiple text block handlers to create a new function handleTextForContext
like this:
And then use handleTextForContext
as follows:
I also created a helper function createTextBlockHandler
which takes a regular expression and the replacement text as parameters. This way we don't have to repeat the function logic each time we create a handler:
We can then use createTextBlockHandler
as follows:
I think this way we can test each handler function separately and test handleTextForContext
too (and other functions which chains multiple handlers together).
@absorpheus Thanks for the POC, it looks great and I do like this approach. I also spent some time with chatGPT elaborating more on this idea, and high-order functions is one reasonable way to go.
But I also had a second thought about the solution in general. Higher-order functions add another layer of complexity (and even may be considered a kind of an advanced technique), however, considering the overall simplicity of the current codebase, I would rather avoid adding this complexity for as long as possible, because I want to keep the codebase human-friendly and easy to understand.
Also, considering that currently we only have two cases with 1) preserving indentation and 2) removing trailing newlines and spaces, I tend to think that it would be over-engineering for these pretty simple use cases, at least for now. Plus, we don't know (yet) is there something else in text blocks that should be fixed, so potentially, we can end up with those two cases only.
So my suggestion is to proceed with separate helper functions for as long as we can, chaining a few of them (up to 3) wouldn't be a problem for now. And let's wait for the users' feedback to understand whether we need to iterate further on this. And your PR along with this issue will be placed on hold until the next round of discussion.
P.S: Please, don't think that I'm rejecting your solution, or discarding your efforts. Absolutely not. I was the initiator of this discussion to validate the idea and understand possible solutions. And I'm just trying not to jump into the boat of unjustified code complexity too early without proper confirmation of whether it makes sense now.
Let me know if this meets the requirements in terms of functionality and if there's anything that I've missed. My thinking is to first prove we can handle these edge cases and only then begin to refactor to a more elegant solution. I look forward to seeing what you come up with in terms of regex and word boundaries.
@absorpheus No objections from my side on the chosen approach.
I played around with word boundaries, but eventually end up with just /\s+$/
as \s
matches exactly what we need - any space, tab or newline character. Boundaries, on the other hand, have more edge cases to keep in mind. So the helper function would look like this:
const removeTrailingSpaces = (textBlock: string): string => {
const endLineSpaces = /\s+$/;
return endLineSpaces.test(textBlock) ? textBlock.replace(endLineSpaces, '') : textBlock;
}
Following up your question and my initial response about newlines inside a string, I believe we don't need to worry about this case - I checked out several templates, including one with quote wrappers, everything was rendered as expected.
To go further, I tried to combine it with the preserveNewlineIndentation
helper function, but there is a bit of a problem because for this function matches are expected to be replaced with a newline character, whereas for removeTrailingSpaces
matches need to be replaced with an empty string.
const preserveNewlineIndentation = (textBlock: string): string => {
const stringWithNewLinesAndSpaces = /\n+\s*|\s+$/g;
return stringWithNewLinesAndSpaces.test(textBlock) ?
// what value should <replace-symbol> have here?
textBlock.replace(stringWithNewLinesAndSpaces, <replace-symbol>) :
textBlock;
}
Still, a bit clumsy and not pretty straightforward as long as regexes are involved. So at this point, my vote goes for two separate functions as a cleaner, and more human-friendly solution. Please let me know your thoughts.
@bandantonio Thank you for your detailed replies above.
I completely agree with you that using two separate helper functions is a lot simpler and keeps the codebase more readable. HOF's is probably overkill for what we're trying to achieve in this context.
I've pushed a new commit which uses the removeTrailingSpaces
helper along with tests for both removeTrailingSpaces
and preserveNewlineIndentation
.
Let me know what you think.
This issue is a continuation of the discussion from #28
Problem overview
All the highlights are accompanied by a so-called representative text that acts as a full version of the sentence from which the highlight is taken. Depending on a book structure and highlighted fragments, the representative text may contain excessive symbols, like trailing spaces, newlines, tabs, or a combination of them. This results in newlines within imported highlight blocks, that add confusion as it may seem like two separate highlight blocks. For example:
The initial solution was proposed in #28 and it acts as a good starting point. However, the solution doesn't cover following cases (yet):
along the way.
reboot?\n\n
project.\n\n\n
simple answers.\n\t\t
success.\n\t\t\t
instead.\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\t
a book\n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t
The proposed solution is similar to the function created for preserving indentation in text blocks: check
preserveNewlineIndentation
function in the src/methods/aggregateDetails.ts.Proposal
Below are options under discussion: