Refactor: make helper functions more universal/modular

bandantonio commented 3 months ago

This issue is a continuation of the discussion from #28

Problem overview

All the highlights are accompanied by a so-called representative text that acts as a full version of the sentence from which the highlight is taken. Depending on a book structure and highlighted fragments, the representative text may contain excessive symbols, like trailing spaces, newlines, tabs, or a combination of them. This results in newlines within imported highlight blocks, that add confusion as it may seem like two separate highlight blocks. For example:

- 📖 Chapter:: N/A
- 🔖 Context:: əˈtämik
an extremely small amount of a thing; the single irreducible unit of a larger system.

- 🎯 Highlight:: əˈtämik
an extremely small amount of a thing; the single irreducible unit of a larger system.
- 📝 Note:: Link to atomic notes

The initial solution was proposed in #28 and it acts as a good starting point. However, the solution doesn't cover following cases (yet):

Trailing spaces - along the way.
Newlines and tabs
- reboot?\n\n
- project.\n\n\n
- simple answers.\n\t\t
- success.\n\t\t\t
- instead.\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\t
- a book\n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t

The proposed solution is similar to the function created for preserving indentation in text blocks: check preserveNewlineIndentation function in the src/methods/aggregateDetails.ts.

Proposal

Below are options under discussion:

Make the new helper more universal/modular to cover all the above mentioned cases. Modularity will keep the helper easy maintain and test, but may result in reduced code readability because if a text block requires multiple modifications, there will be a chain of helper functions performing these modifications.
Make a god-like helper function that takes all the helper functions and a text block as parameters and performs all the required modifications. As a result, there will be only one helper function to call but it may be harder to maintain and test this function with multiple "side effects".
More elegant solution to take the best from both worlds ?

bandantonio commented 3 months ago

Replying to the comment from @absorpheus

Are these trailing spaces, newlines and tabs appearing only at the end of the highlighted text block (contextualText) or do they appear in other places too? e.g. in between highlighted paragraphs

They can be both inside the contextualText and at the end of it. My initial thought was to concentrate on the "end of a block" case, as it should be easier to catch and more relevant in terms of our initial goal. I want to play around with regular expressions, specifically, word boundaries and see whether it helps.

absorpheus commented 3 months ago

I pushed a commit to this branch containing tests which demonstrate handling the edge cases stated in the problem overview above using both preserveNewLineIndentation and removeAllLastNewlines helper functions.

Let me know if this meets the requirements in terms of functionality and if there's anything that I've missed. My thinking is to first prove we can handle these edge cases and only then begin to refactor to a more elegant solution. I look forward to seeing what you come up with in terms of regex and word boundaries.

absorpheus commented 3 months ago

I pushed a commit to this branch

I created a higher-order function (HOF) createTextBlockHandlerChain, that takes in any number of TextBlockHandler's as parameters and returns a function that takes a string (text block) as a parameter and applies all the handler functions to it in sequential order, passing the output of the previous handler invocation as input to the next handler:

https://github.com/bandantonio/obsidian-apple-books-highlights-plugin/blob/fbdc9be1b7a96be00b156c2fd6af780cdde0fc1c/src/utils/createTextBlockHandlerChain.ts#L1-L9

TextBlockHandler is a function type that takes in a string and returns a string:

https://github.com/bandantonio/obsidian-apple-books-highlights-plugin/blob/fbdc9be1b7a96be00b156c2fd6af780cdde0fc1c/src/types.ts#L39

We can use createTextBlockHandlerChain to chain multiple text block handlers to create a new function handleTextForContext like this:

https://github.com/bandantonio/obsidian-apple-books-highlights-plugin/blob/fbdc9be1b7a96be00b156c2fd6af780cdde0fc1c/src/utils/handleTextForContext.ts#L6-L9

And then use handleTextForContext as follows:

https://github.com/bandantonio/obsidian-apple-books-highlights-plugin/blob/fbdc9be1b7a96be00b156c2fd6af780cdde0fc1c/src/methods/aggregateDetails.ts#L35

I also created a helper function createTextBlockHandler which takes a regular expression and the replacement text as parameters. This way we don't have to repeat the function logic each time we create a handler:

https://github.com/bandantonio/obsidian-apple-books-highlights-plugin/blob/fbdc9be1b7a96be00b156c2fd6af780cdde0fc1c/src/utils/createTextBlockHandler.ts#L1-L5

We can then use createTextBlockHandler as follows:

https://github.com/bandantonio/obsidian-apple-books-highlights-plugin/blob/fbdc9be1b7a96be00b156c2fd6af780cdde0fc1c/src/utils/preserveNewlineIndentation.ts#L1-L7

I think this way we can test each handler function separately and test handleTextForContext too (and other functions which chains multiple handlers together).

bandantonio commented 3 months ago

@absorpheus Thanks for the POC, it looks great and I do like this approach. I also spent some time with chatGPT elaborating more on this idea, and high-order functions is one reasonable way to go.

But I also had a second thought about the solution in general. Higher-order functions add another layer of complexity (and even may be considered a kind of an advanced technique), however, considering the overall simplicity of the current codebase, I would rather avoid adding this complexity for as long as possible, because I want to keep the codebase human-friendly and easy to understand.

Also, considering that currently we only have two cases with 1) preserving indentation and 2) removing trailing newlines and spaces, I tend to think that it would be over-engineering for these pretty simple use cases, at least for now. Plus, we don't know (yet) is there something else in text blocks that should be fixed, so potentially, we can end up with those two cases only.

So my suggestion is to proceed with separate helper functions for as long as we can, chaining a few of them (up to 3) wouldn't be a problem for now. And let's wait for the users' feedback to understand whether we need to iterate further on this. And your PR along with this issue will be placed on hold until the next round of discussion.

P.S: Please, don't think that I'm rejecting your solution, or discarding your efforts. Absolutely not. I was the initiator of this discussion to validate the idea and understand possible solutions. And I'm just trying not to jump into the boat of unjustified code complexity too early without proper confirmation of whether it makes sense now.

bandantonio commented 3 months ago

Let me know if this meets the requirements in terms of functionality and if there's anything that I've missed. My thinking is to first prove we can handle these edge cases and only then begin to refactor to a more elegant solution. I look forward to seeing what you come up with in terms of regex and word boundaries.

@absorpheus No objections from my side on the chosen approach. I played around with word boundaries, but eventually end up with just /\s+$/ as \s matches exactly what we need - any space, tab or newline character. Boundaries, on the other hand, have more edge cases to keep in mind. So the helper function would look like this:

const removeTrailingSpaces = (textBlock: string): string => {
    const endLineSpaces = /\s+$/;

    return endLineSpaces.test(textBlock) ? textBlock.replace(endLineSpaces, '') : textBlock;
}

Following up your question and my initial response about newlines inside a string, I believe we don't need to worry about this case - I checked out several templates, including one with quote wrappers, everything was rendered as expected.

To go further, I tried to combine it with the preserveNewlineIndentation helper function, but there is a bit of a problem because for this function matches are expected to be replaced with a newline character, whereas for removeTrailingSpaces matches need to be replaced with an empty string.

const preserveNewlineIndentation = (textBlock: string): string => {
    const stringWithNewLinesAndSpaces = /\n+\s*|\s+$/g;

    return stringWithNewLinesAndSpaces.test(textBlock) ?
        // what value should <replace-symbol> have here?
        textBlock.replace(stringWithNewLinesAndSpaces, <replace-symbol>) :
        textBlock;
}

Still, a bit clumsy and not pretty straightforward as long as regexes are involved. So at this point, my vote goes for two separate functions as a cleaner, and more human-friendly solution. Please let me know your thoughts.

absorpheus commented 3 months ago

@bandantonio Thank you for your detailed replies above.

I completely agree with you that using two separate helper functions is a lot simpler and keeps the codebase more readable. HOF's is probably overkill for what we're trying to achieve in this context.

I've pushed a new commit which uses the removeTrailingSpaces helper along with tests for both removeTrailingSpaces and preserveNewlineIndentation.

Let me know what you think.

bandantonio / obsidian-apple-books-highlights-plugin

Refactor: make helper functions more universal/modular #30

Problem overview

Proposal