MarkBind / markbind

MarkBind is a tool for generating content-heavy websites from source files in Markdown format
https://markbind.org/
MIT License

Efficient validation for intra-link with hash #2465

Closed yiwen101 closed 6 months ago

yiwen101 commented 6 months ago

What is the purpose of this pull request?

working on #1418

Overview of changes:

  • Store all the elements with ids (accessible) in the siteLinkManager.
  • Validate intra-links with hashes in the link processor.
  • Fix the 15 invalid intra-links with hashes detected in the current documentation.
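For readers unfamiliar with the idea, the validation step can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the type `FilePathToHashes`, the functions `splitLink` and `isHashValid`, and the example path in the comments are all hypothetical. It assumes the collected ids are kept as one `Set` per page, so that each lookup is a constant-time check.

```typescript
// Hypothetical sketch; names and shapes are illustrative, not MarkBind's actual API.

/** Maps a site-relative file path to the set of element ids (hashes) found on that page. */
type FilePathToHashes = Map<string, Set<string>>;

/** Splits a link such as "/guide/page.html#setup" (made-up path) into path and hash. */
function splitLink(link: string): { path: string; hash: string | null } {
  const index = link.indexOf('#');
  if (index === -1) {
    return { path: link, hash: null };
  }
  return { path: link.slice(0, index), hash: link.slice(index + 1) };
}

/**
 * Returns true when the link has no hash (or an empty one), or when its hash
 * matches an id collected for the target page. A link with a hash that points
 * to a page with no collected ids is reported as invalid.
 */
function isHashValid(link: string, hashesByPath: FilePathToHashes): boolean {
  const { path, hash } = splitLink(link);
  if (hash === null || hash === '') {
    return true;
  }
  const hashes = hashesByPath.get(path);
  return hashes !== undefined && hashes.has(hash);
}
```

Whether an empty hash or an unknown page should count as valid is a design choice; the sketch picks one option for illustration only.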


This is a draft PR. I ran into issues and got stuck while working on this, so I am posting my work so far to seek help.

Current implementation:

1. For nodes with node.attribs.ids, simply add them to the collection in the siteLinkManager.
2. For header tags, add them to the collection in the siteLinkManager after they have been granted ids.
3. For include nodes, after they have been processed, recursively add their ids and their children's ids to the collection in the siteLinkManager; if they or their children are header tags, grant them ids with the same util method as in step 2.
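The three steps above can be sketched as a single recursive walk. This is only an illustration under stated assumptions: the `SketchNode` shape, the `collectIds` function, and the `assignHeadingId` callback are hypothetical stand-ins, and the sketch reads the id from `attribs.id`, which may differ from the actual attribute used in the PR.

```typescript
// Hypothetical sketch; the node shape and function names are assumptions,
// not MarkBind's actual code.

/** A minimal DOM-like node: a tag name, optional attributes, optional children. */
interface SketchNode {
  name?: string;
  attribs?: Record<string, string>;
  children?: SketchNode[];
}

const HEADING_TAG = /^h[1-6]$/;

/** Records the id of a node, and of every descendant, under the given file path. */
function collectIds(
  node: SketchNode,
  filePath: string,
  hashesByPath: Map<string, Set<string>>,
  assignHeadingId: (heading: SketchNode) => void,
): void {
  // Headings may not have an id yet, so assign one before reading it (steps 2 and 3).
  if (node.name !== undefined && HEADING_TAG.test(node.name) && !node.attribs?.id) {
    assignHeadingId(node);
  }
  // Step 1: any node that carries an id is added to the per-page collection.
  const id = node.attribs?.id;
  if (id) {
    let hashes = hashesByPath.get(filePath);
    if (hashes === undefined) {
      hashes = new Set<string>();
      hashesByPath.set(filePath, hashes);
    }
    hashes.add(id);
  }
  // Step 3: recurse so that children (e.g. of a processed include) are collected too.
  for (const child of node.children ?? []) {
    collectIds(child, filePath, hashesByPath, assignHeadingId);
  }
}
```

Passing the heading-id function in as a parameter keeps the sketch independent of how heading ids are actually generated.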

Screenshot 2024-03-16 at 10 36 58

Current issues:

1. Some headers added in step 3 seem to be off:

Screenshot 2024-03-16 at 10 22 30 Screenshot 2024-03-16 at 10 39 07

2. Some hashes are still missing, i.e. not collected:

Screenshot 2024-03-16 at 10 29 50

Anything you'd like to highlight/discuss:

Testing instructions:

Proposed commit message: (wrap lines at 72 characters) Implement efficient validation for hash intra-link


Checklist: :ballot_box_with_check:


Reviewer checklist:

Indicate the SEMVER impact of the PR:

At the end of the review, please label the PR with the appropriate label: r.Major, r.Minor, r.Patch.

Breaking change release note preparation (if applicable):

Give a brief explanation note about:

  • what was the old feature that was made obsolete
  • any replacement feature (if any), and
  • how the author should modify his website to migrate from the old feature to the replacement feature (if possible).
yiwen101 commented 6 months ago

It turns out that my code is good. All 15 mismatches are actually legitimate broken links :(

I will fix all the broken links with hashes in this PR as well. To make verification easier for reviewers, I will also post pictures showing where these broken hash links are:

Screenshot 2024-03-19 at 15 40 52

Screenshot 2024-03-19 at 15 44 06

Screenshot 2024-03-19 at 15 37 08

Screenshot 2024-03-19 at 15 35 39

Screenshot 2024-03-19 at 15 31 30

Screenshot 2024-03-19 at 15 29 48

Screenshot 2024-03-19 at 15 28 08

Screenshot 2024-03-19 at 15 25 46

Screenshot 2024-03-19 at 15 22 36

Screenshot 2024-03-19 at 15 13 51

codecov[bot] commented 6 months ago

Codecov Report

Attention: Patch coverage is 63.63636%, with 20 lines in your changes missing coverage. Please review.

Project coverage is 51.11%. Comparing base (9da549c) to head (49236f8).

| Files | Patch % | Lines |
|---|---|---|
| packages/core/src/html/SiteLinkManager.ts | 54.16% | 10 Missing and 1 partial :warning: |
| packages/core/test/unit/utils/utils.ts | 60.00% | 4 Missing :warning: |
| packages/core/src/html/linkProcessor.ts | 76.92% | 3 Missing :warning: |
| packages/core/src/html/headerProcessor.ts | 60.00% | 2 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #2465      +/-   ##
==========================================
+ Coverage   50.98%   51.11%   +0.12%
==========================================
  Files         124      124
  Lines        5305     5355      +50
  Branches     1137     1152      +15
==========================================
+ Hits         2705     2737      +32
- Misses       2311     2328      +17
- Partials      289      290       +1
```


tlylt commented 6 months ago

> Efficient

Could you test with the cs2103T website to check the before & after timing of running markbind-serve/build over a large website, to validate whether this solution is in practice efficient?

yiwen101 commented 6 months ago

> Efficient
>
> Could you test with the cs2103T website to check the before & after timing of running markbind-serve/build over a large website, to validate whether this solution is in practice efficient?

Thanks for the comment. The following are the results on my end:

valid-hash branch, intrasiteLinkValidation {enabled: true}

Screenshot 2024-03-20 at 10 49 51

valid-hash branch, intrasiteLinkValidation {enabled: false}

Screenshot 2024-03-20 at 10 52 29

master-branch, intrasiteLinkValidation {enabled: false}

Screenshot 2024-03-20 at 10 55 10

The additional cost should be negligible, effectively invisible. In theory the master branch should run faster when all other factors are held constant, but it turned out to run the slowest, which suggests that run-to-run fluctuation from other factors has a far larger impact than the runtime added by these changes.
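One way to make a comparison like this more robust is to repeat each build several times and compare the gap between the means against the run-to-run spread. The helper below is a rough sketch of that idea, not something used in this PR: `summarize` and `isWithinNoise` are hypothetical names, and the "gap no larger than one standard deviation" rule is a crude heuristic rather than a proper statistical test.

```typescript
// Hypothetical helper for comparing build timings; not part of MarkBind.

interface TimingSummary {
  mean: number;
  stdDev: number;
}

/** Mean and sample standard deviation of a list of timings (e.g. seconds per build). */
function summarize(samples: number[]): TimingSummary {
  if (samples.length < 2) {
    throw new Error('Need at least two samples to estimate spread');
  }
  const mean = samples.reduce((sum, x) => sum + x, 0) / samples.length;
  const variance = samples.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (samples.length - 1);
  return { mean, stdDev: Math.sqrt(variance) };
}

/** True when the gap between the two means is no larger than the larger run-to-run spread. */
function isWithinNoise(before: number[], after: number[]): boolean {
  const a = summarize(before);
  const b = summarize(after);
  return Math.abs(a.mean - b.mean) <= Math.max(a.stdDev, b.stdDev);
}
```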

tlylt commented 6 months ago

> The additional cost should be negligible, effectively invisible. In theory the master branch should run faster when all other factors are held constant, but it turned out to run the slowest, which suggests that run-to-run fluctuation from other factors has a far larger impact than the runtime added by these changes.

👍

For your test run on 2103T website, how many valid intra-link hash errors were detected? Would be useful info for @damithc for follow-up broken link fixes.

yiwen101 commented 6 months ago

Sorry for getting back to the review late; I was overwhelmed by a hackathon due this Friday.

@kaixin-hc Thank you for the careful review and suggestions. Could you elaborate on how I could add this to the test site? The current implementation of the functional tests seems to only recursively compare the files in the expected folder against the actual files produced during the build. I do not know how to "expect" error logs. The PR that introduced the intra-link validation feature did not add a functional test either.
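As a rough illustration of what "expecting" error logs could look like, one common pattern is to pass the validation step a logger that records messages in memory, then assert on the recorded messages. Everything below is hypothetical: the `Logger` interface, `RecordingLogger`, and the stand-in `checkLinks` function are not MarkBind's actual logger or test setup, and the real functional tests may need a different approach.

```typescript
// Hypothetical sketch; the Logger interface and checkLinks function are illustrative only.

interface Logger {
  error(message: string): void;
}

/** A logger that stores messages in memory so a test can inspect them afterwards. */
class RecordingLogger implements Logger {
  readonly messages: string[] = [];

  error(message: string): void {
    this.messages.push(message);
  }
}

/** Stand-in for the validation step: logs one error per link whose hash is unknown. */
function checkLinks(links: string[], knownHashes: Set<string>, logger: Logger): void {
  for (const link of links) {
    const hash = link.split('#')[1];
    if (hash !== undefined && hash !== '' && !knownHashes.has(hash)) {
      logger.error(`Invalid hash in link: ${link}`);
    }
  }
}
```

A test would then call `checkLinks` with a `RecordingLogger` and compare `messages` against the expected list of errors.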

@EltonGohJH @yucheng11122017 Thank you for the careful review; I have made the requested changes and added method documentation.

The only exception is the "print" method in SiteLinkManager. Although it is mostly for testing purposes, I believe it is a necessary evil and the best option among the available choices, so I would like to leave it as it is.

yiwen101 commented 6 months ago

@yucheng11122017 Thank you for your careful review and various suggestions on improving code quality.

In a later commit, I made the following changes to existing methods to improve code quality:

1. Rename "processAndReturnHeadingId" back to "setHeadingId", and modify its behaviour accordingly.
2. Remove the "forceWrite" from maintainFilePathToHashesMap.
3. Modify maintainHashesForInclude accordingly.

I hope that the code becomes clearer after the changes.