akosbalasko / yarle

Yarle - The ultimate converter of Evernote notes to Markdown
https://github.com/akosbalasko/yarle
MIT License

Feature: apply links based on Levenshtein distance #531

Closed akosbalasko closed 1 year ago

akosbalasko commented 1 year ago

matboehmer's great idea, meant to increase the number of links whose two ends can be matched, is that Yarle could calculate the Levenshtein distance between the text of the link and the titles of the existing notes, and apply the link to the note with the minimal distance.

If more than one note has the minimal distance, then, based on another setting, Yarle could do one of the following:

  1. do not add a link to any of them (this results in link loss)
  2. link to all of the notes (this results in extra links that were not set in Evernote)
  3. link to the first of them (this may result in mixed-up links)

As an MVP I would implement case 3.
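
To make the three cases concrete, here is a minimal sketch of the idea. It is not Yarle's actual code: the names `levenshtein`, `closestTitles` and `TieStrategy` are hypothetical, and I assume the link text is compared to the note titles as plain, case-sensitive strings.

```typescript
// Hypothetical names for illustration only; this is not Yarle's implementation.
type TieStrategy = "none" | "all" | "first"; // cases 1, 2 and 3 above

// Classic dynamic-programming Levenshtein distance:
// dp[i][j] = edits needed to turn a[0..i) into b[0..j).
function levenshtein(a: string, b: string): number {
  const dp: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + cost // substitution (or match)
      );
    }
  }
  return dp[a.length][b.length];
}

// Returns the title(s) a link should point to, depending on the tie strategy.
function closestTitles(
  linkText: string,
  titles: string[],
  onTie: TieStrategy = "first"
): string[] {
  if (titles.length === 0) return [];
  const distances = titles.map((title) => levenshtein(linkText, title));
  const min = Math.min(...distances);
  const candidates = titles.filter((_, i) => distances[i] === min);
  if (candidates.length === 1 || onTie === "all") return candidates;
  return onTie === "first" ? [candidates[0]] : [];
}
```

With the default `"first"` strategy (the MVP case), a tie silently resolves to whichever title comes first in the list, so the order of the titles matters.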

matboehmer commented 1 year ago

Thanks! Really looking forward to this one. Happy to serve as a tester.

akosbalasko commented 1 year ago

Hi @matboehmer! I've created a pre-release with this Levenshtein-distance linking feature; feel free to download it from https://github.com/akosbalasko/yarle/releases/tag/v5.8.0 and test it. Thanks a lot!

matboehmer commented 1 year ago

Thanks, great! How can I run the code using npx or any other way? I am not sure if npx -p yarle-evernote-to-md@5.8.0 yarle --configFile config.json uses the latest code.

akosbalasko commented 1 year ago

@matboehmer yes yes, it should work as you wrote, just extend your config.json with a new property:

useLevenshteinForLinks: true

matboehmer commented 1 year ago

Thanks, got it! However, it does not work for me. It seems that applyLinks in apply-links.js is called only once, and the if (options.useLevenshteinForLinks) block is also entered only once (I added a console output for debugging). However, the test set I posted in #530 contains 4 links. So, from my understanding, the Levenshtein lookup should also be done 4 times?

akosbalasko commented 1 year ago

Hm... it iterates over the recognized links and replaces the link URLs everywhere in the notes folder. Let me check.

akosbalasko commented 1 year ago

It's hard to create a real test for multiple links, because Evernote currently fails to sync for me. So it will take a bit of time, sorry.

akosbalasko commented 1 year ago

@matboehmer could you pls give it a try via the UI? thanks a lot!

matboehmer commented 1 year ago

Same result; also does not work using the app UI. Does it work for you? Do you have some test data you could share?

akosbalasko commented 1 year ago

@matboehmer,

Yes yes, here is the enex I use for testing:
https://github.com/akosbalasko/yarle/blob/master/test/data/test-levenshtein-links.enex

And here are the two notes created:
https://github.com/akosbalasko/yarle/blob/master/test/data/test-levenshtein-linksNoteA.md
https://github.com/akosbalasko/yarle/blob/master/test/data/test-levenshtein-linksNoteB.md

matboehmer commented 1 year ago

It works for me with your data set, but not with the one I posted here: https://github.com/akosbalasko/yarle/issues/530#issuecomment-1728543525

akosbalasko commented 1 year ago

I think the one you shared in that comment reflects a different issue, which cannot be resolved easily. What I implemented is that the referenced note is recognized by the shortest Levenshtein distance to its note text. For instance, if the text of the note is mistyped, so that notA appears instead of noteA, and no other note has a more similar name, then notA is matched to noteA.

matboehmer commented 1 year ago

In my example data in https://github.com/akosbalasko/yarle/issues/530#issuecomment-1728543525 the wrong link is created as [[first-note|second note]] in both files, first-note and second-note. However, the link [[first-note|second note]] could be fixed to [[second-note|second note]] (i.e., replacing first-note with second-note) by looking up the proper link target using the Levenshtein distance.
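
As a rough sketch of the fix described above (not Yarle's actual code): for every [[target|alias]] link, look up the note name closest to the alias and use it as the target. The names `editDistance` and `repairWikiLinks` are hypothetical, and I assume the list of note file names is known and that comparison is case-insensitive.

```typescript
// Hypothetical helpers for illustration only; not taken from Yarle.

// Levenshtein distance, keeping only the previous row of the table.
function editDistance(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Rewrites the target of each [[target|alias]] link to the note name
// that is closest to the alias. On a tie, the first name in the list wins.
function repairWikiLinks(markdown: string, noteNames: string[]): string {
  const wikiLink = /\[\[([^\]|]+)\|([^\]]+)\]\]/g;
  return markdown.replace(
    wikiLink,
    (match: string, _target: string, alias: string): string => {
      if (noteNames.length === 0) return match; // nothing to look up
      let best = noteNames[0];
      let bestDistance = editDistance(alias.toLowerCase(), best.toLowerCase());
      for (const name of noteNames.slice(1)) {
        const distance = editDistance(alias.toLowerCase(), name.toLowerCase());
        if (distance < bestDistance) {
          best = name;
          bestDistance = distance;
        }
      }
      return `[[${best}|${alias}]]`;
    }
  );
}
```

With the note names first-note and second-note, the alias "second note" is one edit away from second-note (space versus hyphen), so [[first-note|second note]] would be rewritten to [[second-note|second note]].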

akosbalasko commented 1 year ago

@matboehmer, okay, I found a bug in the unique id recognizer that could cause links to overlap each other. It is fixed now. I checked with your example and, as far as I can see, it fixes your issue, but please confirm. Thanks a lot!

matboehmer commented 1 year ago

Great, thank you! Works perfectly now on my test data set and already really good on my real data set. Thank you very much for adding this feature!