CPkobo / vba-sequencematcher

Text diff scripts in VBA (Word), partially ported from Python's SequenceMatcher

Repeats and misspelled words after using WordSeqApplyer #1

Open: justin13601 opened this issue 1 year ago

justin13601 commented 1 year ago

I'm having some issues with WordSeqApplyer: it seems to produce output with repeated and misspelled words. Is there any way to improve how the script uses the opcodes?
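For context, SequenceMatcher describes a diff as a list of opcodes `(tag, i1, i2, j1, j2)`, meaning "`a[i1:i2]` becomes `b[j1:j2]`", with tag `replace`, `delete`, `insert` or `equal`. The minimal sketch below uses Python's standard `difflib`, not the VBA port, and the `apply_opcodes` helper and sample sentences are illustrative only. It rebuilds the edited text by assembling a fresh output rather than editing the original in place, so the indices always refer to the unmodified inputs and a correct opcode list should reproduce `b` exactly.

```python
from difflib import SequenceMatcher

def apply_opcodes(a, b, opcodes):
    """Rebuild b from a using SequenceMatcher opcodes.

    A new output is assembled instead of editing `a` in place, so the
    (i1, i2, j1, j2) indices always refer to the original, unmodified inputs.
    """
    out = []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            out.append(a[i1:i2])        # unchanged text is copied from a
        elif tag in ("replace", "insert"):
            out.append(b[j1:j2])        # new text is taken from b
        # "delete": a[i1:i2] is simply not copied
    return "".join(out)

original = "The quick brown fox jumps over the lazy dog."
edited = "The quick red fox leaped over the sleepy dog."
ops = SequenceMatcher(None, original, edited).get_opcodes()
for op in ops:
    print(op)
print(apply_opcodes(original, edited, ops))
```

If applying the opcodes this way gives the right text but the Word macro does not, the problem is more likely in how the edits are applied to the document than in the opcodes themselves.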

justin13601 commented 10 months ago

@CPkobo just following up - any ideas?

CPkobo commented 10 months ago

Could you give me a sample text?

justin13601 commented 10 months ago


Of course, and thank you for replying, @CPkobo! Below is a macro-free Word document with the sample text:

WordSeqApplyerCopy.docx

As you can see, this is the original text: [screenshot: Original]

I want to change it to this edited text: [screenshot: Edited]

But this is the result, with the repeats and errors highlighted in yellow: [screenshot: Result]

Is this because there is no junk function? I also see places where the tracked changes look strange, as if letters were being taken from existing words to form a new word. Is this related? [screenshot: Letters]
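On the "letters taken from words" symptom: a character-level diff is free to match single letters inside otherwise unrelated words, which can surface as odd tracked changes. One common workaround, sketched below as an assumption rather than a description of what WordSeqApplyer does, is to diff lists of word tokens instead of characters, optionally passing an `isjunk` function (here one that marks whitespace tokens as junk). The `tokenize` and `is_junk` helpers and the regular expression are hypothetical.

```python
import re
from difflib import SequenceMatcher

def tokenize(text):
    """Split text into word, whitespace and punctuation tokens.

    Every character matches exactly one alternative, so "".join(tokens) == text.
    """
    return re.findall(r"\w+|\s+|[^\w\s]", text)

def is_junk(token):
    # Whitespace tokens are junk: they cannot start a match on their own.
    return token.isspace()

original = "The quick brown fox"
edited = "The quick brawny fox"

char_ops = SequenceMatcher(None, original, edited).get_opcodes()
a_tok, b_tok = tokenize(original), tokenize(edited)
word_ops = SequenceMatcher(is_junk, a_tok, b_tok, autojunk=False).get_opcodes()

print("characters:", [op for op in char_ops if op[0] != "equal"])
for tag, i1, i2, j1, j2 in word_ops:
    if tag != "equal":
        print("words:", tag, a_tok[i1:i2], "->", b_tok[j1:j2])
```

Here the word-level diff reports one `replace` of `brown` by `brawny`, while the character-level diff splits the same change into a one-letter replace and a one-letter insert. The word-level indices count tokens, not characters, so a macro would have to convert them back to character positions (for example by summing the lengths of the preceding tokens).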

I checked the opcodes and noticed that the index positions decrease each time there is a strange error, but I'm not sure how to fix it. Any guidance would be appreciated!
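The shifting indices may point to a classic pitfall when opcodes are applied directly to a document: every `i1`/`i2` refers to the *original* text, so once an earlier `replace`, `insert` or `delete` changes the length of the text before a later edit, that later edit lands in the wrong place. This has not been confirmed as the cause in WordSeqApplyer; the sketch below (hypothetical `apply_in_place`, working on a character list that stands in for a Word range) only demonstrates the effect and the usual fix of applying the edits from the last opcode to the first.

```python
from difflib import SequenceMatcher

def apply_in_place(a, b, opcodes, reverse=True):
    """Edit a copy of `a` in place, the way a macro edits a document range.

    i1/i2 always refer to positions in the ORIGINAL `a`. Going from the last
    opcode to the first keeps the pending positions valid; going front to back
    does not, because earlier edits can change the length of the text before them.
    """
    doc = list(a)                       # mutable stand-in for the document text
    ops = reversed(opcodes) if reverse else opcodes
    for tag, i1, i2, j1, j2 in ops:
        if tag != "equal":
            doc[i1:i2] = b[j1:j2]       # replace, insert (i1 == i2) or delete (j1 == j2)
    return "".join(doc)

a, b = "abcd", "xxbcy"
ops = SequenceMatcher(None, a, b).get_opcodes()
print(ops)  # [('replace', 0, 1, 0, 2), ('equal', 1, 3, 2, 4), ('replace', 3, 4, 4, 5)]
print("front to back:", apply_in_place(a, b, ops, reverse=False))  # xxbyd (wrong)
print("back to front:", apply_in_place(a, b, ops, reverse=True))   # xxbcy
```

Working back to front means each edit only changes text after the positions still pending. An equivalent alternative when moving forward is to shift each opcode by `j1 - i1`, i.e. edit `doc[j1:j1 + (i2 - i1)]` instead of `doc[i1:i2]`.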

justin13601 commented 10 months ago

@CPkobo I just tried computing the opcodes with Python's difflib, and with autojunk=False they are very similar to the VBA script's. With autojunk=True, however, the opcodes are more concise. So maybe this is happening because junk handling is not implemented?
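For reference, difflib's automatic junk heuristic is separate from the optional `isjunk` function and only applies when the second sequence has at least 200 elements: an element whose duplicates account for more than 1% of the sequence is marked "popular" and is not used to anchor matches. The sketch below (made-up sample text) shows the heuristic switching on for a 240-character string via the `bpopular` attribute. On real prose diffed character by character, most common letters and the space likely exceed that threshold, which may explain why the autojunk=True opcodes look coarser.

```python
from difflib import SequenceMatcher

original = "the cat sat on the mat. " * 10    # 240 characters
edited = original.replace("mat", "rug", 3)    # same length, three words changed

for autojunk in (True, False):
    sm = SequenceMatcher(None, original, edited, autojunk=autojunk)
    ops = sm.get_opcodes()
    # bpopular holds the elements of `edited` ignored by the heuristic
    print(f"autojunk={autojunk}: {len(sm.bpopular)} popular characters, {len(ops)} opcodes")
```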

Do you have plans to add it? If so, is there any way I can help?

If not, is it possible to compute the opcodes in Python and transfer them to VBA?
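One possible approach, sketched here as an assumption rather than an existing feature of this repository, is to compute the opcodes in Python and write them to a small CSV file that a VBA macro reads back with `Open ... For Input` and `Line Input #`. The `opcodes_to_csv` helper and the column layout are hypothetical. Two details matter on the VBA side: the indices are 0-based and half-open like Python slices (so VBA's 1-based `Mid` would need `i1 + 1` as its start and `i2 - i1` as its length), and the strings passed to Python must be exactly the text the macro sees in Word, or the positions will not line up.

```python
import csv
import io
from difflib import SequenceMatcher

def opcodes_to_csv(a, b, autojunk=True):
    """Return SequenceMatcher opcodes as CSV text, one tag,i1,i2,j1,j2 row each.

    Indices are 0-based and half-open, as in Python slicing (a[i1:i2]).
    Rows end with CR LF, csv's default, which VBA's Line Input # recognises.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)                      # default lineterminator is "\r\n"
    writer.writerow(["tag", "i1", "i2", "j1", "j2"])
    matcher = SequenceMatcher(None, a, b, autojunk=autojunk)
    writer.writerows(matcher.get_opcodes())
    return buf.getvalue()

csv_text = opcodes_to_csv("abcd", "xxbcy")
print(csv_text)
```

To save it, open the file with `open("opcodes.csv", "w", newline="")` so the CR LF endings are written unchanged. The file contains only ASCII tags and integers, so its encoding should not cause trouble when VBA reads it.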