Open IngersolNorway opened 3 months ago
Not sure about an MS Word macro. As we are Linux users, we don't have MS Office.
Hopefully we can try a LibreOffice macro or a Google Docs add-on (JavaScript).
Here is a sample tutorial on how to write a Google Docs add-on:
https://developers.google.com/workspace/add-ons/editors/docs/quickstart/translate
I have tried, but I am facing an issue regarding the Unicode range. I have attached the VBA script and the output (which is not what I expected). Note: I don't know anything about VBA; I just tried.
`Sub HighlightNonTamilCharacters()
    On Error Resume Next ' Continue on errors for debugging purposes
    Dim doc As Document
    Set doc = ActiveDocument
    If doc Is Nothing Then
        MsgBox "No active document found. Please open a document and try again.", vbExclamation
        Exit Sub
    End If
    ' Define the Tamil alphabet characters (Unicode range for Tamil script)
    ' Modify this range if needed based on your specific Tamil font and characters
    Const TamilRangeStart As Long = &HB80
    Const TamilRangeEnd As Long = &HBFF
    Dim para As Paragraph
    Dim rng As Range
    Dim i As Long
    For Each para In doc.Paragraphs
        Set rng = para.Range
        ' Check if the range is not empty before processing
        If Len(rng.Text) > 0 Then
            For i = 1 To rng.Characters.Count
                If Not IsCharTamil(rng.Characters(i).Text, TamilRangeStart, TamilRangeEnd) Then
                    rng.Characters(i).Font.Color = RGB(255, 0, 0) ' Color the character red (RGB)
                End If
            Next i
        End If
    Next para
    MsgBox "Highlighting complete.", vbInformation
End Sub
' Returns True when the first character of char lies inside the given code point range
Function IsCharTamil(char As String, startUnicode As Long, endUnicode As Long) As Boolean
    Dim code As Long
    code = AscW(char)
    IsCharTamil = (code >= startUnicode And code <= endUnicode)
End Function`
Output
Observation: Instead of highlighting non-Tamil script, it highlights Tamil script. Most probably this happens because of the Unicode range and the loop initialization.
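For the Google Docs add-on route mentioned earlier, the same block-range check can be sketched in plain JavaScript. This is only a minimal sketch: `isTamilCodePoint` and `nonTamilIndexes` are hypothetical names, and wiring the result to an editor's coloring API is left out. It may help to rule out the range itself as the cause, since the test is just a comparison of code points against 0x0B80 and 0x0BFF.

```javascript
// Tamil Unicode block, the same bounds as TamilRangeStart / TamilRangeEnd in the VBA script.
const TAMIL_START = 0x0B80;
const TAMIL_END = 0x0BFF;

// Hypothetical helper: true when the first code point of ch lies in the Tamil block.
function isTamilCodePoint(ch) {
  const code = ch.codePointAt(0);
  return code >= TAMIL_START && code <= TAMIL_END;
}

// Hypothetical helper: string offsets of every character outside the Tamil block.
// Note that spaces, digits and punctuation are reported too.
function nonTamilIndexes(text) {
  const indexes = [];
  let offset = 0;
  for (const ch of text) {          // for...of iterates by code point
    if (!isTamilCodePoint(ch)) indexes.push(offset);
    offset += ch.length;            // a code point may occupy 1 or 2 UTF-16 units
  }
  return indexes;
}
```

Under these assumptions, only the offsets returned by `nonTamilIndexes` would be colored; every Tamil-block character is left untouched.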
HIGHLIGHT ALL EXCEPT THESE (ஃ, அ, ஆ, இ, ஈ, உ, ஊ, எ, ஏ, ஐ, ஒ, ஓ, ஔ, க், க, கா, கி, கீ, கு, கூ, கெ, கே, கை, கொ, கோ, கௌ, ங், ங, ஙா, ஙி, ஙீ, ஙு, ஙூ, ஙெ, ஙே, ஙை, ஙொ, ஙோ, ஙௌ, ச், ச, சா, சி, சீ, சு, சூ, செ, சே, சை, சொ, சோ, சௌ, ஞ், ஞ, ஞா, ஞி, ஞீ, ஞு, ஞூ, ஞெ, ஞே, ஞை, ஞொ, ஞோ, ஞௌ, ட், ட, டா, டி, டீ, டு, டூ, டெ, டே, டை, டொ, டோ, டௌ, ண், ண, ணா, ணி, ணீ, ணு, ணூ, ணெ, ணே, ணை, ணொ, ணோ, ணௌ, த், த, தா, தி, தீ, து, தூ, தெ, தே, தை, தொ, தோ, தௌ, ந், ந, நா, நி, நீ, நு, நூ, நெ, நே, நை, நொ, நோ, நௌ, ப், ப, பா, பி, பீ, பு, பூ, பெ, பே, பை, பொ, போ, பௌ, ம், ம, மா, மி, மீ, மு, மூ, மெ, மே, மை, மொ, மோ, மௌ, ய், ய, யா, யி, யீ, யு, யூ, யெ, யே, யை, யொ, யோ, யௌ, ர், ர, ரா, ரி, ரீ, ரு, ரூ, ரெ, ரே, ரை, ரொ, ரோ, ரௌ, ல், ல, லா, லி, லீ, லு, லூ, லெ, லே, லை, லொ, லோ, லௌ, வ், வ, வா, வி, வீ, வு, வூ, வெ, வே, வை, வொ, வோ, வௌ, ழ், ழ, ழா, ழி, ழீ, ழு, ழூ, ழெ, ழே, ழை, ழொ, ழோ, ழௌ, ள், ள, ளா, ளி, ளீ, ளு, ளூ, ளெ, ளே, ளை, ளொ, ளோ, ளௌ, ற், ற, றா, றி, றீ, று, றூ, றெ, றே, றை, றொ, றோ, றௌ, ன், ன, னா, னி, னீ, னு, னூ, னெ, னே, னை, னொ, னோ, னௌ, ஜ், ஜ, ஜா, ஜி, ஜீ, ஜு, ஜூ, ஜெ, ஜே, ஜை, ஜொ, ஜோ, ஜௌ, ஷ், ஷ, ஷா, ஷி, ஷீ, ஷு, ஷூ, ஷெ, ஷே, ஷை, ஷொ, ஷோ, ஷௌ, ஸ், ஸ, ஸா, ஸி, ஸீ, ஸு, ஸூ, ஸெ, ஸே, ஸை, ஸொ, ஸோ, ஸௌ, ஹ், ஹ, ஹா, ஹி, ஹீ, ஹு, ஹூ, ஹெ, ஹே, ஹை, ஹொ, ஹோ, ஹௌ, க்ஷ், க்ஷ, க்ஷா, க்ஷி, க்ஷீ, க்ஷு, க்ஷூ, க்ஷெ, க்ஷே, க்ஷை, க்ஷொ, க்ஷோ, க்ஷெள, ஸ்ரீ, ௐ)
Range: 0B80–0BFF includes broken Unicode text. We can use this.
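Because a plain range check accepts every code point in the block, including vowel signs that stand on their own, one possible alternative is to match whole letters from the list above. The sketch below is in plain JavaScript and is not tied to any editor API. `findNonTamilSpans` and the three pattern constants are hypothetical names, and the character classes are my reading of the list above (ஃ, the 12 vowels, the 18 consonants plus ஜ, ஷ, ஸ, ஹ, and ௐ). It assumes the text is NFC-normalized, so ொ, ோ and ௌ are single code points.

```javascript
// Consonants from the list above (க ங ச ஜ ஞ ட ண த ந ன ப ம ய ர ற ல ள ழ வ ஷ ஸ ஹ).
const CONSONANT = '[\\u0B95\\u0B99\\u0B9A\\u0B9C\\u0B9E\\u0B9F\\u0BA3\\u0BA4\\u0BA8-\\u0BAA\\u0BAE-\\u0BB5\\u0BB7-\\u0BB9]';
// Vowel signs ா ி ீ ு ூ ெ ே ை ொ ோ ௌ and the pulli ்.
const SIGN = '[\\u0BBE-\\u0BC2\\u0BC6-\\u0BC8\\u0BCA-\\u0BCD]';
// Letters that stand alone: ஃ, the 12 independent vowels, and ௐ.
const STANDALONE = '[\\u0B83\\u0B85-\\u0B8A\\u0B8E-\\u0B90\\u0B92-\\u0B94\\u0BD0]';

// Hypothetical helper: returns [start, end) offsets of every run of text that is
// not a well-formed letter. A letter is a consonant with at most one sign, or a
// standalone letter. க்ஷ and ஸ்ரீ pass because each half is itself a valid letter.
function findNonTamilSpans(text) {
  const letter = new RegExp(`${CONSONANT}${SIGN}?|${STANDALONE}`, 'g');
  const spans = [];
  let pos = 0;
  for (const m of text.matchAll(letter)) {
    if (m.index > pos) spans.push([pos, m.index]); // gap before this letter
    pos = m.index + m[0].length;
  }
  if (pos < text.length) spans.push([pos, text.length]); // trailing gap
  return spans;
}

console.log(findNonTamilSpans('தமிழ் text')); // → [[5, 10]]
```

With this approach a stray vowel sign, such as a second ா after கா, is reported even though it lies inside 0B80–0BFF. Spaces, digits and punctuation are reported as well, so a real macro or add-on would probably want to skip those before coloring.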
Can someone assist me by writing a script to highlight all characters in Microsoft Word that are not part of the 247-letter Tamil alphabet? This will help us clean up auto-generated OCR Tamil books.
like below