Closed sangasangasanga closed 5 years ago
I see. One way is to use visual automation to double-click and open the PDF.
After that, use computer vision OCR to get the text into a variable, something like below -
read page.png to text_on_screen
echo text_on_screen
The 2nd way is, after opening the PDF, to select all and copy, then paste into Notepad or somewhere else (this needs the vision step and custom Sikuli commands - more info here).
The 3rd way is to use a command-line tool to convert the PDF to text. However, note that if OCR is involved, accuracy is not 100% even with commercial tools, so it depends on how much error tolerance your business workflow has. If 100% accuracy is needed, people still end up checking all records manually to find the 5-10% with errors, and then the process is not a good candidate for automation.
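As a rough illustration of the 3rd way, here is a minimal Python sketch that shells out to `pdftotext` from Poppler. The choice of tool is an assumption, since the thread does not name one, and the helper names `pdftotext_command` and `convert_pdf_to_text` are hypothetical. Note that `pdftotext` only extracts text already embedded in the PDF; a scanned PDF would still need an OCR step.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, txt_path):
    """Build the command line for Poppler's pdftotext (assumed tool)."""
    # -layout asks pdftotext to keep the physical layout of the page
    return ["pdftotext", "-layout", pdf_path, txt_path]


def convert_pdf_to_text(pdf_path, txt_path):
    """Convert a PDF to a text file and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    # check=True raises CalledProcessError if the conversion fails
    subprocess.run(pdftotext_command(pdf_path, txt_path), check=True)
    with open(txt_path, encoding="utf-8", errors="replace") as handle:
        return handle.read()
```

Calling `convert_pdf_to_text("report.pdf", "report.txt")` (hypothetical file names) would write the text file next to the PDF and return its contents, which can then be checked against whatever error tolerance the process allows.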
There isn't one best way to do the automation; it depends on the type of document and the type of process. Have you tried commercial tools such as UiPath Community Edition and WorkFusion RPA Express? UiPath Community Edition is free for trial and evaluation, and RPA Express is free for commercial use.
Hey @kensoh, sorry, I am new to this so I am kind of confused. I used Sikuli for visual automation to open the PDF. However, after that you said to use OCR to get the text into a variable:
click("1532589375522.png")
click("1532589388724.png")
doubleClick("1532589508997.png")
wait("PTALL1aLLVLL.png")
read PTALL1aLLVLL.png to screen_text
However, I am not sure where to run it after the OCR step, TagUI or Sikuli? Both are showing errors:
TagUI
C:\tagui\src>tagui tagui.sikuli\test.sikuli\test.py
ERROR - use .tagui .js .txt or no extension for flow filename
Sikuli
[error] script [ test ] stopped with error in line 7 at column 5
[error] SyntaxError ( "no viable alternative at input 'PTALL1aLLVLL'", )
Try the below, and make sure Sikuli is installed following the visual automation section. I still don't recommend using TagUI for this, as it involves OCR, which is not 100% accurate. Try the commercial tools; they should be better, but they are also not 100% accurate for OCR.
abc.txt
click 1532589375522.png
click 1532589388724.png
dclick 1532589508997.png
vision wait('FULL_PATH\PTALL1aLLVLL.png')
read PTALL1aLLVLL.png to screen_text
echo screen_text
tagui abc.txt
@kensoh After I run it, at step 1 (click ..) it just hangs there and doesn't click anything. Does it run in the background?
C:\tagui\src>tagui abc.txt
[starting sikuli process]
START - automation started - Fri Jul 27 2018 11:51:15 GMT+0800 (Malay Peninsula Standard Time)
click C:/tagui/src/1532589375522.png
I am trying to use visual automation to open a document. But can it also scrape or find text in a Microsoft document instead?
Follow the steps here to make sure visual engine is set-up - https://github.com/kelaberetiv/TagUI#visual-automation
@kensoh I followed the steps in https://github.com/kelaberetiv/TagUI#visual-automation but it is still the same - just hangs there
Can you paste the .log files in tagui\src\tagui.sikuli here?
@kensoh It is actually empty. Is something supposed to be there?
Yes, there is supposed to be something there. If there is nothing, maybe the installation has a problem. Is there a runsikulix file in that folder?
Nope, there are tagui.log, tagui.py (the only one that is not empty), tagui_sikuli.in, tagui_sikuli.out, tagui_windows.log. The rest are empty.
If there is no runsikulix file in that folder, it means Sikuli is not installed correctly. After Sikuli is installed correctly, there should be a runsikulix file there. Try to follow the steps again - https://github.com/kelaberetiv/TagUI#visual-automation
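The folder check described above can be sketched in Python. This is only an illustrative helper, not part of TagUI: the function name `check_sikuli_folder` is hypothetical, and the folder path in the example is taken from the paths quoted in this thread. It reports whether any file whose name starts with `runsikulix` is present and lists each file's size, so empty log files are easy to spot.

```python
import os


def check_sikuli_folder(folder):
    """Return (has_runsikulix, sizes) for a tagui.sikuli folder.

    has_runsikulix is True if any file name starts with 'runsikulix'.
    sizes maps each file name in the folder to its size in bytes.
    """
    sizes = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            sizes[name] = os.path.getsize(path)
    has_runsikulix = any(n.lower().startswith("runsikulix") for n in sizes)
    return has_runsikulix, sizes


if __name__ == "__main__":
    # Path assumed from the thread; adjust to your own TagUI location
    folder = r"C:\tagui\src\tagui.sikuli"
    if os.path.isdir(folder):
        found, sizes = check_sikuli_folder(folder)
        print("runsikulix present:", found)
        for name, size in sizes.items():
            print(name, size, "bytes")
```

If `runsikulix present` comes back `False`, that matches the situation described here, and the visual automation setup steps need to be repeated.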
Hello, I am trying to open an application with TagUI, but click is not opening the app (Windows 10). Is there a way to double-click?
Yes you can use dclick for double click and rclick for right click with visual automation 👍
Hey @kensoh, web scraping works fine, but does scraping work for a PDF file in a desktop application? I read your closed issues and they state that visual automation can be used for desktop applications (#113). But I don't think it can be used to scrape (correct me if I am wrong). Please suggest a way for me to go ahead with this.