aisingapore / TagUI

Free RPA tool by AI Singapore
Apache License 2.0

Can tagui scrape from pdf file in the desktop application? #250

Closed by sangasangasanga 5 years ago

sangasangasanga commented 5 years ago

Hey @kensoh, web scraping works fine, but does scraping work for a PDF file opened in a desktop application? I read your closed issues, and #113 states that visual automation can be used for desktop applications. But I don't think it can be used to scrape (correct me if I am wrong). Please suggest a way for me to go ahead with this.

kensoh commented 5 years ago

I see. One way is to use visual automation to double-click and open the PDF.

After that, use computer vision OCR to get the text into a variable, something like below:

read page.png to text_on_screen
echo text_on_screen

The 2nd way is, after opening the PDF, to select all, copy, and then paste into Notepad or somewhere else (you will have to use the vision step and custom Sikuli commands to do that - more info here).

The 3rd way is to use other command line tools to convert the PDF to text. However, note that if OCR is used, even with commercial tools, the accuracy is not 100%, so it depends on how much tolerance for errors your business workflow has. If 100% accuracy is needed, people still end up having to check all records manually to look for the 5-10% errors, and then the process is not a good candidate for automation.
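As a rough sketch of the third option, assuming the PDF has an embedded text layer (so no OCR is involved) and that Poppler's `pdftotext` command line tool is installed and on the PATH. The helper names `build_pdftotext_command` and `pdf_to_text` are hypothetical and are not part of TagUI:

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, keep_layout=True):
    """Build a pdftotext command that writes the extracted text to stdout."""
    command = ["pdftotext"]
    if keep_layout:
        command.append("-layout")  # keep the physical layout of the page
    command += [pdf_path, "-"]     # "-" as the output file means stdout
    return command


def pdf_to_text(pdf_path):
    """Return the text of a PDF, or None if pdftotext is not installed."""
    if shutil.which("pdftotext") is None:
        return None
    result = subprocess.run(
        build_pdftotext_command(pdf_path),
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout
```

Calling `pdf_to_text("some_file.pdf")` would return the extracted text as a string. A scanned PDF that contains only images would still need OCR, since `pdftotext` only extracts text that is embedded in the file.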

There isn't a single best way to do the automation; it depends on the type of document and the type of process. Have you tried commercial tools such as UiPath Community Edition and WorkFusion RPA Express? UiPath Community Edition is free for trial and evaluation, and RPA Express is free for commercial use.

sangasangasanga commented 5 years ago

Hey @kensoh, sorry, I am new to this so I am kind of confused. I used Sikuli to visually automate opening the PDF. After that, you said to use OCR to get the text into a variable:

click("1532589375522.png")
click("1532589388724.png")
doubleClick("1532589508997.png")
wait("PTALL1aLLVLL.png")
read PTALL1aLLVLL.png to screen_text

However, I am not sure where to run it after the OCR step, in TagUI or in Sikuli? Both show errors:

TagUI
C:\tagui\src>tagui tagui.sikuli\test.sikuli\test.py
ERROR - use .tagui .js .txt or no extension for flow filename

Sikuli
[error] script [ test ] stopped with error in line 7 at column 5
[error] SyntaxError ( "no viable alternative at input 'PTALL1aLLVLL'", )
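Both errors look consistent with each tool being handed the other's format: TagUI rejects the `.py` filename, and Sikuli's Python interpreter cannot parse the TagUI step `read ... to ...`. The sketch below illustrates the two checks with hypothetical helper names. The allowed extensions are taken from the TagUI error message above, and the code runs under standard Python 3, not inside either tool:

```python
import ntpath

# Extensions listed in the TagUI error message ("" means no extension).
ALLOWED_FLOW_EXTENSIONS = {".tagui", ".js", ".txt", ""}


def tagui_accepts(filename):
    """Mimic the filename rule stated in the TagUI error message."""
    # ntpath handles Windows-style backslash paths on any platform.
    extension = ntpath.splitext(filename)[1]
    return extension in ALLOWED_FLOW_EXTENSIONS


def is_python_syntax(line):
    """Return True if the line parses as a Python statement."""
    try:
        compile(line, "<script>", "exec")
        return True
    except SyntaxError:
        return False


print(tagui_accepts(r"tagui.sikuli\test.sikuli\test.py"))                # False
print(tagui_accepts("abc.txt"))                                          # True
print(is_python_syntax('click("1532589375522.png")'))                   # True
print(is_python_syntax("read PTALL1aLLVLL.png to screen_text"))         # False
```

This suggests keeping TagUI steps such as `read` in a flow file with an accepted extension, and keeping Python calls such as `click("...")` in the Sikuli script, rather than mixing the two in one file.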

kensoh commented 5 years ago

Try the below, and make sure Sikuli is installed following the visual automation section. I still don't recommend using TagUI for this, as it involves OCR, which is not 100% accurate. Try the commercial tools; they should be better, but they are also not 100% accurate for OCR.

abc.txt

click 1532589375522.png
click 1532589388724.png
dclick 1532589508997.png
vision wait('FULL_PATH\PTALL1aLLVLL.png')
read PTALL1aLLVLL.png to screen_text
echo screen_text

tagui abc.txt

sangasangasanga commented 5 years ago

@kensoh after I run it, it just hangs at step 1 (click ...) and doesn't click anything. Does it run in the background?

C:\tagui\src>tagui abc.txt
[starting sikuli process]

START - automation started - Fri Jul 27 2018 11:51:15 GMT+0800 (Malay Peninsula Standard Time)

click C:/tagui/src/1532589375522.png

I am trying to use visual automation to open a doc. But can it also scrape or find text in a Microsoft doc instead?

kensoh commented 5 years ago

Follow the steps here to make sure visual engine is set-up - https://github.com/kelaberetiv/TagUI#visual-automation

sangasangasanga commented 5 years ago

@kensoh I followed the steps in https://github.com/kelaberetiv/TagUI#visual-automation but it is still the same - just hangs there

kensoh commented 5 years ago

Can you paste the .log files in tagui\src\tagui.sikuli here?

sangasangasanga commented 5 years ago

@kensoh It is actually empty. Is something supposed to be there?

kensoh commented 5 years ago

Yes, there is supposed to be something there. If there is nothing, maybe the installation has a problem. Is there a runsikulix file in that folder?

sangasangasanga commented 5 years ago

Nope, there are tagui.log, tagui.py (the only one that is not empty), tagui_sikuli.in, tagui_sikuli.out, and tagui_windows.log. The rest are empty.

kensoh commented 5 years ago

If there is no runsikulix file in that folder, it means Sikuli is not installed correctly. After Sikuli is installed correctly, there should be a runsikulix file there. Try to follow the steps again - https://github.com/kelaberetiv/TagUI#visual-automation
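The folder check described above can be sketched as a small script. This is only an illustration: the folder path is the one mentioned in this thread, the helper name `inspect_sikuli_folder` is hypothetical, and matching any filename that starts with `runsikulix` is an assumption, since the exact filename (for example with an extension on Windows) may differ.

```python
import os


def inspect_sikuli_folder(folder):
    """Report whether a runsikulix file exists and which files are empty."""
    names = sorted(os.listdir(folder))
    # Assumption: any file whose name starts with "runsikulix" counts.
    has_runsikulix = any(name.lower().startswith("runsikulix") for name in names)
    empty_files = [
        name for name in names
        if os.path.isfile(os.path.join(folder, name))
        and os.path.getsize(os.path.join(folder, name)) == 0
    ]
    return has_runsikulix, empty_files


if __name__ == "__main__":
    folder = r"C:\tagui\src\tagui.sikuli"  # path from this thread; adjust as needed
    if os.path.isdir(folder):
        found, empty = inspect_sikuli_folder(folder)
        print("runsikulix present:", found)
        print("empty files:", empty)
    else:
        print("folder not found:", folder)
```

If the script reports that no runsikulix file is present, that matches the situation in this thread, where the visual automation setup steps need to be repeated.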

alketshabani commented 5 years ago

Hello, I am trying to open an application with TagUI, but click is not opening the app (Windows 10). Is it possible to double-click?

kensoh commented 5 years ago

Yes, you can use dclick for double click and rclick for right click with visual automation 👍