TagUI for Desktop Applications --> use visual automation

ArulKarthickKuppusamy commented 6 years ago

Hi Ken,

We are trying to implement TagUI for automating desktop applications. It would be great to have some documents or videos on automating desktop applications.

If possible, please share the details to arulkarthick@rocketmail.com

Thanks, Arul

kensoh commented 6 years ago

Hi Arul, thanks for asking this. TagUI relies on visual recognition to automate desktop applications.

More details here. Steps that support visual automation are click, hover, type, select, read, show, save, snap. For example, the automation flow below tries to send an email through Outlook by looking for the best matches of images of the respective UI elements.

Attaching the sample images for reference - samples.zip. They won't work on Windows Outlook (or on a different version of macOS Outlook), as the UI icons look different across operating systems and versions. But they are good to look at to get a better idea of how visual recognition is used to control UI actions.

Helper function visible() can also be used to detect whether an image is visible.

click outlook-icon.png
click new-email.png
enter mail-body.png as Hi Whoever,\n\nAttached are the M1 numbers.\n\nRegards,\nKen
enter subject-field.png as M1 Lucky Numbers
enter to-field.png as ksoh@aisingapore.org
click attach-button.png
click numbers-icon.png
click choose-button.png
click send-button.png

Below is another example of visual automation.

Using OCR to grab text from a PDF (alternatively, use Python libraries or other CLI tools), followed by typing and printing a thank-you letter from MS Word. Attached images - word_samples.zip

click minimize.png  
dclick receipt.png
wait 2 seconds
read page.png to receipt_text
write receipt_text to receipt.txt
click close_pdf.png

dclick letter.png
wait 8 seconds
dclick address.png
type page.png as John Lim[enter]123 ABC Street[enter]Singapore 1234567[clear]
dclick name.png
type page.bmp as John
dclick amount.png
type page.png as $123.00
click file.png
click print.png
click confirm.png
click close_word.png
click dontsave.png

kensoh commented 6 years ago

Attaching the sample images for reference. They won't work on Windows Outlook (or on a different version of macOS Outlook), as the UI icons look different across operating systems and versions. But they are good to look at as samples to get a better idea of how visual recognition is used to control UI actions - samples.zip

kensoh commented 6 years ago

Re-posting a great comment from @adegard here since this issue is related to AHK and is still open.

Thank you @kensoh for your answer. I'm a beginner developer, so a virtual display seems a little bit complicated to me for now...

About AutoHotkey, I would like to make a little tool for editing TagUI scripts, if possible... Can I share a repository with you to work on it? It is a little menu to help remember the main commands in English (activated using ctrl + left click). It is not complete, but I could share it with you and other users: https://github.com/adegard/tagui_scripts

I read about AI Singapore and other blogs on RPA. It seems that for beginners UiPath is a bit complicated and RPA Express is too big a program to install... So in my opinion TagUI is a very good alternative, simple and light. Please continue your project, even if it's so hard to maintain ;-)

kensoh commented 6 years ago

Hi @adegard wow looks cool! I've just tried out your AHK TagUI commands helper. I think I have to create a new section on the TagUI home page to link to tools and stuff that the community creates 😄

PS - thanks very much for your feedback and encouragement! Yes TagUI will continue to be maintained to make RPA accessible to a broader user community than large organizations with deep pockets.

ahk_helper

adegard commented 6 years ago

OK, thanks @kensoh, so I will complete the helper tool... for my personal use. I need it so that I don't have to remember all the commands in 6 months! So I will copy all your examples into it to make it more friendly.

I'm not using TagUI on a server but on my personal PC, so for me a headless script combined with cron (like the z-cron tool) is very important! But at the same time, I need some tool to "accelerate" the process of script production, because we have a lot of things to automate! I will do my best to complete the AHK script! Thanks again

kensoh commented 6 years ago

I'm only looking at 3 items in the pipeline for TagUI before hitting maintenance mode.

integrating with desktop apps - https://github.com/kelaberetiv/TagUI/issues/113
assistant for writing scripts - https://github.com/kelaberetiv/TagUI/issues/188
for loop break and continue - https://github.com/kelaberetiv/TagUI/issues/216

I may reach out to other open-source RPA software maintainers to look at collaboration. I was thinking yesterday that if we can make a great open-source RPA tool and pass it on to @microsoft or another large tech company to maintain, it can put pressure on commercial RPA tools to raise the quality and ease of use of their free versions. That should lead to the largest impact on the RPA ecosystem.

kensoh commented 6 years ago

Besides the Outlook example above, users can use the vision step to send custom commands to Sikuli to do things like typing complex keystroke sequences. There also seems to be a trend towards using computer vision for UI automation of desktop apps. This is happening in commercial RPA software and also at startups such as http://www.intellibot.io.

Furthermore, I can't see a sensible way to harmonize the steps API for AutoHotkey or RoroScript with TagUI. They are all different, powerful tools, but forcing an integration for the sake of integrating is senseless. Users will be better off writing the automation flows directly in those tools and using the run step or api step to invoke those parts of the automation, if they still want to manage the whole flow from within TagUI.

Because of this, I have decided to abandon efforts to integrate natively with AHK or RoroScript, and instead use the effort to review possible ways to improve the integration with Sikuli's visual automation. Folks who want integration with desktop apps, just give a shout here with your use scenarios and let's see what can be done to run those automation workflows using TagUI-Sikuli's native integration.

CC @Aussiroth @lohvht - we can discuss next week some examples of use scenarios for desktop apps, and explore ways to make it easy + accurate to run visual automation on them.

kensoh commented 6 years ago

One idea is to make it super simple to create customized workflows for different desktop apps. For example, having a 'module' for Excel 20XX and a 'module' for Outlook 20XX, where each module is nothing more than folders with images of UI elements that we can either create ourselves or let users submit as PRs.

And perhaps couple that with some automation flows that can be called via tagui steps to do some action, e.g. tagui excel/create_new_sheet (that also means the tagui step needs to support sending parameters as part of the step). @adegard's screen-capture tool will come in very handy 😄

kensoh commented 6 years ago

The above commit adds visual automation for type page.png as text.

It supports the [enter] and [clear] keywords, just like the standard type step for webpages.
The trigger words are page.png and page.bmp, just like for the steps snap, read, show, save.

Prior to this, the type step could only type into a UI element on screen, e.g. type search_bar.png as 123
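
To illustrate how the [enter] and [clear] keywords sit inside the text of a type step, here is a minimal Python sketch. It is not TagUI's actual implementation. The helper name split_type_text is hypothetical, and the sketch only shows one possible way to split such text into literal chunks and special keys.

```python
import re

# Hypothetical helper for illustration only, not part of TagUI.
# It splits the text of a type step into literal text and special keywords.
KEYWORD_PATTERN = re.compile(r"(\[enter\]|\[clear\])")


def split_type_text(text):
    """Return a list of (kind, value) tuples, where kind is 'text' or 'key'."""
    tokens = []
    for part in KEYWORD_PATTERN.split(text):
        if not part:
            # re.split yields empty strings next to leading/trailing keywords
            continue
        if part in ("[enter]", "[clear]"):
            tokens.append(("key", part[1:-1]))  # drop the square brackets
        else:
            tokens.append(("text", part))
    return tokens


if __name__ == "__main__":
    for token in split_type_text("John Lim[enter]123 ABC Street[enter]"):
        print(token)
```

A real automation tool would then send each text chunk as keystrokes and map each key name to the matching keyboard action. How TagUI does this internally is not covered in this thread.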

kensoh commented 6 years ago

Have looked through Sikuli's docs and can't find anything else that should be implemented directly as part of the TagUI steps. For those niche custom commands, the vision step can be used - more details of Sikuli commands here - http://doc.sikuli.org and here - http://sikulix-2014.readthedocs.io/en/latest

Closing the issue for now. The screen capture utility to facilitate capturing image snapshots can be done as part of #188. The modules idea above is worth exploring when the time is ripe (for community-contributed images of elements). Also copying @Aussiroth @lohvht for further input.

kensoh commented 6 years ago

User question - just to clarify, what is the page.png? and also what does the highlighted codes mean?

click minimize.png
dclick receipt.png
wait 2 seconds
read page.png to receipt_text
write receipt_text to receipt.txt
click close_pdf.png

dclick letter.png
wait 8 seconds
dclick address.png
type page.png as John Lim[enter]123 ABC Street[enter]Singapore 1234567[clear]
dclick name.png
type page.bmp as John
dclick amount.png
type page.png as $123.00
click file.png
click print.png
click confirm.png
click close_word.png
click dontsave.png

My reply

For visual automation, TagUI looks out for file names ending in .png or .bmp, instead of element identifiers referring to webpage UI (user-interface) elements.

read page to xxx normally means read the text contents of the webpage into the variable xxx. read page.png to receipt_text uses visual recognition and OCR (optical character recognition) to read the text on the whole screen into the variable receipt_text. In this flow, it captures the text from the PDF file so that it can be saved to a text file.

More details of the visual automation here - https://github.com/kelaberetiv/TagUI#visual-automation

write receipt_text to receipt.txt saves the variable to a text file receipt.txt

More details of all the TagUI steps here - https://github.com/kelaberetiv/TagUI#steps-description
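
The rule described in this reply can be sketched in Python as follows. This is only an illustration of the idea, not TagUI's actual parser. The function name target_kind and the returned labels are hypothetical.

```python
# Hypothetical illustration of the rule described above, not TagUI's own code.
def target_kind(identifier):
    """Classify the target of a step by looking at its identifier."""
    name = identifier.strip().lower()
    if name in ("page.png", "page.bmp"):
        # trigger words for the whole screen in visual automation
        return "whole screen"
    if name.endswith((".png", ".bmp")):
        # any other image file name is matched visually on screen
        return "image on screen"
    if name == "page":
        # plain 'page' refers to the webpage, as in: read page to xxx
        return "whole webpage"
    return "web element"


if __name__ == "__main__":
    for example in ("page.png", "receipt.png", "page", ".gLFyf.gsfi"):
        print(example, "->", target_kind(example))
```

So read page.png to receipt_text falls under the visual path, while read page to xxx stays a webpage step.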

vijendra-impetus commented 6 years ago

@kensoh ,

I tried the below steps:

dclick receipt.png
wait 2 seconds
read page.png to receipt_text
write receipt_text to receipt.txt
click close_pdf.png

But after the message below, it got stuck and nothing happened. I tried several times with different images in the folder, but it does not click on any image.

tagui D:\TagUI_Windows\word_samples\pdfread
[starting sikuli process]

START - automation started - Thu Jul 05 2018 15:17:15 GMT+0530 (India Standard Time)

click D:/TagUI_Windows/word_samples/confirm.png

kensoh commented 6 years ago

Hi @vijendra-impetus, recently a user had a similar problem when using visual automation on Windows. It just hangs after running, even though Sikuli and Java have been installed.

This is the solution that worked for her. See if it helps your situation - https://github.com/kelaberetiv/TagUI/issues/229

If not, can you paste the contents of the tagui_windows.log file in src\tagui\tagui.sikuli here, to see what the error messages in the backend are?

vijendra-impetus commented 6 years ago

Hi @kensoh ,

After applying the solution, the logs printed in the log files look like this:

+++ running this Java
java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
+++ trying to run SikuliX
+++ using: -Xms64M -Xmx512M -Dfile.encoding=UTF-8 -Dsikuli.FromCommandLine -jar c:\tagui\src\tagui.sikuli\sikulix.jar -r tagui.sikuli
Jul 03, 2018 11:40:41 AM java.util.prefs.WindowsPreferences
WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
[tagui] START - listening for inputs

[tagui] FINISH - stopped listening

But when I check the log file you mentioned, at the location src\tagui\tagui.sikuli, the log file (tagui_windows.log) is completely empty. There is nothing in the log file.

kensoh commented 6 years ago

Hi @vijendra-impetus, can you check whether there is a runsikulix file inside the tagui\src\tagui.sikuli folder? That file should be there if installation is completed.

For installation, see these steps (I'm now in the midst of updating the visual automation section of the main documentation to have these details in the tutorial) - https://github.com/kelaberetiv/TagUI/blob/master/src/media/RPA%20Workshop.md#visual-automation

mathiasx88 commented 5 years ago

Hi Kenson,

Recently I have been trying out the TagUI web automation Chrome extension. After the script is generated, when I try to run it, it indicates that the web elements are not found. Can you please advise? Some of the elements respond, but some do not.

https://www.google.com/
click .gLFyf.gsfi
enter .gLFyf.gsfi as github[enter]
click .aajZCb input:nth-child(1)
click .bkWMgd:nth-child(1) .LC20lb

kensoh commented 5 years ago

Hi @mathiasx88, the recording is not foolproof. You can try using XPath by inspecting the element directly in your web browser, for example - https://github.com/kelaberetiv/TagUI#find-xpath-of-web-element

After you copy the XPath using the example in the link above, you can perform TagUI actions on the element using the familiar steps. Besides copying from the browser, it is a good investment to learn XPath and write your own XPath locators. XPath is very expressive and very useful for selecting web elements.
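
As a small taste of what a hand-written XPath locator looks like, here is a Python sketch using the standard library. The markup and the attribute name="q" are made up for illustration and are not taken from any real page. Note also that xml.etree.ElementTree supports only a subset of XPath, so this is a rough approximation of what a browser accepts.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for illustration only.
PAGE = """
<html>
  <body>
    <form>
      <input name="q" class="gLFyf gsfi"/>
      <input name="btnK" type="submit"/>
    </form>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# Select by a stable attribute instead of a generated class name.
# In a browser the equivalent full XPath would be //input[@name='q'].
matches = root.findall(".//input[@name='q']")

if __name__ == "__main__":
    for element in matches:
        print(element.tag, element.attrib)
```

Locators that rely on a stable attribute tend to survive page changes better than long positional paths or auto-generated class names.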

mathiasx88 commented 5 years ago

Hi, @kensoh

I am able to run the TagUI script directly from the command prompt now. But when I try to run it with Firefox, I encounter an error. Below is the screenshot of the error. Would you be able to advise? Thanks a lot.

kensoh commented 5 years ago

Hi @mathiasx88, yes, Firefox had an overhaul from v60 and SlimerJS is not compatible with it yet. More details here on using Firefox (e.g. using an older version or automating it visually) - https://github.com/kelaberetiv/TagUI/issues/344#issuecomment-465827145

mathiasx88 commented 5 years ago

Hi @kensoh

May I check: I tried to use the following command to clear a text field that comes with a default value of 65, but whenever I run the command, the field is not cleared. Can you please advise?

type /html/body/div/section[4]/div/div[2]/div/form[1]/div[13]/div[1]/input as [clear]8938392[enter]

kensoh commented 5 years ago

It might be that the XPath is wrong, or there may be other reasons, but it is hard to take a look without replication steps.

oai1228 commented 5 years ago

@kensoh Hi I have some questions

I wrote the code below:
dclick /Users/desktop nate.png
wait 3
snap page
snap logo
snap page as nate_sample.png
snap logo as nate_sample2.png
wait 3

In cmd, START - automation started - Wed Oct 16 2019 15:05:45 GMT+0900 (??쒕?援??쒖???

dclick /Users/議곗슦??desktop nate.png ....

It does not work well. Can you explain visual automation, and why that code does not work?

kensoh commented 5 years ago

Hi @oai1228, I think there cannot be a space in the file name - dclick /Users/desktop nate.png. Try using something simple like nate.png, without a space, to see if it works.

Visual automation requires Java SDK (64-bit), see here for details - https://github.com/kelaberetiv/TagUI#visual-automation

Finally, check the log files in the tagui/src/tagui.sikuli folder to see what the error message is.
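
A quick pre-flight check along these lines can be sketched in Python. This is not a TagUI feature. The function name check_image_path is hypothetical, and the two checks simply mirror the advice in this thread: no spaces in the file name, and a .png or .bmp image.

```python
# Hypothetical pre-flight check for illustration, not part of TagUI.
def check_image_path(path):
    """Return a list of problems found with an image path used in a flow."""
    problems = []
    if any(ch.isspace() for ch in path):
        # the advice above: avoid spaces in the image file name or path
        problems.append("path contains whitespace")
    if not path.lower().endswith((".png", ".bmp")):
        # visual automation looks out for .png or .bmp file names
        problems.append("not a .png or .bmp file")
    return problems


if __name__ == "__main__":
    for candidate in ("/Users/desktop nate.png", "nate.png"):
        print(candidate, "->", check_image_path(candidate) or "looks fine")
```

Running it on the path from the question flags the space, while the simplified nate.png passes both checks.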

oai1228 commented 5 years ago

Hi Ken, I have a problem again.

START - automation started - Tue Oct 22 2019 17:24:52 GMT+0900 (??쒕?援??쒖???
dclick c:/TagUI/tagui/src/samples/ever.png

After that, cmd does not respond. I have already downloaded Java SDK (64-bit).

Also, I can't understand: where does the program click on the picture?

kensoh commented 5 years ago

Thanks @oai1228, it looks like no other users have encountered this problem before. Some next steps to try -

take an image of your Windows Start button and name it start.png
in your automation script, write the single line click start.png
run the automation and check the log file in tagui\src\tagui.sikuli

yoga212121 commented 4 months ago

Hey @kensoh, is it possible for my flow to enter login credentials on one webpage while I am writing a report on another site at the same time, or does the flow have to run undisturbed? Can website actions such as click, type, etc. run in the background while I am performing some other operation?

aisingapore / TagUI

TagUI for Desktop Applications --> use visual automation #113