ServiceNow / BrowserGym

BrowserGym, a gym environment for web task automation in the Chromium browser.

BrowserGym IDs for Static Text #101

Status: Closed (suhamansuri22 closed this issue 3 weeks ago)

suhamansuri22 commented 1 month ago

I am using BrowserGym on my custom platform, and whenever a dropdown is built from div/span HTML tags, its options are not given a BrowserGym ID. As a result, the model cannot interact with any of the dropdown options, because they are treated as static text elements. Is there a way to make BrowserGym assign IDs to static text elements? On my end they are interactable.

gasse commented 1 month ago

Hi @suhamansuri22, we only tag standard HTML elements at the moment, because tagging every element was causing issues on some websites. Since then we have made the tagging routine more robust (it checks for duplicate bids), so I guess we could re-activate tagging for all elements. Elements such as div and span should get a bid, though. Could you share a portion of your web page, with the elements for which you would like to have a bid?

suhamansuri22 commented 1 month ago

For some reason, my div and span elements do have bids assigned when I inspect the page, but the AXTree that is passed to the model has no bids on the dropdown options. These dropdowns are built from div and span elements, yet their options do not get bids in the AXTree.

[Screenshot: 2024-07-24 at 9:49:23 AM]

gasse commented 1 month ago

The culprit might be the flatten_axtree_to_str() function then. The bids are printed at this line: https://github.com/ServiceNow/BrowserGym/blob/e46944300210cf992ac7563860f2b6d1af76cc2d/core/src/browsergym/utils/obs.py#L394

I'd suggest running your tasks in debug mode with breakpoints in flatten_axtree_to_str() to figure out why your elements don't get a bid.
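
Before stepping through with a debugger, it may help to list which nodes in the AXTree actually lack a bid. The following is a minimal sketch, not BrowserGym code: it assumes the AXTree is a dict in the Chrome DevTools `getFullAXTree` layout (a `"nodes"` list whose entries have `"nodeId"`, `"role"`, `"name"` and `"childIds"`), and that the bid, when present, is stored on the node under a `"browsergym_id"` key. Please check both assumptions against obs.py. The helper name `nodes_missing_bid` and the sample data are made up for illustration.

```python
from typing import Any


def nodes_missing_bid(axtree: dict[str, Any], role: str = "StaticText") -> list[dict[str, Any]]:
    """Return the AXTree nodes with the given role that carry no bid.

    Assumption: the bid lives under the "browsergym_id" key of each node.
    """
    missing = []
    for node in axtree.get("nodes", []):
        node_role = node.get("role", {}).get("value")
        if node_role == role and node.get("browsergym_id") is None:
            missing.append(node)
    return missing


# Hand-written tree in the assumed layout (illustrative data only).
sample_axtree = {
    "nodes": [
        {"nodeId": "1", "role": {"value": "listbox"}, "name": {"value": ""},
         "childIds": ["2", "3"], "browsergym_id": "a12"},
        {"nodeId": "2", "role": {"value": "StaticText"}, "name": {"value": "Option A"},
         "childIds": []},
        {"nodeId": "3", "role": {"value": "StaticText"}, "name": {"value": "Option B"},
         "childIds": [], "browsergym_id": "a14"},
    ]
}

for node in nodes_missing_bid(sample_axtree):
    print(node["nodeId"], node["name"]["value"])
```

Running the same helper on the AXTree from a real observation should show whether the dropdown options are missing a bid in the tree itself, or only in the flattened string.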

suhamansuri22 commented 1 month ago

Hi! Thank you so much for this, it was very helpful! Could I take this further and ask whether there is any way to detect if a StaticText element is part of "ant dropdown content"? Our page is designed so that the dropdown contents hold their values in static text elements, so those elements are needed to identify the correct option to select. I don't want to assign a bid to every static text element, as that overloads the AXTree and impacts the model's behavior. Thanks in advance!

gasse commented 3 weeks ago

Hi @suhamansuri22, you should be able to achieve what you want by copy-pasting and adapting the flatten_axtree_to_str() function, which will let you process and render StaticText elements as you wish from the full AXTree object. If there is anything specific you are not able to do from the AXTree object and you feel it should be supported by BrowserGym, please open a new issue describing the feature you'd like to have. Best of luck with your code.
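
To make the "copy and adapt" idea concrete, here is a rough sketch of a custom flattener that keeps StaticText nodes only when they sit under a dropdown-like ancestor. It is not the BrowserGym implementation. It assumes the same AXTree layout as the Chrome DevTools `getFullAXTree` output and a `"browsergym_id"` key on tagged nodes. Since the AXTree nodes in that layout do not expose CSS class names such as the Ant Design dropdown class, the sketch uses the ancestor's accessibility role as a stand-in. The function name, the `DROPDOWN_ROLES` set and the sample tree are hypothetical and should be adjusted to your page.

```python
from typing import Any

# Assumption: roles that mark a dropdown container on your page.
DROPDOWN_ROLES = frozenset({"listbox", "menu", "combobox"})


def flatten_with_dropdown_text(axtree: dict[str, Any],
                               dropdown_roles: frozenset = DROPDOWN_ROLES) -> str:
    """Render the tree as indented text, keeping StaticText only inside dropdowns."""
    nodes = {n["nodeId"]: n for n in axtree["nodes"]}
    child_ids = {cid for n in axtree["nodes"] for cid in n.get("childIds", [])}
    roots = [n["nodeId"] for n in axtree["nodes"] if n["nodeId"] not in child_ids]
    lines: list[str] = []

    def visit(node_id: str, depth: int, in_dropdown: bool) -> None:
        node = nodes.get(node_id)
        if node is None:  # dangling child reference, skip it
            return
        role = node.get("role", {}).get("value", "")
        name = node.get("name", {}).get("value", "")
        bid = node.get("browsergym_id")  # assumed key for the bid
        in_dropdown = in_dropdown or role in dropdown_roles
        # Static text outside a dropdown is dropped to keep the output short.
        if role != "StaticText" or in_dropdown:
            prefix = f"[{bid}] " if bid is not None else ""
            lines.append("\t" * depth + f"{prefix}{role} {name!r}")
        for cid in node.get("childIds", []):
            visit(cid, depth + 1, in_dropdown)

    for root_id in roots:
        visit(root_id, 0, False)
    return "\n".join(lines)


# Hand-written tree in the assumed layout (illustrative data only).
sample_axtree = {
    "nodes": [
        {"nodeId": "1", "role": {"value": "RootWebArea"}, "name": {"value": "Demo"},
         "childIds": ["2", "4"]},
        {"nodeId": "2", "role": {"value": "paragraph"}, "name": {"value": ""},
         "childIds": ["3"]},
        {"nodeId": "3", "role": {"value": "StaticText"}, "name": {"value": "Intro"},
         "childIds": []},
        {"nodeId": "4", "role": {"value": "listbox"}, "name": {"value": ""},
         "childIds": ["5"], "browsergym_id": "a1"},
        {"nodeId": "5", "role": {"value": "StaticText"}, "name": {"value": "Option A"},
         "childIds": [], "browsergym_id": "a2"},
    ]
}

print(flatten_with_dropdown_text(sample_axtree))
```

The ancestor flag is passed down the recursion, so each node is visited once and no parent lookups are needed. If your dropdown container has a different role, or is only recognizable through a node property, the `role in dropdown_roles` test is the one place to change.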