Open centopw opened 11 months ago
@centopw Thanks for this proposed change. It's interesting to see that you can just open the default browser by searching for "browser" in Mac OS. Do you have any ideas on how the default browser could be opened on Windows and Linux? I've tested just searching for "browser" on my Linux distro and it doesn't find the default.
When searching for "browser" on different Linux distros, the current behavior is as follows:
- Ubuntu: returns all available browsers but does not identify the default browser.
- Similar to Ubuntu, it shows all available browsers but does not identify the default browser correctly.
Two potential solutions have been considered:
1. Script improvement (PR #19): enhance the existing scripts to prompt the user for their default browser and update main.py with the selected browser.
2. Update main.py: modify main.py to prompt the user to select the default browser every time it runs.
Both options offer improved accuracy.
Drawbacks:
Option 1:
Option 2:
With this proposal, I have drafted a simple update to main.py:
```python
import platform

from prompt_toolkit import prompt
from prompt_toolkit.shortcuts import message_dialog

# `style` is the prompt_toolkit Style object already defined in main.py.

# Ask the user for their default browser (normalized for comparison)
default_browser = prompt(
    "Please enter your default browser (e.g., Chrome, Firefox): "
).strip().lower()

# Adjust the behavior based on the user's default browser
if default_browser == "chrome":
    browser_prompt = "Google Chrome"
    browser_address_bar = {"x": "50%", "y": "9%"}
elif default_browser == "firefox":
    browser_prompt = "Mozilla Firefox"
    browser_address_bar = {"x": "50%", "y": "10%"}
else:
    # Default to Chrome behavior if the input is unknown
    browser_prompt = "Google Chrome"
    browser_address_bar = {"x": "50%", "y": "9%"}

message_dialog(
    title="Self-Operating Computer",
    text=f"Ask a computer to do anything. Default browser set to {browser_prompt}.",
    style=style,
).run()

print("SYSTEM", platform.system())

# Update the prompts based on the chosen/default browser
VISION_PROMPT = f"""
You are a Self-Operating Computer. You use {browser_prompt} as your default browser.
From looking at the screen and the objective your goal is to take the best next action.
To operate the computer you have the four options below.
1. CLICK - Move mouse and click
2. TYPE - Type on the keyboard
3. SEARCH - Search for a program on {browser_prompt} and open it
4. DONE - When you completed the task respond with the exact following phrase content
Here are the response formats below.
1. CLICK
Response: CLICK {{ "x": "percent", "y": "percent", "description": "~description here~", "reason": "~reason here~" }}
2. TYPE
Response: TYPE "value you want to type"
3. SEARCH
Response: SEARCH "app you want to search for on {browser_prompt}"
4. DONE
Response: DONE
Here are examples of how to respond.
...
"""
```
Also, instead of asking the user to type out their browser, we could add a menu that lets the user pick from a predefined list of browsers.
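A minimal sketch of such a menu, assuming a hypothetical `BROWSERS` list that reuses the address-bar coordinates from the draft above (they are not verified for every screen layout). `build_menu` and `resolve_choice` are illustrative names, not existing functions in main.py; invalid input falls back to the first entry, mirroring the draft's "default to Chrome" behavior.

```python
# Hypothetical predefined browser list: (search name, address-bar position).
# Coordinates are copied from the draft above and are assumptions, not measurements.
BROWSERS = [
    ("Google Chrome", {"x": "50%", "y": "9%"}),
    ("Mozilla Firefox", {"x": "50%", "y": "10%"}),
]


def build_menu(browsers):
    """Render the choices as a numbered list starting at 1."""
    return "\n".join(f"{i}. {name}" for i, (name, _) in enumerate(browsers, start=1))


def resolve_choice(raw, browsers):
    """Map the user's menu number to (name, address_bar).

    Anything that is not a valid menu number falls back to the first entry,
    like the draft's fallback to Chrome.
    """
    try:
        index = int(raw.strip()) - 1
    except ValueError:
        return browsers[0]
    if 0 <= index < len(browsers):
        return browsers[index]
    return browsers[0]


def main():
    """Interactive usage: show the menu and read one choice from stdin."""
    print(build_menu(BROWSERS))
    name, address_bar = resolve_choice(input("Select your default browser: "), BROWSERS)
    print(f"Default browser set to {name} (address bar at {address_bar})")
```

Since main.py already uses prompt_toolkit, its `radiolist_dialog` (in `prompt_toolkit.shortcuts`) could present the same list as an arrow-key menu instead of a typed number.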
@centopw Interesting. I think the ideal solution would be to just automatically detect the default browser if possible. On Windows, I'm pretty sure this can just be read from the registry using OpenKey. For Linux, this would probably be found in xdg-settings. I'm not sure about Mac OS. It would probably require some special permissions to access that system setting. If no default browser was found, it could just default to searching for "browser" or something. What do you think about this approach?
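A rough sketch of the automatic-detection idea, assuming the standard Windows per-user URL association key (`...\UrlAssociations\http\UserChoice`, value `ProgId`) and `xdg-settings get default-web-browser` on Linux. The `KNOWN_BROWSERS` substring table is hypothetical and would need extending; macOS is deliberately left to the "browser" fallback, since how to read its setting is still open.

```python
import platform
import subprocess

# Hypothetical lookup: substrings of system identifiers (Windows ProgIds such as
# "ChromeHTML", Linux .desktop names such as "firefox.desktop") -> search name.
KNOWN_BROWSERS = {
    "chrome": "Google Chrome",
    "firefox": "Firefox",
    "edge": "Microsoft Edge",
    "brave": "Brave",
}
FALLBACK_QUERY = "browser"


def friendly_name(identifier):
    """Translate a raw system identifier into the name the model should search for."""
    lowered = identifier.lower()
    for key, name in KNOWN_BROWSERS.items():
        if key in lowered:
            return name
    return FALLBACK_QUERY


def read_windows_progid():
    import winreg  # Windows-only standard-library module

    path = r"Software\Microsoft\Windows\Shell\Associations\UrlAssociations\http\UserChoice"
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, path) as key:
        prog_id, _value_type = winreg.QueryValueEx(key, "ProgId")
    return prog_id


def read_linux_desktop_entry():
    result = subprocess.run(
        ["xdg-settings", "get", "default-web-browser"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def detect_default_browser():
    """Best-effort detection; returns FALLBACK_QUERY when nothing is found."""
    system = platform.system()
    try:
        if system == "Windows":
            return friendly_name(read_windows_progid())
        if system == "Linux":
            return friendly_name(read_linux_desktop_entry())
    except (OSError, subprocess.CalledProcessError):
        pass  # registry key missing, xdg-settings not installed or failing, etc.
    # macOS (and any failure above): just search for "browser".
    return FALLBACK_QUERY
```

Matching on substrings keeps the table short, at the cost of occasionally mislabeling an unusual identifier; anything unrecognized still degrades to searching for "browser".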
If you want to go with a terminal approach, we could simply open a website from the terminal, e.g.:
- Linux: xdg-open http://www.google.com
- Windows: start http://www.google.com
- macOS: open http://www.google.com
When run in a terminal, each command opens the URL in the system's default browser. Another benefit: since it always opens google.com, we know exactly where the search box is, which should reduce misclicks even further.
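The per-OS choice of command can be captured in a small lookup keyed by `platform.system()` (which reports macOS as "Darwin"). This is a sketch only: `open_url_command` is a hypothetical helper that builds the command text and does not execute anything, and `start` assumes a Windows cmd.exe shell.

```python
import platform

# Command that opens a URL in the default browser, keyed by platform.system().
OPEN_COMMANDS = {
    "Linux": "xdg-open",
    "Windows": "start",  # cmd.exe built-in
    "Darwin": "open",    # platform.system() reports macOS as "Darwin"
}


def open_url_command(url, system=None):
    """Return the shell command text for `url`; this does not run anything."""
    system = system or platform.system()
    try:
        return f"{OPEN_COMMANDS[system]} {url}"
    except KeyError:
        raise ValueError(f"no known open command for {system!r}") from None
```

For example, `open_url_command("http://www.google.com", "Darwin")` gives the macOS line from the list above.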
@centopw That's an interesting approach. However, the project is aiming more towards only giving the model control over the OS via mouse movements, mouse clicks, key-presses, and search operations (from key-presses). Running xdg-open, start, or open from the code itself would violate that vision (restricting the model to only have the same inputs to the OS as a human: mouse and keyboard).
So, having the model open a terminal and run xdg-open using only the cursor and key-presses would be a valid operation (although not very practical). Running xdg-open from the Python code itself wouldn't be valid. Hope that makes sense.
The program should probably follow this order of operations:
1. Get the name of the user's default browser (either manually or automatically).
2. Give the default browser name to the model in the prompt.
3. The model references the default browser name in its search action.
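Steps 2 and 3 could look roughly like this, assuming step 1 yields either a browser name or nothing. The helper names and the exact prompt wording are hypothetical; the `SEARCH "..."` shape follows the response format in the draft prompt, and a missing name falls back to searching for "browser".

```python
FALLBACK_QUERY = "browser"


def search_action(query):
    """Step 3: the SEARCH action the model is expected to emit."""
    return f'SEARCH "{query}"'


def browser_prompt_fragment(browser_name=None):
    """Step 2: describe the default browser to the model.

    `browser_name` is the result of step 1 (detected or entered by the user);
    when it is missing or empty, the model is told to search for "browser".
    """
    query = browser_name or FALLBACK_QUERY
    described = browser_name or "unknown"
    return (
        f"The user's default web browser is {described}. "
        f"To open it, respond with: {search_action(query)}"
    )
```

Keeping the fallback inside the prompt builder means the rest of the prompt does not need to know whether detection succeeded.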
@centopw I am going to try out your install script in #19 and see how it works.
@michaelhhogue Then how about this? I don't work with Windows much, so this draft only covers macOS (using webbrowser) and Linux (using xdg-settings):
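```python
import subprocess
import webbrowser


def get_default_browser_macos():
    # Caveat: webbrowser.get() returns a controller object, and its name is not
    # guaranteed to be the browser's application name on macOS; verify this.
    return webbrowser.get().name


def get_default_browser_linux():
    # Prints the default browser's .desktop entry, e.g. "firefox.desktop"
    result = subprocess.run(
        ["xdg-settings", "get", "default-web-browser"],
        stdout=subprocess.PIPE,
        text=True,
    )
    browser_name = result.stdout.strip()
    return browser_name
```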
@centopw I'll test this out as well and get back with you.
What if browser is already open?
@Kreijstal For now, I don't think an already-open browser affects anything, but that's an interesting idea. I'll play around with it and let you know.
@centopw Just noting here that I haven't yet tested any default browser checking. Want to first see what happens with #19.
Problem
Currently, the application prompts the model to use Google Chrome by default, limiting accessibility and user experience for people who use other browsers. This Chrome-only approach excludes a significant user base and hinders the platform's adaptability to diverse browser environments.
Proposal
This issue advocates for a transition from Chrome-centric development to a more inclusive approach that supports a broader range of web browsers. The goal is to enhance accessibility, improve user experience, and adhere to web standards that promote compatibility across different platforms.
Proposed Changes
While testing, I realized that on macOS you can open your default browser just by typing "browser" into the search bar.
So instead of searching for "Google Chrome", you can search for "browser" and press Enter; this opens the default browser without requiring the user to use Google Chrome. Since most browsers have the address bar in the same location, the default address-bar setting should still work.
I originally hacked in Google Chrome as the default, but agree we've outgrown this. Chrome is 70% of the market if I understand correctly, though. Would it make sense to "check for Chrome" and, if it isn't found, search for "browser" as shown above?
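A best-effort sketch of "check for Chrome, else search for browser", assuming Chrome's common default install locations (the macOS app bundle, the `google-chrome`/`google-chrome-stable` launchers on Linux, and `Google\Chrome\Application\chrome.exe` under Program Files or LOCALAPPDATA on Windows). Note this checks whether Chrome is installed, not whether it is the default, and per the thread the "browser" search is only known to open the default browser on macOS.

```python
import os
import platform
import shutil


def chrome_installed():
    """Best-effort check for a Google Chrome install in common default locations."""
    system = platform.system()
    if system == "Darwin":
        return os.path.isdir("/Applications/Google Chrome.app")
    if system == "Linux":
        return any(shutil.which(cmd) for cmd in ("google-chrome", "google-chrome-stable"))
    if system == "Windows":
        roots = [os.environ.get(var) for var in ("ProgramFiles", "ProgramFiles(x86)", "LOCALAPPDATA")]
        return any(
            os.path.isfile(os.path.join(root, "Google", "Chrome", "Application", "chrome.exe"))
            for root in roots
            if root  # skip environment variables that are not set
        )
    return False


def search_query():
    """Search for Chrome if it looks installed, otherwise for "browser"."""
    return "Google Chrome" if chrome_installed() else "browser"
```

Custom install paths (or Chromium builds) would not be detected and would simply fall back to the "browser" search.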
- Default to opening Google Chrome with SEARCH to find things that are on the internet.
I lean away from asking the user additional questions if possible, but I'm curious what the community thinks.