OthersideAI / self-operating-computer

A framework to enable multimodal models to operate a computer.
https://www.hyperwriteai.com/self-operating-computer
MIT License
8.92k stars 1.2k forks

idea #66

Closed alafortu closed 12 months ago

alafortu commented 1 year ago

I played with GPT-4V on other projects, and it definitely has a hard time figuring out coordinates. I used another model, trained on image identification, to find the coordinates of the bounding box around each detected object, and then passed those to GPT-4 to perform an action. For your use case, I just tested this model: "https://huggingface.co/foduucom/web-form-ui-field-detection". It is far from perfect, but maybe an idea to build on. If your self-operating computer can detect the input fields in an image and get their proper coordinates, it could help, or at least add a level of redundancy to improve accuracy when clicking and inputting text in the right places.
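To make the idea concrete, here is a minimal sketch of the step between the detector and the language model. It assumes the detector returns pixel-space bounding boxes with a label and a confidence score; the `Detection` class, the `click_target` and `describe_fields` helpers, and the 0.5 confidence threshold are all hypothetical names and values chosen for illustration, not part of this project or of the model linked above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected UI field (hypothetical structure, not a real library type)."""
    label: str
    x1: float  # left edge, in pixels
    y1: float  # top edge, in pixels
    x2: float  # right edge, in pixels
    y2: float  # bottom edge, in pixels
    confidence: float


def click_target(det, image_width, image_height):
    """Return the center of the box as (x, y) fractions of the image size."""
    cx = (det.x1 + det.x2) / 2
    cy = (det.y1 + det.y2) / 2
    return cx / image_width, cy / image_height


def describe_fields(detections, image_width, image_height, min_confidence=0.5):
    """Build a text list of detected fields and their click targets.

    The resulting string is meant to be appended to the prompt so the
    language model picks a field by index instead of guessing coordinates.
    Detections below min_confidence (an assumed threshold) are dropped.
    """
    kept = [d for d in detections if d.confidence >= min_confidence]
    lines = []
    for index, det in enumerate(kept):
        x, y = click_target(det, image_width, image_height)
        lines.append(f"{index}: {det.label} at x={x:.1%}, y={y:.1%}")
    return "\n".join(lines)
```

The model would then only need to answer with a field index, and the click position comes from the detector's box center rather than from the model's own coordinate estimate.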

Bunger-Beesechurger commented 1 year ago

@rohanarun I'm not a contributor to this github, just part of the audience usually, but this seems earlier than your video. Early August is when this article came out, so it's been in the works even earlier than that. Stop spamming every issue. You said you've been working on your thing for over a year, but how much of the info came out before your video? I don't know whether it's plagiarizing or not, and if it is, I'm sorry. However, I can still be annoyed that on what should be a cool new project for tech advancement, we have to figure out if something is stealing or not.

[Image: Screenshot (813)]

Article says "HyperWriteAI" and from this github's own main page: "Ongoing Development At HyperwriteAI, we are developing Agent-1-Vision a multimodal model with more accurate click location predictions" so it is referencing this project.

Kreijstal commented 12 months ago

I mean, you are saying you have a custom model, but all I see is proprietary business products. Your custom model is handwritten for the cases, but this is GPT-4V, so it's not a rip-off; they just had the idea (wouldn't it be cool if GPT-4 could control computers) and open sourced it first 🤷. It can't be a rip-off because you started without GPT-4V and trained a proprietary custom model, while these guys just did prompt engineering and got it to work with GPT-4V, without taking any custom models.

If these guys get more fame, it's because they open sourced it first, and then it's first come, first served. I think it's fair, imho.

Also, your insecurity is showing. If your product were really good, there would be no need to spam it on every issue. Just give us something better and people will naturally flock to it.

James4Ever0 commented 12 months ago

Posting these over and over will not help. AGI is for everyone, truly democratic. For a long time, not a single company wielded the wand in the field of autonomous computers, until now. I have been waiting for this very moment for so long. It must be open source, and it will change human history for good.

James4Ever0 commented 12 months ago

For inspiration, please check #37 and #32.

michaelhhogue commented 12 months ago

@alafortu Thanks for the suggestion. Low accuracy with GPT-4v is a known issue at the moment, and support for other models is planned in the future.