AvaloniaUI / avalonia-docs

https://docs.avaloniaui.net/docs/welcome

Add generalized command/query UI discovery and completion: run the OS and all apps via automation with fast, indexed UI discovery, synonyms, and eventually hands-free, even no-look, use of apps in a consistent, ergonomic way. Or I'll pitch it to MSFT / Picovoice; it should be the same implementation on all OSes. #279

Closed damian-666 closed 9 months ago

damian-666 commented 11 months ago

For scraping outside of Windows, MSFT has stalled on the effort; it's at least partially working, which could be OK.

You might look at JustSayIt.jl, or Linux DOM command scrapers, for completion; and Silk...

Context awareness is what's bungled in voice access in Windows. https://learn.microsoft.com/en-us/dotnet/api/system.speech.recognition.grammar?view=dotnet-plat-ext-8.0 — this would be where to put the menu or small-domain context.
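
A minimal sketch of that small-domain idea, assuming the classic System.Speech stack the link above documents (Windows only; on modern .NET it needs the System.Speech NuGet package). Only the commands visible in the current context are loaded, so the recognizer cannot guess outside the domain. The menu items here are hypothetical placeholders for whatever a real system scrapes from the focused window:

```csharp
using System;
using System.Speech.Recognition;

class MenuContextRecognizer
{
    static void Main()
    {
        using var recognizer = new SpeechRecognitionEngine();

        // Small-domain grammar: only the commands visible in the current context.
        var menuItems = new Choices("File", "Open", "Save", "Save As", "Close");
        var builder = new GrammarBuilder(menuItems);
        recognizer.LoadGrammar(new Grammar(builder) { Name = "CurrentMenu" });

        recognizer.SpeechRecognized += (s, e) =>
            Console.WriteLine($"Command: {e.Result.Text} ({e.Result.Confidence:P0})");

        recognizer.SetInputToDefaultAudioDevice();
        recognizer.RecognizeAsync(RecognizeMode.Multiple);

        Console.WriteLine("Listening for menu commands; press Enter to quit.");
        Console.ReadLine();
    }
}
```

Swapping the loaded grammar whenever focus changes is the whole trick: the recognizer only ever competes over a handful of phrases instead of the entire OS.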

I would love to see generalized command, control, query, and completion, hands-free with a quality dynamic cardioid mic; the app's logo/name is what you call it by, plus indexed commands and UI for discoverability, maybe with Picovoice,

which has a basic hot-word demo in Avalonia. One could index all the UI like Visual Studio's command/code indexer for discovery and completion, AI-first, but with no Bing and no Copilots and low impact. There are three competing Copilots in Windows now.

And use context awareness to refine the voice command guess. This is an important step: limit the domain via context. Maybe Silk is worth a look too.

But the implementation, via testing or automation APIs, doesn't need to go through the UI. It can if it must, even using OCR, if it can spy on the current window the way WPF exposes its visual tree.
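
As a sketch of that "spy on the current window" route, the Windows UI Automation client API can walk the focused window's tree much like WPF's visual tree. This is the general approach, not a finished indexer (Windows only; assumes references to UIAutomationClient/UIAutomationTypes): it collects the invokable controls as candidate voice-command targets.

```csharp
using System;
using System.Windows.Automation;

class UiIndexer
{
    static void Main()
    {
        // The element that currently has keyboard focus.
        AutomationElement focused = AutomationElement.FocusedElement;
        Console.WriteLine($"Focused: {focused.Current.Name}");

        // Climb to the containing top-level window.
        AutomationElement window = focused;
        while (window != null && window.Current.ControlType != ControlType.Window)
            window = TreeWalker.ControlViewWalker.GetParent(window);
        if (window == null) return;

        // Collect invokable controls (buttons, menu items) as voice-command targets.
        var condition = new PropertyCondition(
            AutomationElement.IsInvokePatternAvailableProperty, true);
        foreach (AutomationElement el in
                 window.FindAll(TreeScope.Descendants, condition))
        {
            Console.WriteLine(
                $"{el.Current.ControlType.ProgrammaticName}: {el.Current.Name}");
        }
    }
}
```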

And Semantic Kernel: for all apps and all OSes, this really should be one implementation. It looks Alexa-inspired; "skills" is a bit much when I can't even drive the app hands-free. There's no need to make a plugin for every app: they are all basically command/query, over COM, all the same. It could be generalized on macOS and Linux as well; the UI is hard to scrape, but it should be consistent within an OS.

The data can use context from the selected window: OCR-index the UI, two-way comms, hotwords. "Hey Ubuntu, change to the root src folder." "What's my biggest file?", dir (scrape param types via C externs; Silk scrapes types and params this way for all public OpenGL APIs). I'm proposing UI with one mic, one line, a Start menu for the window

[Screenshot 2023-10-24 215256: the Start menu]

(on the LEFT; I used StartIsBack). OS and apps: that is one implementation that works on Windows, Linux, and OSX. LLM plus fast completion in context with low latency. Voice Access is OK but has no command context awareness. It could ask ChatGPT for its guess, then via a five-choice menu I'll fix the guess...
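
A toy sketch of that guess-then-fix flow, with a naive token-overlap score standing in for the LLM's guess (the command index, utterance, and scoring are placeholder assumptions):

```csharp
using System;
using System.Linq;

class FiveChoiceMenu
{
    // Naive stand-in for the LLM's guess: fraction of command tokens
    // that also appear in the utterance.
    static double Score(string utterance, string command)
    {
        var u = utterance.ToLowerInvariant().Split(' ');
        var c = command.ToLowerInvariant().Split(' ');
        return u.Intersect(c).Count() / (double)c.Length;
    }

    static void Main()
    {
        string[] index = { "Open File", "Save File", "Close Window",
                           "Open Recent File", "Save All", "Find in Files" };
        string utterance = "open the file";

        // Rank the indexed commands and present the top five choices.
        var top5 = index.OrderByDescending(cmd => Score(utterance, cmd))
                        .Take(5).ToArray();
        for (int i = 0; i < top5.Length; i++)
            Console.WriteLine($"{i + 1}. {top5[i]}");

        // The user fixes the guess by saying (here: typing) a number.
        Console.Write("Pick a number to fix the guess: ");
        if (int.TryParse(Console.ReadLine(), out int pick)
            && pick >= 1 && pick <= top5.Length)
            Console.WriteLine($"Executing: {top5[pick - 1]}");
    }
}
```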

Honestly, I think MSFT should do this, or some third party. It unifies the UI, and I don't see how it could be any easier: leave the bloated visual UI as-is, then go right to the model later. I can make a formal proposal to StartIsBack, to MSFT, to Picovoice. That's full Star Trek interaction.

Any opinions? I could make a sample for Avalonia and pitch it to Microsoft again, but they have more interest in engagement than productivity... or are really stupid. It could in some way have a generalized module and even work on macOS and Linux; JustSayIt scrapes Windows. I could ask Picovoice also.

I just don't see a more intuitive way, but I'm all ears; if there isn't full consensus I won't fight for it. It's to prevent disability and get things done. Sitting, I just lost use of my left hand: another blown disk, four back surgeries, 20 hours a day at a PC fighting for ways to save time and get my life back. Thirty years of typing with mice is enough.

Demo suggested by chatbot:

**Scene Setup**
- User Environment: Show a user in a typical computing environment, like a home office.
- Computer with Windows OS: The user's computer is running Windows with the new voice-controlled UI feature.

**Demonstration Flow**
- Voice Command to Open Applications: The user says, "Launch Notepad," and the system responds by opening Notepad.
- Interaction for Specific Tasks: The user commands the system to perform tasks like "Show recent documents" or "Send an email to John."
- Confirmation and Undo Features: When a potentially irreversible action is commanded, the system asks for confirmation ("Are you sure you want to send?"). An audible beep indicates the system is waiting for a response, and the user has the option to undo the action.
- Continuous Interaction: The user converses with the AI seamlessly, giving commands and receiving auditory feedback without needing to look at the screen.

**Key Elements**
- Visual Cues: Use visual effects to show the AI's responsiveness (e.g., the microphone icon lighting up).
- Auditory Feedback: Include sounds for confirmation, waiting for a response, and completion of tasks.
- Functionality over Flashiness: Focus on the practical and efficient aspects of the UI rather than flashy graphics.
- Narrative: Weave a narrative that shows the UI's utility in saving time and improving workflow.

**Production**
- Storyboarding: Start with detailed storyboarding to plan out each scene.
- Animation Software: Use animation software capable of creating realistic scenarios, such as Adobe After Effects or Blender.
- Voice Actors: Employ voice actors for the user and the AI's responses to add realism.
- Sound Design: Include subtle sound effects for user interactions and system responses.

**Conclusion**
- Emphasis on Efficiency: Conclude with a scene that emphasizes the increased productivity and ease of use provided by the new UI.
- Call to Action: End with a message about the future of computing and how voice-controlled interfaces can revolutionize our interaction with technology.

timunie commented 11 months ago

@damian-666 any chance you can improve this ticket to be more readable? It's hard for me to follow what you want to propose here.

damian-666 commented 9 months ago

Closing this. I'm still working on a proposal, but it's to Microsoft through the Insider program.

Or it's for a research grant, for anyone interested, as I don't have the time or ability to manage or do all the work; I just want the results so I can use my computer without suffering.

I wrote this after I blew a disk in my neck and my left arm was numb.

The short story is that the legacy accessibility work, say for Linux, isn't worth the effort IMO, due to rapid advances in GPT-4 multimodal, daily updates to Semantic Kernel agent chains, voice-to-intent, and new local small language models like Phi.

It should not fall on every app developer, in .NET or elsewhere, to do this incredible amount of redundant work on a very poorly designed, second-rate feature, paying lip service to a protected class of people whose purchasing power really no one cares about except neurosurgeons.

I had an idea then that Avalonia could build it as a "show them how it's done" feature vs. Voice Access, because tooltip and UI discovery in WPF is done via the tree. But then I saw how horrifically redundant it is with Automation and COM over C#, plus UI tree parsing and spying, all differing on Linux and OSX...

So my current thinking is to use OCR and a .NET Core EXE deployed as a local service, maybe with a special mic if they don't have a dynamic mic and preamp, and have it work on legacy apps and new apps without any developer work, except maybe for code completion in the case of AvaloniaEdit.
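
A rough sketch of that OCR route, assuming Win32 + System.Drawing for the capture (Windows only). The Ocr() hook is a hypothetical placeholder for whatever engine gets plugged in (Windows.Media.Ocr, Tesseract, etc.); the point is that legacy apps need no developer work.

```csharp
using System;
using System.Drawing;
using System.Runtime.InteropServices;

class OcrIndexer
{
    [DllImport("user32.dll")] static extern IntPtr GetForegroundWindow();
    [DllImport("user32.dll")] static extern bool GetWindowRect(IntPtr hWnd, out RECT rect);

    [StructLayout(LayoutKind.Sequential)]
    struct RECT { public int Left, Top, Right, Bottom; }

    static void Main()
    {
        // Capture the foreground window's pixels.
        GetWindowRect(GetForegroundWindow(), out RECT r);
        int w = r.Right - r.Left, h = r.Bottom - r.Top;

        using var bmp = new Bitmap(w, h);
        using (var g = Graphics.FromImage(bmp))
            g.CopyFromScreen(r.Left, r.Top, 0, 0, new Size(w, h));

        // Hand the pixels to OCR and add each recognized string to the command index.
        foreach (string line in Ocr(bmp))
            Console.WriteLine($"Indexed UI text: {line}");
    }

    // Hypothetical hook; substitute a real OCR engine here.
    static string[] Ocr(Bitmap bmp) => Array.Empty<string>();
}
```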

TL;DR: it would also live on the Start menu, generalized with code/feature search and completion, active in the apps, with one general implementation.

It would likely be a trained LLM + RAG (Retrieval-Augmented Generation) to give context, like what's on the currently open dialog box or File menu, or fine-tuned prompts.
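
A minimal sketch of that RAG idea: retrieve only the UI strings that are actually visible right now (from the OCR/automation index) and put them in the prompt, so the model's guess is constrained to the open dialog. The visible strings, the trivial substring retrieval, and the final LLM call are all placeholder assumptions:

```csharp
using System;
using System.Linq;

class UiRagPrompt
{
    static void Main()
    {
        // Strings indexed from the currently open dialog (via OCR/automation).
        string[] visibleUi = { "Save", "Save As...", "Encoding: UTF-8",
                               "File name:", "Cancel" };
        string utterance = "save this with a different name";

        // Naive retrieval: keep indexed strings sharing a meaningful token
        // (length > 2) with the utterance.
        var tokens = utterance.ToLowerInvariant().Split(' ')
                              .Where(t => t.Length > 2);
        var context = visibleUi.Where(s =>
            tokens.Any(t => s.ToLowerInvariant().Contains(t)));

        string prompt =
            "The user said: \"" + utterance + "\".\n" +
            "Only these commands are visible right now:\n- " +
            string.Join("\n- ", context) + "\n" +
            "Answer with exactly one of the visible commands.";

        // Would go to the LLM here (e.g. a hypothetical SendToLlm(prompt)).
        Console.WriteLine(prompt);
    }
}
```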

For agent chains via an online LLM: in admin mode, it would discover and index all the UI and tooltips via OCR.

For the normal user, the staggering, unmanageable bloat and complexity of a modern IDE and OS, and the lack of standards, are shamefully abusive, either driven by product designers who can't code, or deliberately left a mess for cloud dollars while you stay engaged, or for Bing, or to make you buy more hardware. It causes disability, injury, wasted years, blood clots, and death.

But that means a complete redesign of Voice Access. The dictation is great, but there is no proper use of context: the commands are built over the mouse, and the dictation is unaware of focus context. The idea is backed by some research done a few years ago.

There are several companies building hardware that sort of does this. All the efforts like Picovoice and such are not quite good enough. For me, Voice Access is amazing with a dynamic mic and unusable with the mic array.