This application is the leading voice-to-code product on the market. It targets software developers and supports coding in almost all common languages and editors. It is a no-typing product: you speak a set of keywords to indicate which language you are coding in, and so on.
An AI system that translates natural language to code. This product would be important for translating our speech to code if we want to make our speech more "conversational." Currently, there seems to be a waitlist to access it, but I have also seen similar tools in more recent OpenAI releases.
Talon is a complete hands-free approach to coding. It involves voice, sound, and eye tracking.
Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. I would be interested in using Whisper to convert speech to text. Whisper is reported to be accurate across a wide range of speaking speeds and accents, which would make it a good choice when we are collecting voice data from a computer microphone.
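For reference, a minimal sketch of what transcription with the open-source whisper package could look like; the audio file name is a placeholder for whatever we capture from the microphone, and the package needs ffmpeg installed.

```python
# Rough sketch: transcribe a microphone recording with the open-source
# whisper package (pip install openai-whisper; requires ffmpeg).
# "recording.wav" is a placeholder for audio captured from the mic.
import whisper

model = whisper.load_model("base")          # small model; larger ones trade speed for accuracy
result = model.transcribe("recording.wav")  # returns a dict with the transcript and segments
print(result["text"])                       # the text we would feed into code generation
```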
This beta example shows how OpenAI models can generate docstrings given a sample of Python code. I think this could be a nice feature to include in convocode to keep code clean and standardized.
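A hedged sketch of what the docstring call could look like from Python, assuming the pre-1.0 openai library; the model name and prompt wording are my own guesses rather than the exact beta example.

```python
# Sketch: ask an OpenAI completion model to write a docstring for a Python
# function. Model name and prompt wording are assumptions.
import openai

openai.api_key = "YOUR_API_KEY"   # placeholder

function_source = '''
def mean(values):
    return sum(values) / len(values)
'''

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=f"Write a PEP 257 docstring for this function:\n{function_source}\nDocstring:",
    max_tokens=100,
    temperature=0,
)
print(response.choices[0].text.strip())
```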
There is a lot of data on the internet about software engineer burnout and RSIs from typing, so the market for typing-alternative products is strong.
This beta example shows how OpenAI models can create a chatbot to aid in JavaScript programming. Given that we want our application to be conversational, this chatbot (if applicable in Python) could be an interesting addition. Example (a minimal prompt sketch follows the exchange):
You: How do I combine arrays?
JavaScript chatbot: You can use the concat() method.
You: How do you make an alert appear after 10 seconds?
JavaScript chatbot: You can use the setTimeout() method.
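Roughly how this exchange could be driven from Python, assuming the same completion-style API; the model name, stop sequence, and prompt framing are my assumptions rather than the beta example's exact settings.

```python
# Sketch of a conversational loop: keep the running dialogue in the prompt so
# the model sees earlier turns. Model name and prompt framing are assumptions.
import openai

openai.api_key = "YOUR_API_KEY"   # placeholder

history = "You are a helpful assistant for JavaScript programming questions.\n"

def ask(question):
    global history
    history += f"You: {question}\nJS chatbot:"
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=history,
        max_tokens=60,
        temperature=0,
        stop=["You:"],   # stop before the model writes the next user turn itself
    )
    answer = response.choices[0].text.strip()
    history += f" {answer}\n"
    return answer

print(ask("How do I combine arrays?"))
print(ask("How do you make an alert appear after 10 seconds?"))
```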
VoiceCode is voice command software used for voice-to-code translation, among other capabilities. It began as a way to reduce keyboard use. Its system is composed of six parts: Dragon Dictate, a Comprehensive Command Set, the VoiceCode Language Processor, a Command Execution Framework, a Custom Phonetic Alphabet, and SmartNav. Dragon Dictate and SmartNav must be purchased separately to use this product.
Speech2Code is an application that transforms natural language into code, applied through a VS Code extension called Spoken. It supports only one programming language, JavaScript, and two editors, VS Code and CodeMirror. It uses Microsoft's Azure Speech to Text for the voice-to-text translation.
This application turns voice commands into functional code. It first converts natural language into text, then turns that text into Python or JavaScript code blocks, and finally runs terminal tests to check the result. It is built on OpenAI's Codex, an AI system that translates natural language to code, and can also import other programs, such as graphics processing. CodeVox is an example of how to create a tool that supplements programmers by completing baseline coding tasks.
OpenAI - Python to Natural Language
This is an example of the applications of OpenAI. In this example, it is used to convert Python code into an understandable natural-language description. This could help with the documentation aspect of our project.
Using OpenAI to create a Python bug fixer. Our project could use similar functionality for syntax error recognition and could possibly detect these simple errors in real time.
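A hedged sketch of what the bug fixer could look like with the Edits endpoint that was in beta at the time; the model name and instruction text are my assumptions.

```python
# Sketch: ask the (beta) OpenAI Edits endpoint to fix a buggy Python snippet.
# The model name and instruction wording are assumptions.
import openai

openai.api_key = "YOUR_API_KEY"   # placeholder

buggy_code = '''
def is_even(n):
    return n % 2 == 1   # bug: should compare against 0
'''

response = openai.Edit.create(
    model="code-davinci-edit-001",
    input=buggy_code,
    instruction="Fix the bug in this Python function.",
)
print(response.choices[0].text)
```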
Programming by voice, Vocal Programming
This paper discusses the importance of creating a syntax-based voice recognition system to increase the efficiency of a voice-to-code program. It begins by introducing the relevance of Repetitive Motion Injury, including Carpal Tunnel Syndrome, to programmers. It then discusses adapting existing voice recognition technology, Dragon NaturallySpeaking, and establishing a standard way to vocalize programming languages.
This is an AI system based on Codex, which is trained on natural language processing data. It can autocomplete the next line of code based on the coding input and rewrite code to make it more efficient. It can also describe what a function is doing, which could be helpful if we implement the automated documentation feature. The above image shows the AI completing code for a given set of instructions. Because we are trying to reduce the repetitive nature of coding, instructions like these could help speed up the coding process for the most-used expressions and loops. This functionality can also rename functions, add documentation, and optimize code.
One of our features is that our AI assistant will easily report the runtimes of certain programs in order to facilitate optimization. It is great to see that this is an easily accessible endpoint in OpenAI. As we can see from this documentation, the only required information is a string of the function. During our team meeting, we discussed being able to highlight blocks of code and request runtime analysis to better understand what the code is doing. This is a good method for that, as we can supply only portions of the code to the API.
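To illustrate the "highlighted block only" idea, here is a small sketch; the prompt wording and model name are assumptions, not the documented example.

```python
# Sketch: send only the highlighted region to the API and ask for a runtime
# analysis. Prompt wording and model name are assumptions.
import openai

openai.api_key = "YOUR_API_KEY"   # placeholder

highlighted = '''
total = 0
for i in range(n):
    for j in range(n):
        total += grid[i][j]
'''

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=f"{highlighted}\n\nWhat is the time complexity of the code above?",
    max_tokens=60,
    temperature=0,
)
print(response.choices[0].text.strip())
```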
AssemblyAI API Docs Detect Important Phrases and Words Endpoint
Annie and I were able to use this API for speech to text in our Hack-a-Thing 1. The benefit of this API over OpenAI is that it is a dedicated speech-to-text API, and there are endpoints that allow for real-time transcription of speech. An important part of our project is being able to see code appear as you speak it, or, failing that, having fast access to visualized code. There are many useful endpoints here, including the Auto Chapters endpoint, which provides summaries of what is spoken. That could be useful for the chatbot, which needs to process natural language and not just format and understand code, which OpenAI can do. Additionally, AssemblyAI has an endpoint that detects important words and phrases. Since we want to automate often-repeated coding processes, this endpoint could also be useful for tracking what a given developer uses most often. The image above shows the JSON response for a summarization of text. I have also included an image of sentiment analysis, which might be useful for our chatbot.
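For reference, this is roughly how a transcription job with Auto Chapters is submitted and polled; the audio URL and API key are placeholders.

```python
# Sketch: submit an audio file to AssemblyAI with Auto Chapters enabled, then
# poll until the transcript is ready. URL and API key are placeholders.
import time
import requests

headers = {"authorization": "YOUR_ASSEMBLYAI_KEY"}

job = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    headers=headers,
    json={"audio_url": "https://example.com/session.mp3", "auto_chapters": True},
).json()

while True:
    result = requests.get(
        f"https://api.assemblyai.com/v2/transcript/{job['id']}", headers=headers
    ).json()
    if result["status"] in ("completed", "error"):
        break
    time.sleep(3)

print(result.get("text"))      # full transcript
print(result.get("chapters"))  # Auto Chapters summaries
```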
One of the experiences we want to improve for coders is the feeling of isolation while coding, or the need to switch between many different screens during a coding session. Something we brainstormed was showing coders that they are not alone when they hit a bug. Stack Overflow is a common resource for coders. Looking at the documentation, we can see that we can search posts by ID when an error is reached in the code. There are search endpoints that allow us to pass the error message directly to the API call and get posts that might help the user on the spot, without them having to look it up themselves.
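A small sketch of how an error message could be passed straight to the public Stack Exchange search endpoint; the error string below is just an example.

```python
# Sketch: look up an error message on Stack Overflow via the Stack Exchange
# API and show the top matching posts to the user.
import requests

error_message = "TypeError: 'NoneType' object is not subscriptable"

response = requests.get(
    "https://api.stackexchange.com/2.3/search/advanced",
    params={
        "q": error_message,
        "site": "stackoverflow",
        "order": "desc",
        "sort": "relevance",
        "pagesize": 5,
    },
).json()

for item in response.get("items", []):
    print(item["title"], "->", item["link"])
```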
This is Google's chatbot API, powered by their AI. It uses BERT NLP for language recognition, which some of us on the team are familiar with from CS72. The above image shows the widget framework that we could potentially use. It outlines a business-to-consumer (B2C) conversation, which isn't exactly our use case, but I think it may still work, as we would like to ask coders about their needs, which is exactly what businesses ask of their customers in these interactions. This could be helpful because it would save us from developing our own learning algorithms and let us take advantage of Google's vast array of information.
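A hedged sketch of sending a coder's request to a Dialogflow agent with the google-cloud-dialogflow client; the project ID, session ID, and the agent itself are placeholders and assume an agent has already been set up.

```python
# Sketch: detect the intent behind a coder's spoken request with Dialogflow.
# Project ID, session ID, and the agent itself are placeholders/assumptions.
from google.cloud import dialogflow

client = dialogflow.SessionsClient()
session = client.session_path("my-gcp-project", "session-123")

query_input = dialogflow.QueryInput(
    text=dialogflow.TextInput(text="Help me write a for loop", language_code="en")
)
response = client.detect_intent(
    request={"session": session, "query_input": query_input}
)

print(response.query_result.intent.display_name)   # which intent matched
print(response.query_result.fulfillment_text)      # the agent's reply
```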
This article provides some important background information on Whisper, an automatic speech recognition technology. We've discussed using it for our project and here we can find information on what kinds of speech it works best on and its limitations. Voice to text is the foundation of our project–we're simply taking it another step with text to code–so I think it's important we know some background on how voice to text works so that we can stand behind our product.
The main focus of this research paper is an exploration of the development of a vocal programming environment. It explores two existing technologies that would be competitors to our project, Serenade and Talon. Through extensive testing, this paper determined Serenade to be superior, requiring an average of 45% fewer words to produce the same code. We can use the information in this article to help us make design decisions and decide how we can further improve voice-to-code technology.
This article describes VoiceCode, which would be a competitor to our product. VoiceCode is aimed specifically at addressing Repetitive Stress Injuries and disabilities, which was the initial target of Abby's and my idea. Looking at VoiceCode can be helpful for deciding whether we want to continue with that focus in mind or shift gears toward a more educational lens.
Study of Talon extension prototype
The prototype in this article was an extension of Talon, a competitor to our project as noted in other team members' comments. The prototype was created as a study of how modern software development tools can be integrated with existing voice-to-code systems to improve them. What I consider most important to our project, however, is page 61. Here you will find a discussion of efficiency and accuracy improvements with voice to code. Since we are considering marketing our product as a speed-coding tool, this will be important information to review.
This article compares Kaldi and the Google Cloud Speech API. Which speech-to-text technology we use will be incredibly important for our project, as we want as little frustration as possible, which requires high accuracy in voice to text. This article notes the pros and cons of Kaldi and the Google Cloud Speech API as well as their accuracy in a fairly thorough study. We might also consider building a product where users can choose between various speech-to-text APIs or technologies.
Pyodide is a package that allows users to run Python code within a browser and through React. This could be a great way for us to run our Python files and check the results of our voice-to-code output for syntax errors.
Whisper is a speech-to-text application trained on a large dataset and recently released as open source. It can be used to convert speech to text so we can then take that result and turn it into code.
Serenade is a company that currently exists to convert speech to code and would be a direct competitor to our product. While it has the capability to handle various languages, videos have revealed to me that it is geared more toward editing previously written code than developing a file from scratch.
Our solution may want tutorial videos so we can show users how to use our product. Chances are our website will be built with React, so a resource for adding videos to a React website could be valuable.
GitHub Copilot is a recent release by GitHub that uses AI to support coders by suggesting code and developing entire functions. Our solution is looking to make coding easier for our users, so this is an obvious competitor that has had early success. I wonder if there might be a way to integrate our solution with GitHub Copilot, where users can make requests, look at the result, and determine whether they want to keep it.
State Of The Art Research
Voice to Code
Serenade AI | [market|competitive]
serenade.ai
This would be a direct competitor to our product. However, since it is open source, we may be able to integrate some of this technology directly into our project.
Dragon | [method|background]
Dragon by Nuance
Dragon is an advanced speech recognition engine and is widely used for programming by voice. Its AI-powered speech recognition enables high-quality transcription in a third of the time. It is not necessarily used for code, but it is a source for voice to text.
Talon Voice | [competitive]
Talon Voice
Software that uses voice, sound, and eye tracking for computer use. Competitor using more advanced technology than our initial idea; however, an interesting approach to everyday computer interactions.
Caster | [method|competitive|market|background]
Caster
Software running on Linux that allows control over applications, games, mouse, and keyboard. Expanding functionality to iOS systems could be a differentiating factor. This is similar to Voice Attack; both are used for gaming more than coding.
Voice Attack | [competitive]
Voice Attack
Voice-activated software for controlling PC games and apps. Expanding functionality to iOS systems could be a differentiating factor. The voice aspect of this software could be a helpful reference for our project.