
Live AI Assistant (former Speech-to-Roam)

Voice transcription and AI assistant supporting text, voice, or images as prompts. Easy-to-define context and templated post-processing for responses structured exactly as you want. Supports up-to-date GPT and Claude models, most existing models through OpenRouter, and local models through Ollama.

🆕 New in v.9:

🆕 New in v.8:

(See changelog here)

Live AI Demo 3

Controls, including voice transcription:


GETTING STARTED

Provide API Keys

NB: API fees should not be confused with the ChatGPT Plus subscription; they are strictly separate. You do not need a ChatGPT Plus subscription to use Live AI Assistant.

Your first prompt to Live AI Assistant

Just press the microphone button and give voice instructions, or place the cursor focus in a block where you have written your prompt, then click the AI completion button (OpenAI logo). That's all!

You can easily use structured prompts by selecting multiple blocks (including images, with models supporting image recognition). Create your own Roam template to have a set of ready-to-use advanced prompts!

You can easily add context to your prompt: by pressing Shift while clicking the AI completion button, all the content in the sidebar will be sent as context (for example, ask it to summarize some content provided in the context). See the 'AI Assistant' section below for more details and possibilities regarding context.

You can easily compare AI models' responses: right-click on the 'Generate a response again' button that appears to the right of the AI response and choose another model. The new response will be inserted just above the first one.

Chat with your AI Assistant

You can easily continue any conversation with an AI Assistant:

Live AI chat demo 2

Previous messages (including child blocks) will automatically be taken into account. About the context:

Keyboard hotkeys (⚠️ available only when the voice recording has been started by a mouse click):

Commands (in the command palette; I recommend setting up hotkeys for them)

Commands to trigger the controls for voice notes:

Keyboard-only (no voice) interactions with the AI assistant, AI features, and other commands:

A SmartBlock command is also provided: <%SPEECHTOROAM%>. See the SmartBlock example at the end of this doc.

DETAILED INSTRUCTIONS

Voice transcription

⚠️ Currently, voice recording isn't possible in either the macOS desktop app or the mobile app: the microphone is not yet supported there, so voice-note transcription can't be done. All commands relying only on text (like AI completion or post-processing) remain available. The extension works properly in all browsers (desktop and mobile, on macOS, iOS, Windows, or Android) and in the Windows desktop app.

Translation

A large number of source languages are supported, but the target language is currently limited to English. This limitation can easily be overcome through post-processing with a GPT model, since it only takes asking it to translate into almost any language.
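For example, a hypothetical post-processing template whose prompt reads "Translate the following transcription into Spanish" will turn an English transcript into Spanish output in a single extra step.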

AI Assistant (OpenAI GPT models, Anthropic Claude models, and other models through OpenRouter or an Ollama server)

AI post-processing of voice notes following your templates

Use models through OpenRouter

OpenRouter is a unified API that routes requests to a wide range of models. The benefit is having a single account to access most existing and up-to-date models. You pay as you go: after purchasing credits (you can test without credits), your credits are debited on each request. OpenRouter also offers a continuously updated ranking of the most popular models.

In the settings, provide the list of IDs of the models you want to use in Live AI. They will appear in a dedicated section of the context menu, or replace the native models if you check the corresponding option. The first model in your list can be selected as your default model.
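As a purely illustrative example (these IDs are assumptions; model identifiers change over time, so check OpenRouter's model list, and the exact separator expected by the setting may differ), the list could look like:

    openai/gpt-4o, anthropic/claude-3.5-sonnet, meta-llama/llama-3-70b-instruct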

By default, logging of your inputs and outputs is enabled in OpenRouter's settings; you can disable it from your OpenRouter account.

Use Ollama to run local models

Ollama allows you to run local models like Llama 3, so all the data shared with the AI assistant is processed entirely locally and is not sent to a third party like OpenAI or Anthropic. (Please note: a local model is typically slower than a remote model and requires a machine with a lot of RAM; e.g., a 7B model may require about 7 GB of RAM to work properly.) Install Ollama, install a model (e.g., ollama run llama3), add the model name in the settings above (e.g., llama3), and follow the instructions below:

To use Ollama in Roam, you also have to set the OLLAMA_ORIGINS environment variable to https://roamresearch.com (by default, Ollama's CORS policy is restricted to local origins). See the Ollama documentation here, or proceed as follows, according to your operating system:

on macOS

⚠️ In my experience, the macOS Ollama.app doesn't take the OLLAMA_ORIGINS variable change into account. After the Ollama installation, Ollama.app will be loaded in the background. You need to close it (using, e.g., the Activity Monitor), then launch "ollama serve" from the terminal. It may also be necessary to disable the automatic startup of Ollama.app when your OS starts: go to System Preferences > General > Startup > Open at login, select Ollama.app, and click on the minus sign (-).
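Concretely, a minimal terminal sketch of this workaround (using the origin given above; the variable only lives for that terminal session):

    # Set the allowed origin for this terminal session,
    # then start the Ollama server from the same session
    export OLLAMA_ORIGINS="https://roamresearch.com"
    ollama serve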

on Windows
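Per Ollama's documentation, the usual approach is to quit the Ollama app, set the variable for your user account, and relaunch the app. A minimal sketch from a Command Prompt:

    :: Persist the variable for your user account (it takes effect in
    :: newly started processes), then restart the Ollama app
    setx OLLAMA_ORIGINS "https://roamresearch.com"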

on Linux
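Assuming Ollama runs as a systemd service (the default with the Linux installer), a minimal sketch is to add the variable to the service unit and restart it:

    # Opens an override file for the service; under [Service], add:
    #   Environment="OLLAMA_ORIGINS=https://roamresearch.com"
    sudo systemctl edit ollama.service
    sudo systemctl daemon-reload
    sudo systemctl restart ollama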

Keyboard & text only AI completion and post-processing

You can also use the AI assistant features without a voice note, just using the text content of some blocks in your graph and the dedicated command in the command palette (see above).

Using the SmartBlock command

You can insert the <%SPEECHTOROAM%> command in your SmartBlocks templates (using the corresponding extension) to start recording a voice note in a specific context. For example, you can create a very simple SmartBlock and call it with a button:

- #SmartBlock Speech-to-Roam
    - <%SPEECHTOROAM%><%CURSOR%>

The SmartBlock button will be {{🎙️:SmartBlock:Speech-to-Roam}} (usable once). To have a permanent button in a given block and automatically insert the transcription in the children blocks, use {{🎙️:SmartBlock:Speech-to-Roam:RemoveButton=false}}.

API usage fees

Moderate but regular use should only cost a few tens of cents per month (costs may increase if you use GPT-4; the default is GPT-3.5; remember to set a maximum monthly limit). You can check the detailed daily cost of your usage of Whisper and other OpenAI models here; updates are almost instantaneous.

OpenAI Whisper API pricing:

To give you an idea, using Whisper for 10 minutes a day for a month comes to $1.80.
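That figure follows from Whisper's per-minute rate ($0.006 at the time of writing): 10 min/day × 30 days = 300 minutes, and 300 × $0.006 = $1.80.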

OpenAI GPT API pricing:

The prices are per 1,000 tokens. For comparison, this documentation is equivalent to about 3,500 tokens (2,000 words).
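For instance, at a hypothetical rate of $0.001 per 1,000 tokens, sending this entire documentation as a prompt would cost about 3.5 × $0.001 = $0.0035.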

See updated OpenAI API pricing here.

Claude API pricing:

See updated Anthropic Claude API pricing here.

Support my work

This extension represents a significant amount of work. If you want to encourage me to keep developing and enhancing it, you can buy me a coffee ☕ here. Thanks in advance for your support! 🙏


For any question or suggestion, DM me on Twitter and follow me to be informed of updates and new extensions: @fbgallet.

Please report any issue here.