Noteworthy is a full-stack, serverless, web-based note-taking app. It focuses on user-friendly features like voice notes, which use Speech-to-Text AI to make note taking faster and more accurate.
Available for use here: https://drh6zqq3rdeze.cloudfront.net/
There are myriad note-taking apps on the market, each with its own pros and cons. A list of common pain points:
Noteworthy aims to address these pain points with the following goals:
The motivation behind Voice Notes was to reduce the most obvious barrier to note taking: typing. Several user groups can benefit, such as busy employees who do not have time to take notes between meetings, people with physical disabilities, and slower typists. From a product perspective, productivity apps depend on strong engagement and retention. Lowering the barrier to taking notes with Voice should increase product metrics like
ultimately creating the core product loop: creating more note data <-> accessing more note data
See project board for upcoming features and known issues: https://github.com/users/citomcclure/projects/1/views/1
The frontend uses HTML, CSS, JavaScript, and Bootstrap.
The main Bootstrap components used are the grid layout, a dropdown menu for sorting, and spinner animations for the autosaving and deleting states. The rest of the design relies on extensive custom CSS styling.
To keep the app a single-page application, API usage is optimized to reduce backend calls, and client-side state is kept in sync in a JS Datastore. The frontend uses Axios to make HTTPS requests to two REST endpoints:
- /notes with GET, POST, PUT, and DELETE HTTP methods
- /notes/voice with POST HTTP method
IAM is handled by Amazon Cognito for user authentication. The web app is served through an Amazon CloudFront distribution.
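
As a rough illustration of how the two endpoints above map onto Axios calls, here is a minimal sketch. The helper name `createNotesClient`, its method names, and the request body shapes are assumptions made for this example, not the app's actual code. `http` stands for any Axios instance (for example one created with `axios.create({ baseURL })`), and Cognito authentication headers are left out.

```javascript
// Hypothetical helper: wraps the two REST endpoints behind named functions.
// `http` is any Axios-like instance exposing request(config).
function createNotesClient(http) {
  return {
    // /notes supports GET, POST, PUT, and DELETE
    listNotes: () => http.request({ method: 'get', url: '/notes' }),
    createNote: (note) => http.request({ method: 'post', url: '/notes', data: note }),
    updateNote: (note) => http.request({ method: 'put', url: '/notes', data: note }),
    // Assumption: the note to delete is identified in the request body
    deleteNote: (note) => http.request({ method: 'delete', url: '/notes', data: note }),

    // /notes/voice supports POST only; `form` is a FormData holding the audio
    createVoiceNote: (form) => http.request({ method: 'post', url: '/notes/voice', data: form }),
  };
}
```

Each function returns the promise from Axios, so callers can update the local datastore once the response arrives instead of refetching every note.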
The backend is written in Java and uses the AWS Serverless Application Model (SAM) with Lambda, in conjunction with several other AWS services.
The entire application is configured using a CloudFormation template, which deploys resources, manages access through policies, and holds other configuration. The template also tells API Gateway which endpoints and HTTP methods correspond to which Lambda. Once a Lambda is triggered, the same general flow of information is executed for all Lambdas:
There are two DynamoDB tables, with the following schema:
Other:
The Voice Note capability has a more complex end-to-end implementation.
On the frontend, the user's audio is captured as a stream using the browser's media device. Using a third-party library (extendable-media-recorder + extendable-media-recorder-wav-encoder under the MIT license), a media recorder is set up using the stream and the audio/wav MIME type. Although the default .webm format could be used, the preferred format for Amazon Transcribe is WAV with PCM 16-bit encoding. The WAV file is then sent as form data in a POST request to /notes/voice with Content-Type: multipart/form-data.
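
The capture flow above might look roughly like the following sketch. The function names, the `lib` parameter, and the form field and file names are all hypothetical. The sketch also assumes the library's MediaRecorder fires `dataavailable` and `stop` events in the same way as the native one.

```javascript
// Browser-side sketch. In the app, `lib` would be assembled from the real imports:
//   import { MediaRecorder, register } from 'extendable-media-recorder';
//   import { connect } from 'extendable-media-recorder-wav-encoder';
//   const lib = { MediaRecorder, register, connect };
// Passing them in keeps this sketch independent of any bundler setup.

// Registers the WAV encoder; call once before creating the first recorder.
async function registerWavEncoder(lib) {
  await lib.register(await lib.connect());
}

// Starts recording from the microphone and returns a stop() function that
// resolves with the whole recording as a single audio/wav Blob.
async function startWavRecording(lib, mediaDevices) {
  const stream = await mediaDevices.getUserMedia({ audio: true });
  const recorder = new lib.MediaRecorder(stream, { mimeType: 'audio/wav' });
  const chunks = [];
  recorder.addEventListener('dataavailable', (event) => chunks.push(event.data));
  recorder.start();

  return function stop() {
    return new Promise((resolve) => {
      recorder.addEventListener('stop', () => {
        stream.getTracks().forEach((track) => track.stop()); // release the microphone
        resolve(new Blob(chunks, { type: 'audio/wav' }));
      });
      recorder.stop();
    });
  };
}

// Wraps the WAV Blob as form data. The field name and file name are
// assumptions; the backend only needs to locate the WAV bytes in the body.
function buildVoiceNoteForm(wavBlob) {
  const form = new FormData();
  form.append('audio', wavBlob, 'voice-note.wav');
  return form;
}
```

With an Axios instance `http`, the upload would then be `http.post('/notes/voice', buildVoiceNoteForm(wavBlob))`. When the body is a FormData object in the browser, the multipart/form-data Content-Type and its boundary are set automatically.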
On the backend, API Gateway Base64-encodes the request body. The body is then parsed to remove the non-WAV multipart elements that are prepended to the audio. The Request object is built using the user's email and the audio as an array of bytes. The business logic uses wrapper classes for the Amazon S3 and Amazon Transcribe services to keep non-business logic out of the Activity class. Together they achieve the following:
- Create a Transcription object and save it to the transcriptions table in DDB
- Create a note (Note object) using the transcript and save it to the notes table in DDB
Note: Because AWS SDK for Java 1.x is used across the project, there were several limitations, such as transcription results being available only in batch rather than as a stream. Planned optimizations and spike tickets for improving the voice note feature (e.g., using presigned URLs) can be found on the project board: https://github.com/users/citomcclure/projects/1
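
The request parsing step on the backend (Base64 decoding, then stripping the multipart framing around the WAV bytes) can be illustrated with a short sketch. The project does this in Java. The version below uses JavaScript with Node's Buffer purely to show the technique, and the function name and error messages are hypothetical. It assumes the WAV file is the first part of the multipart body.

```javascript
// Illustration only: the project's backend performs this step in Java.
// Returns the payload of the first multipart part as a Buffer.
function extractWavBytes(base64Body, contentType) {
  // The boundary token is declared in the Content-Type header.
  const match = /boundary=(?:"([^"]+)"|([^;\s]+))/i.exec(contentType);
  if (!match) throw new Error('Content-Type has no multipart boundary');
  const delimiter = Buffer.from('--' + (match[1] || match[2]));

  // The request body arrives Base64-encoded from API Gateway.
  const body = Buffer.from(base64Body, 'base64');

  const partStart = body.indexOf(delimiter);
  if (partStart === -1) throw new Error('Boundary not found in body');

  // Part headers (Content-Disposition, Content-Type) end at the first blank line.
  const headersEnd = body.indexOf('\r\n\r\n', partStart);
  if (headersEnd === -1) throw new Error('Part headers are not terminated');
  const payloadStart = headersEnd + 4;

  // The payload runs until the CRLF that precedes the next boundary delimiter.
  const closing = Buffer.concat([Buffer.from('\r\n'), delimiter]);
  const payloadEnd = body.indexOf(closing, payloadStart);
  if (payloadEnd === -1) throw new Error('Closing boundary not found');

  return body.subarray(payloadStart, payloadEnd);
}
```

Searching for the full delimiter rather than a bare CRLF matters because WAV data is binary and can contain CRLF byte pairs of its own.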
Original designs can be found here: https://miro.com/app/board/uXjVKGfpUwM=/?share_link_id=519475842006