https://user-images.githubusercontent.com/7995105/126933240-b8176047-7cc4-4b22-91dc-aee7490476ed.mp4
Background
Thesis
Design
Architecture
Data Schema
Workflows
Document Storage
Shut up, how can I use it?
Notes
Future
Inspirations
Apollo is a different type of search engine. Traditional search engines (like Google) are great for discovery when you're trying to find the answer to a question, but you don't know what you're looking for.
However, they're very poor at recall and synthesis when you've seen something before on the internet somewhere but can't remember where. Trying to find it becomes a nightmare - how can you synthesize the great material on the internet when you forgot where it even was? I've wasted many an hour combing through Google and my search history to look up a good article, blog post, or just something I've seen before.
Even with built-in systems to store some of my favorite articles, podcasts, and other stuff, I forget things all the time.
Screw finding a needle in the haystack. Let's create a new type of search to choose which gem you're looking for.
Apollo is a search engine and web crawler to digest your digital footprint. What this means is that you choose what to put in it. When you come across something that looks interesting, be it an article, blog post, website, whatever, you manually add it (with built-in systems to make doing so easy). If you always want to pull in data from a certain data source, like your notes or something else, you can do that too. This tackles one of the biggest problems with recall in search engines, which is that they return a lot of irrelevant information. With Apollo, the signal-to-noise ratio is very high because you've chosen exactly what to put in it.
Apollo is not necessarily built for raw discovery (although it certainly supports rediscovery); it's built for knowledge compression and transformation - that is, looking up things that you've previously deemed to be cool.
The first thing you might notice is that the design is reminiscent of the old digital computer age, back in the Unix days. This is intentional for many reasons. In addition to paying homage to the greats of the past, this design makes me feel like I'm searching through something that is authentically my own. When I search for stuff, I genuinely feel like I'm travelling through the past.
Apollo's client side is written in Poseidon. The client side interacts with the backend via a REST-like API which provides endpoints for searching data and adding a new entry.
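To make the client/backend split a bit more concrete, here is a minimal sketch of what a REST-like API with a search endpoint and an add endpoint could look like in Go. The paths /search and /add, the entry type, and the substring match are all assumptions for illustration; they are not Apollo's actual routes, payloads, or ranking logic.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// entry is a hypothetical payload shape, not Apollo's actual wire format.
type entry struct {
	Title   string `json:"title"`
	Link    string `json:"link"`
	Content string `json:"content"`
}

// newMux wires up two hypothetical endpoints: GET /search?q=... and POST /add.
// The store is a plain slice with no locking, so a real server handling
// concurrent requests would need a mutex around it.
func newMux(store *[]entry) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/search", func(w http.ResponseWriter, r *http.Request) {
		q := strings.ToLower(r.URL.Query().Get("q"))
		results := []entry{}
		for _, e := range *store {
			// Naive substring match as a stand-in for a real inverted-index lookup.
			if q != "" && strings.Contains(strings.ToLower(e.Title+" "+e.Content), q) {
				results = append(results, e)
			}
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(results)
	})
	mux.HandleFunc("/add", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST only", http.StatusMethodNotAllowed)
			return
		}
		var e entry
		if err := json.NewDecoder(r.Body).Decode(&e); err != nil {
			http.Error(w, "invalid JSON", http.StatusBadRequest)
			return
		}
		*store = append(*store, e)
		w.WriteHeader(http.StatusCreated)
	})
	return mux
}

// call runs one request against the mux in-process and returns status and body.
func call(mux *http.ServeMux, method, target, body string) (int, string) {
	req := httptest.NewRequest(method, target, strings.NewReader(body))
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	store := []entry{}
	mux := newMux(&store)
	code, _ := call(mux, http.MethodPost, "/add",
		`{"title":"Go maps","link":"https://example.com","content":"notes on maps"}`)
	fmt.Println(code) // status of the add request
	_, body := call(mux, http.MethodGet, "/search?q=maps", "")
	fmt.Print(body) // JSON array of matching entries
}
```

The sketch exercises the handlers in-process with httptest so it runs without opening a port; a real deployment would pass the mux to http.ListenAndServe instead.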
The backend is written in Go and is composed of a couple of important components.
We use two schemas. The first parses incoming data into an intermediate encoded format. This does not get stored; it's purely an intermediate step before we transform it into a record for our inverted index. Why is this important?
type Data struct {
	title   string   // a title of the record, self-explanatory
	link    string   // links to the source of a record, e.g. a blog post, website, podcast etc.
	content string   // actual content of the record, must be text data
	tags    []string // list of potential high-level document tags you want to add that will be
	// indexed in addition to the raw data contained
}

// smallest unit of data that we store in the database
// this will store each "item" in our search engine with all of the necessary information
// for the inverted index
type Record struct {
	// unique identifier
	ID string `json:"id"`
	// title
	Title string `json:"title"`
	// potential link to the source if applicable
	Link string `json:"link"`
	// text content to display on results page
	Content string `json:"content"`
	// map of tokens to their frequency
	TokenFrequency map[string]int `json:"tokenFrequency"`
}
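As a rough illustration of how the intermediate Data might become a Record, here is a minimal sketch that counts token frequencies and registers the record in a tiny inverted index. The tokenize, toRecord, and addToIndex helpers are hypothetical names, the tokenizer is deliberately naive (lowercase, split on non-alphanumerics, no stemming or stop words), and counting title and tag tokens alongside the content is an assumption rather than a description of what Apollo does.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Data mirrors the intermediate schema shown above.
type Data struct {
	title   string
	link    string
	content string
	tags    []string
}

// Record mirrors the stored schema shown above.
type Record struct {
	ID             string         `json:"id"`
	Title          string         `json:"title"`
	Link           string         `json:"link"`
	Content        string         `json:"content"`
	TokenFrequency map[string]int `json:"tokenFrequency"`
}

// tokenize is a hypothetical, naive tokenizer: it lowercases the text and
// splits on anything that is not a letter or a digit.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// toRecord converts intermediate Data into a Record, counting tokens from the
// title, content, and tags (an assumption made for this sketch).
func toRecord(id string, d Data) Record {
	freq := make(map[string]int)
	text := d.title + " " + d.content + " " + strings.Join(d.tags, " ")
	for _, tok := range tokenize(text) {
		freq[tok]++
	}
	return Record{ID: id, Title: d.title, Link: d.link, Content: d.content, TokenFrequency: freq}
}

// addToIndex notes, for every distinct token in the record, that this record
// ID contains it. The index maps token -> list of record IDs.
func addToIndex(index map[string][]string, r Record) {
	for tok := range r.TokenFrequency {
		index[tok] = append(index[tok], r.ID)
	}
}

func main() {
	d := Data{
		title:   "Inverted index",
		content: "An index maps each token to records.",
		tags:    []string{"search"},
	}
	r := toRecord("1", d)
	index := make(map[string][]string)
	addToIndex(index, r)
	fmt.Println(r.TokenFrequency["index"], index["search"]) // prints: 2 [1]
}
```

Looking up a query then amounts to tokenizing it the same way and reading index[token] for each token; the per-record frequencies are what a ranking step could use to order the hits.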
Data comes in many forms, and the more varied those forms are, the harder it is to write reliable software to deal with it. If everything I wanted to index was just stuff I wrote, life would be easy. All of my notes would probably live in one place, so I would just have to grab the data from that data source and chill.
The problem is I don't take a lot of notes, and not everything I want to index is something I'd take notes on.
So what to do?
Apollo can't handle all types of data, it's not designed to. However in building a search engine to index stuff, there are a couple of things I focused on:
To add your own data source, create a file in the pkg/apollo/sources folder, following the same rules as some of the examples, and make sure to add it in the GetData() method of the source.go file in this package.
Local records and data from data sources are stored in separate JSON files. This is for convenience.
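As a sketch of what plugging a data source into GetData might look like, assuming each source is a function that returns a map from a unique ID to Data: the getNotes source and the variadic GetData signature below are hypothetical, chosen so the example is self-contained, and the local Data type stands in for schema.Data. The real GetData() in source.go may be shaped differently.

```go
package main

import "fmt"

// Data stands in for schema.Data so that this sketch compiles on its own.
type Data struct {
	title   string
	link    string
	content string
	tags    []string
}

// getNotes is a hypothetical data source. A real one would read files or call
// an API; this one returns a single hard-coded entry keyed by a unique ID.
func getNotes() map[string]Data {
	return map[string]Data{
		"notes-1": {
			title:   "Reading list",
			content: "Books to read this year",
			tags:    []string{"notes"},
		},
	}
}

// GetData merges every source it is given into one map. Adding a new data
// source means adding one more function to the list passed in here.
// (Assumption: the real GetData takes no arguments and lists sources inline.)
func GetData(sources ...func() map[string]Data) map[string]Data {
	all := make(map[string]Data)
	for _, src := range sources {
		for id, d := range src() {
			all[id] = d
		}
	}
	return all
}

func main() {
	data := GetData(getNotes)
	fmt.Println(len(data), data["notes-1"].title) // prints: 1 Reading list
}
```

Keying entries by a source-specific prefix such as "notes-" is one way to avoid ID collisions between sources; later sources overwrite earlier ones if two IDs do collide.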
I also personally store my Kindle highlights as a JSON file - I use read.amazon.com and a Readwise extension to download the exported highlights for a book. I put any new book JSON files in a kindle folder in the outer directory, and every time the inverted index is recomputed, the kindle source takes any new book highlights, integrates them into the main kindle.json file stored in the data folder, then deletes the old file.
Although I built Apollo first and foremost for myself, I also wanted other people to be able to use it if they found it valuable. To use Apollo locally:
Clone the repo: git clone ....
Make sure you have Go installed, along with youtube-dl, which is how we download the subtitles of a video. You can use this to install it.
Navigate into the project directory: cd apollo.
Note that since Apollo syncs from some personal data sources, you'll want to remove them, add your own, or build stuff on top of them. Otherwise the terminal will complain if you attempt to run it, so:
Open pkg/apollo/sources in your preferred editor and replace the body of the GetData function with return make(map[string]schema.Data).
Create a folder named data in the outer directory.
Create a .env file in the outermost directory (i.e. in the same directory as the README.md) and add PASSWORD=<val>, where <val> is whatever password you want. This is necessary for adding or scraping data: you'll want to "prove you're Amir", i.e. authenticate yourself, and then you won't need to do this again in the future. If this is not making sense, try adding some data on apollo.amirbolous.com/add and see what happens.
Run go run cmd/apollo.go in the terminal.
Open 127.0.0.1:8993 in your browser.
To add data, go to the Add Data section.
If you see a Hooray! popup, that means you were authenticated successfully. You only need to do this once, since we use localStorage to save whether you've been authenticated or not.
To scrape a website, paste a link in the link textbox, then click on the scrape button. Note this does not add the website/content - you still need to click the add button if you want to save it. The web crawler works reliably most of the time if you're dealing with written content on a web page or a YouTube video. We use a Go-ported version of readability to scrape the main contents from a page if it's written content, and youtube-dl to get the transcript of a video. In the future, I'd like to make this web crawler more robust, but it works well enough most of the time for now.
As a side note, although I want others to be able to use Apollo, this is not a "commercial product", so feel free to open a feature request if you'd like one, but it's unlikely I will get to it unless it becomes something I personally want to use.
For storing the database/inverted index, I considered Go's gob package and JSON. The gob package is definitely faster; however, it's only native in Go, so I decided to go with JSON to make the data available in the future for potentially any non-Go integrations, and to be able to switch the infrastructure completely if I want to.