canvas-ai / canvas-server

Canvas Core
https://getcanvas.org
GNU Affero General Public License v3.0

€200 | Implement proper listDocuments() and getDocuments() methods #6

Closed idncsk closed 2 months ago

idncsk commented 5 months ago

We use a base document class and a concept of data "abstractions" that extend the base document object to accommodate different document fields depending on the abstraction (a browser tab, file, note, or email each carries different data we might want to index).

Here is a simplified example of a browser tab:

{
  "id": 123,
  "type": "data/abstraction/tab",
  "createdTimestamp": "",
  "updatedTimestamp": "",
  "meta": {
    // Metadata fields
  },
  "data": {
    // Data fields
  },
  "versions": {
    // List of versions (docIDs)
  }
}

listDocuments() should return only each object's metadata, while getDocuments() should return the whole object (or an array of objects). This is especially helpful for data abstractions whose data is stored directly in the database (notes, todo items, browser tabs).
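
To make the split concrete, here is a minimal TypeScript sketch of the two methods over an in-memory array. It is not the canvas-server implementation: the `InMemoryStore` class, the `DocumentSummary` type, the optional `type` filter, and the choice to include `id` and `type` alongside `meta` in the listing are all assumptions made for illustration.

```typescript
// Hypothetical document shape, mirroring the browser tab example above.
interface CanvasDocument {
  id: number;
  type: string;
  createdTimestamp: string;
  updatedTimestamp: string;
  meta: { [key: string]: unknown };
  data: { [key: string]: unknown };
}

// Assumed listing shape: identifying fields plus metadata, no data payload.
interface DocumentSummary {
  id: number;
  type: string;
  meta: { [key: string]: unknown };
}

// Hypothetical in-memory store; a real backend would query the database.
class InMemoryStore {
  private docs: CanvasDocument[] = [];

  insert(doc: CanvasDocument): void {
    this.docs.push(doc);
  }

  // Returns metadata only; the data field is never copied into the result.
  listDocuments(type?: string): DocumentSummary[] {
    return this.docs
      .filter((doc) => type === undefined || doc.type === type)
      .map((doc) => ({ id: doc.id, type: doc.type, meta: doc.meta }));
  }

  // Returns whole documents for the requested IDs; unknown IDs are skipped.
  getDocuments(ids: number[]): CanvasDocument[] {
    return this.docs.filter((doc) => ids.indexOf(doc.id) !== -1);
  }
}

// Usage with made-up sample data.
const sampleStore = new InMemoryStore();
sampleStore.insert({
  id: 123,
  type: "data/abstraction/tab",
  createdTimestamp: "",
  updatedTimestamp: "",
  meta: { domain: "getcanvas.org", browserName: "firefox" },
  data: { url: "https://getcanvas.org" },
});
const tabSummaries = sampleStore.listDocuments("data/abstraction/tab");
const fullTabs = sampleStore.getDocuments([123, 999]);
```

In this sketch the listing stays cheap because it never touches `data`, which matters most for abstractions whose payload lives directly in the database.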

Additionally, we need to decide which fields to index and which type of index to use for a given field of a data abstraction. For example, we might want a bitmap index for the domain, owner, or emailFrom field, a hashmap for the url, or an embedding vector / separate FTS index to represent the content of a note.

The above translates to the following idea (comments/suggestions welcome!):

{
  "index": {
    "bitmap": ["meta.domain", "meta.browserName"],
    "hashmap": ["data.url"] // data used to calculate the primary document checksum
  },
  "meta": {
    // Metadata fields
  },
  "data": {
    // Data fields
  },
  "versions": { /* .. */ }
}

idncsk commented 4 months ago

Already implemented