Open janaka opened 9 months ago
Partially implementing as part of #207 as this involves org-scoped data. Partial because migrating the existing data structure to the new one in deployed systems will not be handled. A new DataScope enum has been introduced with backwards-compatible mappings where needed.
Situation
Data persistence on disk isn't consistently separated by scope of ownership.
Current filesystem structure:
/index/PERSONAL/{user_id}/ - index files for the Ask Your Docs feature
/index/SHARED/{org_id}/{space_id}/ - index files for Spaces
/sqlite/PERSONAL/{user_id}/usage.db - retrieval and LLM request and response data (chat history etc.) for all interactions
/sqlite/SHARED/system.db - system data and metadata (orgs, users, user_groups, spaces, and space_groups)
/upload/PERSONAL/{user_id}/ - Ask Your Docs feature is hard-coded to MANUAL_UPLOAD document. Those files are persisted here.
/upload/SHARED/{org_id}/{space_id} - file uploads for any spaces with datasource = MANUAL_UPLOAD are persisted here.
Database table to file mapping:
usage.db: settings (user scoped), history_{feature_name}, history_thread_{feature_name}
system.db: orgs, org_members, users, settings (non-user scoped), space_groups, space_group_members, spaces, space_access, user_groups, user_group_members
Tables with joins:
orgs <> org_members
org_members <> users
spaces <> space_access <> users
spaces <> space_group_members
users <> user_group_members
This structure isn't ideal given the addition of Orgs and upcoming features such as public chatbots and restructuring the Ask Your Docs functionality as a personal org.
Goals and Requirements
Proposal
Structure folders based on a name for the persistence system followed by one or more keys that identify the unique owner of the data. The filename describes the data.
Pattern:
/{persistence_system_name}/{owner_scope_key_1}/../{owner_scope_key_n}/{filename}
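A minimal sketch of how a path following this pattern could be assembled. The function name `build_path` and its parameters are hypothetical and not part of the existing codebase; the segment validation is an assumption added only to keep owner keys from escaping their folder.

```python
from pathlib import PurePosixPath
from typing import List


def build_path(system: str, owner_keys: List[str], filename: str = "") -> str:
    """Join a persistence system name, owner scope keys, and an optional filename.

    Hypothetical helper illustrating the proposed pattern only.
    """
    if not owner_keys:
        raise ValueError("at least one owner scope key is required")
    parts = [system, *owner_keys]
    if filename:
        parts.append(filename)
    for part in parts:
        # Reject empty segments and anything that could change directory depth.
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"invalid path segment: {part!r}")
    return str(PurePosixPath("/", *parts))


# Example: the org-scoped system database from the proposal.
org_db = build_path("sqlite", ["orgs", "42"], "system.db")
```

Here `"42"` stands in for an `{org_id}` value; folder-only locations such as the index directories would simply omit the filename.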
Concrete changes:
/index/SHARED/{org_id}/{space_id}/ --> /index/orgs/{org_id}/{space_id}/ [x]
/index/PERSONAL/{user_id}/ --> Same as above. Ask Your Docs changes to be achieved by providing every user a personal org. A space that isn't shared with any other users is private.
/index/THREAD/{org_id}/{space_id}/ --> /index/personal/{user_id}/{space_id}
/sqlite/PERSONAL/{user_id}/usage.db --> /sqlite/personal/{user_id}/usage.db - authenticated user usage
/sqlite/SHARED/system.db --> /sqlite/global/system.db - global system
new --> /sqlite/orgs/{org_id}/system.db - org system
settings, org and user scope - both (?) should be stored in the same settings table in /sqlite/orgs/{org_id}/system.db
settings, global scope - /sqlite/global/system.db
/upload/SHARED/{org_id}/{space_id} --> /upload/org/{org_id}/{space_id}
/upload/PERSONAL/{user_id}/ --> same as above because of the changes to Ask Your Docs.
/upload/THREAD/{org_id}/{space_id} --> /upload/personal/{user_id}/{space_id}
New use cases:
/sqlite/personal/anon-{user_id}/usage.db - anonymous user usage. user_id is a generated GUID.
/sqlite/orgs/{org_id}/experiments/system.db
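For illustration, the renames above that can be derived from the old path alone could be expressed as a small lookup. This is a sketch, not the implemented mapping: `map_legacy_path` and `_RULES` are hypothetical names, and the PERSONAL index and upload folders are deliberately left out because their new location depends on a personal org id that the old path does not contain.

```python
import re
from typing import Optional

# (legacy pattern, new template) pairs copied from the concrete changes list.
# Only renames that need no extra information are included.
_RULES = [
    (re.compile(r"^/index/SHARED/(?P<org_id>[^/]+)/(?P<space_id>[^/]+)/?$"),
     "/index/orgs/{org_id}/{space_id}/"),
    (re.compile(r"^/sqlite/PERSONAL/(?P<user_id>[^/]+)/usage\.db$"),
     "/sqlite/personal/{user_id}/usage.db"),
    (re.compile(r"^/sqlite/SHARED/system\.db$"),
     "/sqlite/global/system.db"),
    (re.compile(r"^/upload/SHARED/(?P<org_id>[^/]+)/(?P<space_id>[^/]+)/?$"),
     "/upload/org/{org_id}/{space_id}"),
]


def map_legacy_path(path: str) -> Optional[str]:
    """Return the proposed new path for a legacy path, or None if no rule applies."""
    for pattern, template in _RULES:
        match = pattern.match(path)
        if match:
            # Named groups fill the placeholders in the new template.
            return template.format(**match.groupdict())
    return None
```

Returning `None` for unmatched paths keeps the ambiguous cases explicit, so a caller has to decide how to handle them rather than silently receiving a guess.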