Open foxoffire33 opened 4 months ago
The recent changes introduce several new SQL scripts, configuration files, and TypeScript updates to enhance the database operations for a project. These updates include creating views and indexes for efficient searches, adding dependencies, configuring data loading from SQLite to PostgreSQL, and defining database-related constants. Additionally, new SQL queries for data retrieval, deletion, insertion, and updates are added, along with TypeScript functions and types for handling Postgres database operations.
| Files/Paths | Change Summary |
| --- | --- |
| `migrations/...createViewForWildcardSearch.sql` | Added SQL migration script to create the `transactionsTagsDecoded` view and indexes on `tag_names` and `tag_values`. |
| `migrations/down/...createViewForWildcardSearch.sql` | Added down-migration commands to drop the `transactionsTagsDecoded` view and the `tagNamesIndex` and `tagValuesIndex` indexes. |
| `package.json` | Added `@types/pg` and `@types/yesql` dependencies; updated the `pg` dependency. |
| `pgloader.conf` | Introduced configuration for loading data from SQLite to PostgreSQL. |
| `src/config.ts` | Added database-related constants using environment variables. |
| `src/database/pgql/bundles/...` | Introduced multiple SQL files with queries for operations such as data retrieval, deletion, and insertion. |
| `src/database/pgql/core/...` | Added SQL files with queries for core database operations, including accessors, imports, and stats. |
| `src/database/pgql/data/...` | Added SQL queries for handling content attributes and nested content. |
| `src/database/pgql/moderation/...` | Introduced SQL queries for moderation checks and blocking data. |
| `src/database/postgress/PostgressDatabaseHelpers.ts` | Added functions for handling transactions and data items in Postgres. |
| `src/database/postgress/PostgressDatabaseTypes.ts` | Introduced types and interfaces for worker pools, debug information, and database operations. |
| `src/database/postgress/StandalonePostgresDatabase.ts` | Added `StandalonePostgresDatabase` class for database operations, including worker management and data handling. |
| `src/database/sql/core/accessors.sql` | Added a `FOR SHARE` clause to a subquery, changing the locking behavior of the query. |
| `src/database/standalone-postgres.ts` | Implemented a Postgres database worker for operations such as fetching data, saving transactions, and handling errors. |
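For context on the `pgloader.conf` entry in the table above, a SQLite-to-Postgres pgloader configuration typically looks like the following. This is only a sketch with placeholder paths and connection strings, not the actual file from this PR:

```
LOAD DATABASE
     FROM sqlite:///path/to/source.sqlite
     INTO postgresql://user:password@localhost:5432/target_db

 WITH include drop, create tables, create indexes, reset sequences

  SET work_mem to '128MB', maintenance_work_mem to '512MB';
```

The `WITH` options control whether pgloader recreates tables and indexes on the target; the `SET` values are illustrative tuning knobs, not requirements.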
```mermaid
sequenceDiagram
    participant User
    participant API
    participant Database
    User->>API: Request to insert data
    API->>Database: Execute upsertNewDataItem SQL query
    Database-->>API: Data inserted/updated
    API-->>User: Response with success/failure
```

```mermaid
sequenceDiagram
    participant User
    participant API
    participant Database
    User->>API: Request to fetch data attributes
    API->>Database: Execute selectDataAttributes SQL query
    Database-->>API: Return data attributes
    API-->>User: Response with data attributes
```
Hey @foxoffire33, cool work! I'm happy to talk through issues with this. I'm not sure whether we'll want to integrate it into `develop`/`main` in the near term, but we could maintain a separate branch for those who want to play around with it. That way we could continue to iterate on development and get some feedback on the level of interest from the community.
I'll follow up later today or tomorrow with more thoughts about the approach and other potential fuzzy search options. Thanks again for all the work you put into this!
@foxoffire33 I've been thinking about this more today. What I'm initially curious about is what kind of fuzzy search you want to provide and which specific Postgres features you intend to use for it. (apologies if this is visible in the diffs. I probably won't be able to review those in detail till next week) I'm also curious if you considered using the webhook support to populate some external index (e.g., OpenSearch or similar), and if not, why.
Regarding potential integration - how much effort are you willing to put into maintaining this? We would like to add Postgres support in `develop`/`main` at some point, but envisioned using a somewhat different schema from SQLite when we eventually implemented it. If you're interested in pursuing that, I could give you some info about the schema design we had in mind.
I tried to do it with Meilisearch first, but it doesn't fit the case perfectly. Implementing the functionality I need would be similar in cost to changing the gateway itself.
Since this is for a project I'm working on, I'm happy to maintain a Postgres solution for fuzzy search as long as we need it.
I'm also interested in the database design you already have in place, and I might be able to contribute some code for that.
> I tried to do it with Meilisearch first, but it doesn't fit the case perfectly. Implementing the functionality I need would be similar in cost to changing the gateway itself.

Gotcha, just wanted to make sure you were aware of the webhook option.

> Since this is for a project I'm working on, I'm happy to maintain a Postgres solution for fuzzy search as long as we need it.

Great! What's the current status of the PR? Is it working for your use case now or are you still sorting through issues?

> I'm also interested in the database design you already have in place, and I might be able to contribute some code for that.

Also great! I don't think I'll get to it till next week at this point, but I'll write up some details soon.
I'm still working on it; I don't have a lot of time. There is an issue with sorting in queries. I'll provide an update when I know more.
Long overdue update from our side - we're working on a version of the schema I'd recommend (on an internal legacy database in a different code base). It's similar to the SQLite schema but uses a single height-partitioned table in Postgres as opposed to separate 'new' and 'stable' tables. We represent `NULL` heights with a very large positive integer since partition keys can't be nullable. The advantage of this approach is that all the new/stable flushing is handled automatically by Postgres.
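Concretely, the sentinel-height approach might look like the following. This is only an illustrative sketch: the table name, partition bounds, and the exact sentinel value (`INT_MAX` here) are assumptions, not the real migration:

```sql
-- Partition keys can't be NULL, so unconfirmed rows use a sentinel
-- "infinite" height (2147483647) instead of NULL.
CREATE TABLE transactions (
    height integer,                         -- NULL while unconfirmed
    height_partition_key integer NOT NULL,  -- height, or 2147483647 if NULL
    id character(43) NOT NULL,
    PRIMARY KEY (height_partition_key, id)
) PARTITION BY RANGE (height_partition_key);

-- Stable (confirmed) rows land in height-ranged partitions...
CREATE TABLE transactions_0_1000000 PARTITION OF transactions
    FOR VALUES FROM (0) TO (1000000);

-- ...and "new" rows all land in the sentinel partition. Flushing new ->
-- stable is just an UPDATE of height_partition_key; Postgres moves the
-- row to the correct partition automatically.
CREATE TABLE transactions_pending PARTITION OF transactions
    FOR VALUES FROM (2147483647) TO (MAXVALUE);
```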
Is it possible to share the SQL schema?
Sure. This should give you the general idea, but keep in mind this is from a different DB with some legacy decisions (strings instead of binary fields). It's also under active development, so some details might still change.
```sql
CREATE TABLE data_items (
    height integer,
    height_partition_key integer,
    id character(43) NOT NULL,
    raw_offset bigint,
    raw_size bigint,
    signature character varying, -- only here to support ANS102
    signature_type bigint,
    signature_offset bigint,
    signature_size bigint,
    owner_address character(43),
    owner_offset bigint,
    owner_size bigint,
    data_offset bigint,
    data_size bigint,
    target character(43),
    quantity character varying,
    reward character varying,
    anchor character varying,
    content_type character varying,
    format smallint,
    created_at timestamp without time zone DEFAULT now() NOT NULL,
    updated_at timestamp without time zone DEFAULT now() NOT NULL,
    parent character(43),
    parent_root_tx_offset bigint, -- offset of the parent relative to the root TX
    root_tx_id character(43),     -- same as parent when it's not a nested data-item
    PRIMARY KEY (height_partition_key, id)
) PARTITION BY RANGE (height_partition_key);

CREATE INDEX data_item_id_idx ON data_items (id);
CREATE INDEX data_item_height_part_key_desc_id_idx ON data_items (height_partition_key DESC, id);
CREATE INDEX data_item_owner_address_height_part_key_id_idx ON data_items (owner_address, height_partition_key, id);
CREATE INDEX data_item_owner_address_height_part_key_desc_id_idx ON data_items (owner_address, height_partition_key DESC, id);
CREATE INDEX data_item_target_height_part_key_id_idx ON data_items (target, height_partition_key, id);
CREATE INDEX data_item_target_height_part_key_desc_id_idx ON data_items (target, height_partition_key DESC, id);

CREATE TABLE data_item_tags (
    height integer,
    height_partition_key integer,
    tx_id character(43) NOT NULL,
    index integer NOT NULL,
    name_hash bytea NOT NULL,
    value_hash bytea NOT NULL,
    created_at timestamp without time zone DEFAULT now() NOT NULL,
    updated_at timestamp without time zone DEFAULT now() NOT NULL,
    PRIMARY KEY (height_partition_key, tx_id, index)
) PARTITION BY RANGE (height_partition_key);

CREATE INDEX data_item_tags_name_value_height_part_tx_id_idx ON data_item_tags (name_hash, value_hash, height_partition_key, tx_id);
CREATE INDEX data_item_tags_name_value_height_part_key_desc_tx_id_idx ON data_item_tags (name_hash, value_hash, height_partition_key DESC, tx_id);

CREATE TABLE data_item_tag_names (
    name_hash bytea NOT NULL PRIMARY KEY,
    name character varying NOT NULL
);

CREATE INDEX data_item_tag_names_name_idx ON data_item_tag_names (name);

CREATE TABLE data_item_tag_values (
    value_hash bytea NOT NULL PRIMARY KEY,
    value character varying NOT NULL
);

CREATE INDEX data_item_tag_values_value_idx ON data_item_tag_values (value);
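```

As a usage sketch against this schema, a tag-based lookup would hash the name and value first and then hit the composite descending index. The tag values and hash function below are assumptions for illustration (`digest()` is from the pgcrypto extension; the hash actually used by this schema may differ):

```sql
-- Hypothetical lookup: data items tagged App-Name = MyApp, newest first.
-- Assumes tags are hashed with SHA-256 via pgcrypto's digest(); the real
-- hash function may be different.
SELECT di.id, di.height
FROM data_item_tags t
JOIN data_items di
  ON di.height_partition_key = t.height_partition_key
 AND di.id = t.tx_id
WHERE t.name_hash = digest('App-Name', 'sha256')
  AND t.value_hash = digest('MyApp', 'sha256')
ORDER BY t.height_partition_key DESC, t.tx_id
LIMIT 100;
```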
As part of the project I’m working on, I need to implement fuzzy search functionality. For example, if a user searches for an artist and makes a typo, I still want to return the correct result. To achieve this, I plan to use a feature of the PostgreSQL database. Currently, I am testing this on PostgreSQL.
Additionally, I have noticed that sorting is currently broken, and I might need some help resolving this issue. The database structure will follow in a future update.
I would appreciate any feedback you might have on this approach.
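For the typo-tolerant matching described above, one candidate PostgreSQL feature is the `pg_trgm` extension. The following is a sketch under the assumption that trigram similarity is the feature in question; the `artists` table and search term are purely illustrative:

```sql
-- Trigram-based fuzzy search: a typo like 'Beatls' still matches 'The Beatles'.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE TABLE artists (name text NOT NULL);
CREATE INDEX artists_name_trgm_idx ON artists USING gin (name gin_trgm_ops);

-- similarity() returns 0..1; the % operator filters by the configurable
-- pg_trgm.similarity_threshold setting and can use the GIN index above.
SELECT name, similarity(name, 'Beatls') AS score
FROM artists
WHERE name % 'Beatls'
ORDER BY score DESC
LIMIT 10;
```

Note that `ORDER BY score DESC` interacts with `LIMIT`-based pagination, which may be related to the sorting issue mentioned above.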