PostgreSQL tsvector support? Generic column support?

stanislavkozlovski commented 2 months ago

I have the folowing:

@Entity
@Table(name = "lcps")
class DBLocalContentPiece(title: String, content: String) : DBBase() {
  @Lob
  var title: String = title

  @Lob
  var content: String = content

  @Column(columnDefinition = "tsvector")
  var contentVector: String? = null

  @Column(columnDefinition = "tsvector")
  var titleVector: String? = null
}

It blows up with this:

Error: ERROR: column "content_vector" is of type tsvector but expression is of type character varying   Hint: You will need to rewrite or cast the expression.   Position: 111
jakarta.persistence.PersistenceException: Error: ERROR: column "content_vector" is of type tsvector but expression is of type character varying   Hint: You will need to rewrite or cast the expression.   Position: 111

Logging the SQL, I notice this is the SQL that runs:

txn[] insert into lcps (title, content, content_vector, title_vector, version, last_updated, created) values (?,?,?,?,?,?,?); -- bind([LOB],[LOB],null,null,1,2024-08-25T12:27:01.461Z,2024-08-25T12:27:01.461Z)

So I'm unclear what expression is made of type varchar.

I understand EBean does not support tsvector.

What can I do about this?

I have added a trigger to update the vector automatically, so for my needs EBean needs to do some very minimal stuff. I don't even need the vector populated in code. But I can't add the @Transient column since then it wouldn't get created at all.

I am ready to try and contribute support for this if need be.

@rbygrave any thoughts?

rbygrave commented 2 months ago

So I'm unclear what expression is made of type varchar

With JDBC PreparedStatement binding, when we bind NULL we bind it with JDBC java.sql.Types, so in this case those nulls are being bind using java.sql.Types.VARCHAR.

What can I do about this?

I think we need an outline of what we want to do in terms of table columns and sql. My take is that I suspect we want 2 columns with 1 storing the original text values and the other as a derived column storing the tsvector of the text. The tsvector column is insertable=false, updatable=false ... and instead a trigger is used to maintain it? like:

@Column
String content;

@Column(insertable=false, updatable=false, columnDefinition="tsvector")
String contentVector;

What are we actually aiming for in terms of table columns? Is it like the above or something else?

stanislavkozlovski commented 2 months ago

Thanks for the quick answer @rbygrave!

Yes, that's precisely the case. The table is:

create table lcps (
  id                            bigint generated by default as identity not null,
  content_vector                tsvector not null,
  content                       text not null,
  constraint pk_lcps primary key (id)
);

and the triggers are:

        CREATE TRIGGER lcp_content_vector_update
        BEFORE UPDATE ON lcps
        FOR EACH ROW
        WHEN (OLD.content IS DISTINCT FROM NEW.content)
        EXECUTE PROCEDURE tsvector_update_trigger(content_vector, 'pg_catalog.english', content);

        CREATE TRIGGER lcp_content_vector_insert
        BEFORE INSERT ON lcps
        FOR EACH ROW
        EXECUTE PROCEDURE tsvector_update_trigger(content_vector, 'pg_catalog.english', content);

Note that I am not set in stone on this approach.

I am trying to find the easiest way to maintain a tsvector field in Postgres and leverage it for full-text search.

I just went with it because I thought it'd be easier than getting EBean to understand TSVector in the code and modifying every save()/update() method call to ensure that content_vector is updated alongside content

There's also another way to add a generated column in postgres, like:

ALTER TABLE lcps
  ADD content_vector tsvector
    GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;

which also helps ensure you don't need to have application logic to ensure the content and content_vector fields get updated hand in hand

My problem here is that I don't know how to define the variable of the ORM Class.

Can I just not define it as a instance variable? Is there a way to just add the column definition without needing to store it as a class variable? I could write my own static/member method that does raw sql to fetch the value
If I can't define the column in the code, would just manually creating a dbmigration sql altering the table and adding the column work, or would the orm hit issues down the line?
If I do have to define it - what type should I choose? Since tsvector is unsupported and all types seem to have mappings (as per documentation and the code I glanced at) - it seems I'm bound to run into issues?

rbygrave commented 2 months ago

how to define the variable of the ORM Class

I don't think I understand that ... it seems to me the missing step is "how to map the SQL Table definition to Entity class with properties.

In this case we have the 2 columns so we naturally look to map those to 2 bean properties/fields.

@Entity
@Table(name="lcps"
class LcpsFoo {

  @Id long id;

  @Column(name="content_vector", insertable=false, updatable=false, columnDefinition="tsvector")
  String contentVector;

  @Column(name="content")
  String content;
  ...
}

With this mapping, the contentVector isn't included in insert or update as we use the trigger instead.

the variable of the ORM Class

Ah yeah I'm thinking "field" or "property" of the ORM class ... I think the use of "variable" here threw me a bit.

way to just add the column definition without needing to store it

I think you are just missing the use of insertable=false, updatable=false ... I think that is the trick to use here.

ebean-orm / ebean

PostgreSQL tsvector support? Generic column support? #3460

What can I do about this?