PavlidisLab / Gemma

Genomics data re-analysis
Apache License 2.0

Lazy-loading for CLOBs (large text blobs) #671

Open arteymix opened 1 year ago

arteymix commented 1 year ago

A little background: database engines typically store BLOBs and CLOBs separately from the rest of the row, so retrieving them takes extra work.

We mainly use Hibernate's materialized CLOB/BLOB types, which ensure those values are loaded into our entities. This is unnecessary in most cases and hurts query performance whenever the loaded values go unused.

It penalizes search performance in particular: for a term like "brain", where pretty much every dataset comes up, we load the large text of each result even when it is never read, and we could avoid a lot of that work.

The solution is to map those fields with the built-in java.sql.Clob and java.sql.Blob types instead. It can and should be done on a case-by-case basis, where performance matters.
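As a rough illustration of the idea, here is a minimal sketch using only JDK classes. `DatasetSketch`, its fields and its methods are hypothetical names, not Gemma code, and `SerialClob` is an in-memory stand-in for the locator a JDBC driver would return. In an actual Hibernate mapping the field would presumably be typed `java.sql.Clob` and annotated with `@Lob`, and the read would typically have to happen while the session is still open.

```java
import java.sql.Clob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialClob;

// Hypothetical entity, for illustration only. The large text is held as a
// java.sql.Clob and is converted to a String only when explicitly requested.
final class DatasetSketch {
    private final String shortName;
    private final Clob description; // handle to the text, not the text itself
    private int reads = 0;          // counts how often the text was materialized

    DatasetSketch(String shortName, Clob description) {
        this.shortName = shortName;
        this.description = description;
    }

    // Convenience factory: SerialClob stands in for a driver-provided locator.
    static DatasetSketch of(String shortName, String text) {
        try {
            return new DatasetSketch(shortName, new SerialClob(text.toCharArray()));
        } catch (SQLException e) {
            throw new IllegalStateException("could not build CLOB", e);
        }
    }

    // Cheap accessor: never touches the CLOB.
    String getShortName() {
        return shortName;
    }

    // Expensive accessor: materializes the CLOB content on demand.
    String readDescription() {
        try {
            reads++;
            long length = description.length();
            if (length == 0) {
                return "";
            }
            // Clob positions are 1-based; the cast assumes the text fits in an int.
            return description.getSubString(1, (int) length);
        } catch (SQLException e) {
            throw new IllegalStateException("could not read CLOB", e);
        }
    }

    int getReads() {
        return reads;
    }
}
```

A search result listing would call only `getShortName()`, leaving the text untouched; `readDescription()` would be reserved for the code paths that actually display or index the description.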

arteymix commented 3 weeks ago

This would be critical for vectors, especially single-cell ones, which tend to be quite voluminous.