Size of the RethinkDB database is expanding at a rapid rate

The RethinkDB data dump is quite large (>200 MB). This is a problem because a data dump or restore now takes on the order of minutes, even though there are fewer than 400 public Documents.

I experimented with a recent dump by removing various items and comparing the resulting dump size (.tar.gz); the results are summarized in Table 1. It appears that pruning the _ops field (which stashes every action performed) and relatedPapers can reduce the size by almost 88% (roughly the sum of the two individual reductions in Table 1). Given this, some possible ways to keep the size reasonable:

- _ops: Periodically prune by hand, working on a local dump and then restoring from it
- relatedPapers: Store PMIDs (rather than full paper details) and let the browser retrieve the details on demand
  - We do this with app-ui and it is pretty reasonable
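Both ideas can be sketched as transformations on rows loaded from a dump. This is a minimal sketch under stated assumptions, not the project's actual code: the field names `_ops` and `relatedPapers` come from this issue, but the `pmid` key inside each related paper, the helper names, and the sample row are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def prune_ops(row: Row) -> Row:
    """Return a copy of the row without the _ops history field."""
    return {key: value for key, value in row.items() if key != "_ops"}


def related_papers_to_pmids(row: Row, pmid_key: str = "pmid") -> Row:
    """Return a copy of the row whose relatedPapers holds only PMIDs.

    Assumption: each related paper is a dict carrying its PMID under
    `pmid_key`. Entries without one are dropped, so check real data first.
    """
    slim = dict(row)
    papers = row.get("relatedPapers")
    if isinstance(papers, list):
        slim["relatedPapers"] = [
            str(paper[pmid_key])
            for paper in papers
            if isinstance(paper, dict) and paper.get(pmid_key) is not None
        ]
    return slim


# Hypothetical row for illustration only; the PMIDs and titles are made up.
sample_row: Row = {
    "id": "doc-1",
    "_ops": [{"op": "create"}, {"op": "rename"}],
    "relatedPapers": [
        {"pmid": "10000001", "title": "Placeholder title A"},
        {"pmid": "10000002", "title": "Placeholder title B"},
    ],
}

slim_row = related_papers_to_pmids(prune_ops(sample_row))
```

Both helpers return copies, so the original rows stay available for comparison before anything is restored.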

Table 1. Biofactoid RethinkDB dump file sizes

Description	Size (MB)	% Change	Notes
*Full DB (June 19, 2024)	202	0.0%	Counts: 4649 Documents and 6813 Elements
**Remove _ops	96	-52.5%
**Remove relatedPapers	131	-35.1%
Remove trashed and initiated Documents	173	-14.4%	Removed 4226 Documents and 2192 Elements

*Dump archive: factoid_dump_2024-06-19_14-28-33-767.tar.gz
**Actions applied to Document and Element table
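The % Change values in Table 1, and the combined figure of almost 88%, can be reproduced from the sizes alone. This is a rough check that assumes the two savings are additive, which may not hold exactly for a compressed archive; the function and variable names are illustrative.

```python
def pct_change(size_mb: float, baseline_mb: float) -> float:
    """Percent change of a dump size relative to the full dump."""
    return round((size_mb - baseline_mb) / baseline_mb * 100, 1)


FULL_MB = 202  # full dump size from Table 1

# Dump sizes (MB) after each removal, taken from Table 1.
sizes_mb = {
    "Remove _ops": 96,
    "Remove relatedPapers": 131,
    "Remove trashed and initiated Documents": 173,
}

changes = {name: pct_change(size, FULL_MB) for name, size in sizes_mb.items()}

# Assumption: removing _ops and relatedPapers saves the sum of both savings.
combined_saving_mb = (FULL_MB - sizes_mb["Remove _ops"]) + (
    FULL_MB - sizes_mb["Remove relatedPapers"]
)
combined_pct = round(combined_saving_mb / FULL_MB * 100, 1)
estimated_remaining_mb = FULL_MB - combined_saving_mb
```

Under that assumption the dump would shrink to roughly 25 MB; measuring a dump with both fields removed would confirm the real figure.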

PathwayCommons/factoid, issue #1276