commontoolsinc / synopsys

datastore service for datums

MDB_MAP_FULL after a moderate number of assertions #27

Closed · anotherjesse closed this 4 weeks ago

anotherjesse commented 1 month ago

From a clean database, I assert simple facts that look like this:

[ { "/": cid }, "count", count } ]

Depending on how many I assert in a batch (and whether I add other asserts with large content), I quickly (after a few hundred to a thousand) get the error:

{"message":"Task was aborted\nError: MDB_MAP_FULL"}

You can use this script to reproduce:

import { CID } from "npm:multiformats@13.3.0/cid";
import * as json from 'npm:multiformats@13.3.0/codecs/json'
import { sha256 } from 'npm:multiformats@13.3.0/hashes/sha2'

const SYNOPSYS_URL = Deno.env.get("SYNOPSYS_URL") || "http://localhost:8080";

// The script uses `Fact` without declaring it; a minimal shape matching
// the tuples built below:
type Fact = [Record<string, string>, string, unknown];

export async function cid(data: any) {
    const bytes = json.encode(data);
    const hash = await sha256.digest(bytes);
    const cid = CID.create(1, json.code, hash);
    return cid.toString();
}

export async function import_fake(start: number, batch_size: number) {
    let facts: Fact[] = [];
    for (let i = start; i < start + batch_size; i++) {
        facts.push([{ "/": await cid({ count: i }) }, "count", i]);
    }
    // console.log(JSON.stringify(facts));

    return await fetch(`${SYNOPSYS_URL}/assert`, {
        method: 'PATCH', body: JSON.stringify(facts),
    }).then(r => r.json())
}

if (import.meta.main) {
    const batch_size = 1;
    let count = 1;
    while (true) {
        console.log(`Importing ${count}...`);
        let result = await import_fake(count, batch_size);
        if (!result.ok) {
            console.log(`Error importing ${count}: ${JSON.stringify(result.error)}`);
            break;
        }
        count += batch_size;
    }
}
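
To run it against a local synopsys instance (assuming Deno is installed; the filename count.ts is just an example):

SYNOPSYS_URL=http://localhost:8080 deno run --allow-net --allow-env count.ts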
anotherjesse commented 1 month ago

This version shoves additional key/values in for each cid, causing the crash to happen sooner:

import { CID } from "npm:multiformats@13.3.0/cid";
import * as json from 'npm:multiformats@13.3.0/codecs/json'
import { sha256 } from 'npm:multiformats@13.3.0/hashes/sha2'

const SYNOPSYS_URL = Deno.env.get("SYNOPSYS_URL") || "http://localhost:8080";

// As above, a minimal shape for the otherwise-undeclared `Fact` type:
type Fact = [Record<string, string>, string, unknown];

export async function cid(data: any) {
    const bytes = json.encode(data);
    const hash = await sha256.digest(bytes);
    const cid = CID.create(1, json.code, hash);
    return cid.toString();
}

export async function import_fake(start: number, batch_size: number, base: any) {
    let facts: Fact[] = [];
    for (let i = start; i < start + batch_size; i++) {
        let id = { "/": await cid({ count: i }) };
        facts.push([id, "count", i]);
        Object.keys(base).forEach(k => {
            facts.push([id, k, base[k]]);
        });
    }
    // console.log(JSON.stringify(facts));

    return await fetch(`${SYNOPSYS_URL}/assert`, {
        method: 'PATCH', body: JSON.stringify(facts),
    }).then(r => r.json())
}

function generateRandomString(length: number): string {
    const characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    let result = '';
    for (let i = 0; i < length; i++) {
        result += characters.charAt(Math.floor(Math.random() * characters.length));
    }
    return result;
}

if (import.meta.main) {
    const batch_size = 100;
    let count = 1;
    while (true) {
        console.log(`Importing ${count}...`);
        let base = {
            "extra": generateRandomString(1000),
            "extra2": generateRandomString(1000),
            "extra3": generateRandomString(1000),
            "extra4": generateRandomString(1000),
            "extra5": generateRandomString(1000),
            "extra6": generateRandomString(1000),
            "extra7": generateRandomString(1000),
            "extra8": generateRandomString(1000),
            "extra9": generateRandomString(1000),
            "extra10": generateRandomString(1000),
        };
        let result = await import_fake(count, batch_size, base);
        if (!result.ok) {
            console.log(`Error importing ${count}: ${JSON.stringify(result.error)}`);
            break;
        }
        count += batch_size;
    }
}
Gozala commented 1 month ago

Thanks @anotherjesse for the script illustrating the issue; I have turned it into a test case in the PR. Running it locally, I get 4131 import rounds and then a crash. Looking at the data.mdb, it appears slightly larger than 10MB. That works out to roughly 2.53KB per import, which is not great, but we also did not attempt to optimize any of this, so perhaps that is not too surprising.
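
(Checking that figure: 10MB is 10,485,760 bytes; divided by 4131 imports, that is ~2,538 bytes, i.e. the ~2.53KB quoted. With batch_size = 1 in the test script, that is about 2.5KB stored per asserted fact.)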

Gozala commented 1 month ago

Did a little test where I wrote a JSON file with all the asserts that went into the DB, and it came out around 16× smaller. Some napkin math to assess whether the current overhead is within reason (without any optimizations):

The Okra tree representation has overhead of its own, and probably so does LMDB. In this scenario we generate completely unique data, so it is a pretty pathological case. All in all, 16× overhead is perhaps within expectations until we take the time to optimize things.
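
(Napkin math from the numbers above: ~10MB in LMDB versus roughly 10MB ÷ 16 ≈ 650KB of raw JSON asserts, i.e. on the order of 2.5KB stored for every ~160 bytes of source data.)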

anotherjesse commented 1 month ago

Am I holding this wrong?

I have a checkout of main at the latest commit, with no local changes:

jesse@fourteen synopsys % git pull
Already up to date.
jesse@fourteen synopsys % git status
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean

I deleted all my node_modules and re-installed (I would have used npm ci if a lock file had been committed):

jesse@fourteen synopsys % rm -rf node_modules           
jesse@fourteen synopsys % npm i              
npm WARN deprecated inflight@1.0.6: This module is not supported, and leaks memory. Do not use it. Check out lru-cache if you want a good and tested way to coalesce async requests by a key value, which is much more comprehensive and powerful.
npm WARN deprecated rimraf@3.0.2: Rimraf versions prior to v4 are no longer supported
npm WARN deprecated glob@7.2.3: Glob versions prior to v9 are no longer supported

added 257 packages, and audited 258 packages in 2s

89 packages are looking for funding
  run `npm fund` for details

found 0 vulnerabilities

I started up synopsys with an empty store:

jesse@fourteen synopsys % rm -rf service-store          
jesse@fourteen synopsys % npm run start      

> synopsys@1.4.1 start
> node src/main.js

And then I ran the count script above, and it fails after ~500 items:

Error importing 493: {"message":"Task was aborted\nError: MDB_MAP_FULL"}

(I see a pnpm lock file - I could try using it? Although at this point we have npm, Deno, and now pnpm; do we need all three?)

anotherjesse commented 1 month ago

Trying with pnpm:

jesse@fourteen synopsys % rm -rf node_modules
jesse@fourteen synopsys % pnpm ci
ERR_PNPM_CI_NOT_IMPLEMENTED  The ci command is not implemented yet
jesse@fourteen synopsys % pnpm i 
Lockfile is up to date, resolution step is skipped
Packages: +246

Progress: resolved 246, reused 246, downloaded 0, added 246, done

dependencies:
+ @canvas-js/okra 0.4.5
+ @canvas-js/okra-lmdb 0.2.0
+ @canvas-js/okra-memory 0.4.5
+ @ipld/dag-cbor 9.2.1
+ @ipld/dag-json 10.2.2
+ @noble/hashes 1.3.3
+ @types/node 22.5.5
+ datalogia 0.8.0
+ multiformats 13.3.0

devDependencies:
+ @web-std/fetch 4.2.1
+ @web-std/stream 1.0.3
+ c8 8.0.1
+ entail 2.1.2
+ playwright-test 14.0.0
+ prettier 3.1.0
+ typescript 5.3.3

Done in 2.1s
jesse@fourteen synopsys % pnpm run start     

> synopsys@1.4.1 start /Users/jesse/ct/synopsys
> node src/main.js

Unfortunately, it still errors out after ~500 with the MDB_MAP_FULL issue:

Importing 557...
Importing 558...
Error importing 558: {"message":"Task was aborted\nError: MDB_MAP_FULL"}
Gozala commented 1 month ago

@anotherjesse one thing that occurred to me: the LMDB map size may be fixed at DB creation time. So if you had a small DB at /Users/jesse/ct/synopsys, restarting synopsys with a new size may not have an effect.
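
For illustration, LMDB bindings typically take the map size when the environment is opened, so growing it requires reopening with a larger value. A minimal sketch using the lmdb-js package (chosen just for illustration; synopsys itself uses @canvas-js/okra-lmdb, whose options may differ):

import { open } from "lmdb";

// The map size is set when the environment is opened; to grow it,
// reopen with a larger value. LMDB's data file is sparse, so a
// generous ceiling costs little disk space up front.
const db = open({
    path: "./service-store",   // directory containing data.mdb
    mapSize: 2 * 1024 ** 3,    // 2 GiB address-space ceiling
});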

anotherjesse commented 1 month ago

see #31

Gozala commented 4 weeks ago

This was fixed.