CertainLach / jrsonnet

Rust implementation of Jsonnet language
MIT License

Multiple evaluations of same source with different contexts maxes out RAM #171

Open Nerglej opened 3 months ago

Nerglej commented 3 months ago

Hello!

We're processing a large amount of data in jsonlines format, which means looping through 100,000 lines or more. At first we had a libsonnet file that was imported for every line, but we ended up manually inlining the libsonnet file, merging the library and the code into one big file. To avoid parsing the code for every line, I reimplemented the State object's parse_snippet logic and moved the parsing outside the loop.

The problem I'm having is that my RAM usage maxes out almost immediately. I suspect there's a memory leak somewhere in the library: I tried removing all references to jrsonnet, adding std::thread::sleep to simulate the processing time that jrsonnet took, and writing arbitrary data to the output instead. That version took the same amount of time but didn't use anywhere near all of my RAM, so my conclusion is that it's jrsonnet.

I'm currently on 0.5.0-pre95 from Cargo, and I'm fully aware that it isn't a full release yet; this report is merely to help make it even greater!

Here is the whole Rust file (it's part of a bigger library, so it won't necessarily work on its own; let me know if a minimal script really is necessary):

use std::io::{self, BufRead, BufWriter, Write};

use jrsonnet_evaluator::{
    error::ErrorKind::ImportSyntaxError, evaluate, manifest, parser::LocExpr, trace::PathResolver,
    State,
};
use jrsonnet_parser::{IStr, ParserSettings, Source};
use jrsonnet_stdlib::ContextInitializer;
use log::{info, trace};
use serde::Deserialize;

use crate::{InputReader, Writer};

#[derive(Deserialize)]
pub struct JsonnetExporter {
    pub file: JsonnetLibrary,
    pub libraries: Vec<JsonnetLibrary>,
}

impl JsonnetExporter {
    pub fn new(file: JsonnetLibrary, library_contents: Vec<JsonnetLibrary>) -> Self {
        JsonnetExporter {
            file,
            libraries: library_contents,
        }
    }
}

#[derive(Deserialize)]
pub struct JsonnetTemplate {
    pub file: String,
    pub library_paths: Option<Vec<String>>,
}

/// A jsonnet library as code, not a filepath
#[derive(Deserialize)]
pub struct JsonnetLibrary(pub String);

pub fn export(
    input: &mut InputReader,
    output: &mut Writer,
    jsonnet: &JsonnetExporter,
) -> io::Result<()> {
    let mut out = BufWriter::new(output);

    // Merge libraries and file
    let code: Vec<&str> = jsonnet.libraries.iter().map(|v| v.0.as_str()).collect();
    let library_code = code.join("\n");

    let code = &jsonnet.file.0;

    trace!("Merging libraries and files");

    let merge = format!("{}{}", library_code, code);

    trace!("Parsing merged files");

    // Read merged code and parse it, to avoid parsing for every item
    let source = get_source("<jsonnet_exporter>", &merge);
    let parsed = parse_snippet(&source, &merge).unwrap();

    info!("Started exporting via jsonnet");

    let mut buf = String::new();
    let state = State::default();

    while input.read_line(&mut buf).unwrap() > 0 {
        let ctx = ContextInitializer::new(state.clone(), PathResolver::Absolute);
        ctx.add_ext_code("item", &buf).unwrap();

        state.set_context_initializer(ctx);

        let res = evaluate(state.create_default_context(source.clone()), &parsed).unwrap();
        let output = res.manifest(manifest::StringFormat).unwrap();

        out.write_all(output.as_bytes())?;

        // Cleanup
        out.flush().unwrap();
        buf.clear();
    }

    info!("Finished exporting file");

    Ok(())
}

fn get_source(name: impl Into<IStr>, code: impl Into<IStr>) -> Source {
    Source::new_virtual(name.into(), code.into())
}

fn parse_snippet(
    source: &Source,
    code: impl Into<IStr>,
) -> jrsonnet_evaluator::error::Result<LocExpr> {
    let code = code.into();
    let parsed: LocExpr = jrsonnet_parser::parse(
        &code,
        &ParserSettings {
            source: source.clone(),
        },
    )
    .map_err(|e| ImportSyntaxError {
        path: source.clone(),
        error: Box::new(e),
    })?;

    Ok(parsed)
}
CertainLach commented 3 months ago

For long-running applications you need to collect garbage periodically; it is not fully automatic. (There was a branch with automated GC based on Linux's memory-pressure information, but it is not stable and only works on Linux.) Call jrsonnet_gcmodule::collect_thread_cycles()

Note that collection happens per-thread: collecting garbage in one thread will not affect other threads. You can check how many objects are allocated with jrsonnet_gcmodule::count_thread_tracked()
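To make this concrete, here is a minimal sketch of the "collect every N iterations" cadence applied to a read-evaluate loop like the one above. The interval `COLLECT_EVERY` is an assumed value to tune per workload, and `collect_thread_cycles` is stubbed out locally so the sketch is self-contained; in a real program you would call `jrsonnet_gcmodule::collect_thread_cycles()` at that point instead.

```rust
/// How often to trigger a cycle collection. This is an assumed value;
/// tune it against your own memory/throughput trade-off.
const COLLECT_EVERY: usize = 1_000;

/// Local stand-in for `jrsonnet_gcmodule::collect_thread_cycles()`,
/// which collects reference cycles tracked on the current thread.
fn collect_thread_cycles() -> usize {
    0 // the real function returns the number of objects collected
}

/// Processes `lines` one by one, collecting garbage every
/// `COLLECT_EVERY` iterations. Returns how many collections ran.
fn process_lines(lines: &[&str]) -> usize {
    let mut collections = 0;
    for (i, _line) in lines.iter().enumerate() {
        // ... evaluate the pre-parsed snippet against `_line` here ...

        if (i + 1) % COLLECT_EVERY == 0 {
            collect_thread_cycles();
            collections += 1;
        }
    }
    collections
}
```

Because collection is per-thread, the call must happen on the same thread that runs the evaluations.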

Nerglej commented 3 months ago

Okay, thank you! I ended up collecting garbage every x iterations, and it helped a lot with memory usage. Thanks! This should probably be documented somewhere 😊