mithro opened 5 years ago
I did an experiment with zstr streams:
#include <cstring>
#include <fstream>
#include <stdexcept>
#include <string>

#include <pugixml.hpp>
#include <zstr.hpp>

void get_root_elements(const char *filename) {
    pugi::xml_document doc;
    pugi::xml_parse_result result;
    std::string x(filename);
    // Pick the loader based on the extension: a .gz file goes through a
    // zstr decompressing stream, anything else is loaded directly.
    if (x.rfind('.') != std::string::npos && x.substr(x.rfind('.') + 1) == "gz") {
        std::ifstream F;
        F.open(x, std::ios::binary);  // gzip data must be read in binary mode
        zstr::istream Z(F);
        result = doc.load(Z);
    } else {
        result = doc.load_file(filename);
    }
    if (!result)
        throw std::runtime_error("Could not load XML file " + std::string(filename) + ".");
    for (pugi::xml_node node = doc.first_child(); node; node = node.next_sibling()) {
        if (std::strcmp(node.name(), "rr_graph") == 0) {
            count_rr_graph(node);
            alloc_arenas();
            load_rr_graph(node, &rr_graph);
        } else {
            throw std::runtime_error("Invalid root-level element " + std::string(node.name()));
        }
    }
}
Artix-7 rr_graph run with the uncompressed file (922 MB), without errno checking after the strtol calls; times in seconds:
7.645 8.097 7.600 7.636 7.677
With the gzip-compressed file:
11.34 11.15 11.10 11.29 11.13
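Averaging the five runs in each configuration puts the decompression overhead at roughly 45%; a quick check of the arithmetic:

```python
# Mean wall-clock times from the runs above (seconds).
uncompressed = [7.645, 8.097, 7.600, 7.636, 7.677]
gzipped = [11.34, 11.15, 11.10, 11.29, 11.13]

mean_plain = sum(uncompressed) / len(uncompressed)
mean_gz = sum(gzipped) / len(gzipped)

print(f"uncompressed mean: {mean_plain:.2f} s")           # 7.73 s
print(f"gzip mean:         {mean_gz:.2f} s")              # 11.20 s
print(f"ratio:             {mean_gz / mean_plain:.0%}")   # 145%, i.e. ~45% slower
```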
Is that with or without a hot disk cache? Can you try flushing that?
It's with a hot disk cache. Without the file in the cache, the reading time can jump to 11 seconds or so.
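For repeatable cold-cache numbers, the Linux page cache can be dropped between runs. A minimal sketch, assuming a Linux host with root access (drop_caches is the standard kernel knob for this):

```shell
# Flush dirty pages to disk first, so drop_caches frees everything cleanly.
sync
# Drop the page cache, dentries, and inodes (3 = all of them); needs root.
echo 3 | sudo tee /proc/sys/vm/drop_caches
```

After this, the next read of the rr_graph file has to come from disk rather than memory.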
@duck2 - How does the time between with gzip and without gzip compare without file in the disk cache?
Once SAX parsing support is complete (#3), a compressed one-pass SAX parser may be a good compromise between CPU, disk, and memory usage. It is unclear whether a two-pass SAX approach plus compression would also have good numbers.
See https://github.com/mithro/duck2-gsoc/issues/16