mithro opened 5 years ago
I did an experiment with zstr streams:
#include <cstring>
#include <fstream>
#include <stdexcept>
#include <string>

#include <pugixml.hpp>
#include <zstr.hpp>

void get_root_elements(const char *filename) {
    pugi::xml_document doc;
    pugi::xml_parse_result result;
    std::string x(filename);
    // Pick the loader based on the extension: a .gz file goes through a
    // zstr decompressing stream, anything else is loaded directly.
    if (x.rfind('.') != std::string::npos && x.substr(x.rfind('.') + 1) == "gz") {
        std::ifstream F;
        F.open(x, std::ios::binary);  // gzip data must be read in binary mode
        zstr::istream Z(F);
        result = doc.load(Z);
    } else {
        result = doc.load_file(filename);
    }
    if (!result)
        throw std::runtime_error("Could not load XML file " + std::string(filename) + ".");
    for (pugi::xml_node node = doc.first_child(); node; node = node.next_sibling()) {
        if (std::strcmp(node.name(), "rr_graph") == 0) {
            count_rr_graph(node);
            alloc_arenas();
            load_rr_graph(node, &rr_graph);
        } else {
            throw std::runtime_error("Invalid root-level element " + std::string(node.name()));
        }
    }
}
Artix-7 rr_graph run with the uncompressed file (922 MB), without errno checking after the strtol calls; times in seconds:
7.645 8.097 7.600 7.636 7.677
With the gzip-compressed file:
11.34 11.15 11.10 11.29 11.13
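Averaging the five runs in each configuration puts the decompression overhead at roughly 45%; a quick check of the arithmetic:

```python
# Mean wall-clock times from the runs above (seconds).
uncompressed = [7.645, 8.097, 7.600, 7.636, 7.677]
gzipped = [11.34, 11.15, 11.10, 11.29, 11.13]

mean_plain = sum(uncompressed) / len(uncompressed)
mean_gz = sum(gzipped) / len(gzipped)

print(f"uncompressed mean: {mean_plain:.2f} s")           # 7.73 s
print(f"gzip mean:         {mean_gz:.2f} s")              # 11.20 s
print(f"ratio:             {mean_gz / mean_plain:.0%}")   # 145%, i.e. ~45% slower
```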
Is that with or without a hot disk cache? Can you try flushing that?
It's with a hot disk cache. Without the file in the cache, the reading time can jump to 11 seconds or so.
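For repeatable cold-cache numbers, the Linux page cache can be dropped between runs. A minimal sketch, assuming a Linux host with root access (drop_caches is the standard kernel knob for this):

```shell
# Flush dirty pages to disk first, so drop_caches frees everything cleanly.
sync
# Drop the page cache, dentries, and inodes (3 = all of them); needs root.
echo 3 | sudo tee /proc/sys/vm/drop_caches
```

After this, the next read of the rr_graph file has to come from disk rather than memory.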
@duck2 - How does the time between with gzip and without gzip compare without file in the disk cache?
Once SAX parsing support is complete (#3), a compressed one-pass SAX parser may be a good compromise between CPU, disk, and memory usage. It is unclear whether a two-pass SAX approach plus compression would also have good numbers.
See https://github.com/mithro/duck2-gsoc/issues/16