[Improvement request] Large gamelist.xml causes slow in-system navigation

Fordi commented 9 years ago

When gamelist.xml is reasonably large (representing, say, 500 files), navigating the (I think) DetailedGameListView becomes very much bogged down.

I suspect that what's going on is that the code may be parsing gamelist.xml on each movement - which isn't really ideal.

I'm no C++ programmer, but I know I've had a similar problem messing with very large XML files in Java. To get around it, I used a parser that gives back character positions (vtd-xml). After the initial parse, I keep an index of elements I care about - character offset and length of element derived from the offset of the closing tag and/or next Node - and can easily seek/copy/parse just the elements I care about. If, when querying data, the file is updated, I drop and re-parse the index.

The index can have a very low memory footprint for this kind of application, too. For something like a gamelist.xml, it'd just be a pair of longs for each game, and a metadata object for the system (as well as whatever namespace information is relevant). No need to keep the game data in memory; just parse and swap as-needed.

Anyway, if it's available in C++, it would be a big speed gain for low-powered systems.

Fordi commented 9 years ago

Hey, neat. It looks like pugi::xml_node has a debug_offset method that will get you what you'd need. Using a debugging method feels icky and all, but hey, hacks for speed, yeah?

robertybob commented 9 years ago

@Fordi ELI5 for a novice please? :+1: This speeds up the exiting of Emulationstation and is safe to use if your gamelists are final and you're happy there's no errors in them?

Fordi commented 9 years ago

It would speed up navigation by reducing the parser overhead to the relevant chunk of xylophone for the entity you'really after. It may not speed up startup or exit (you'really still parsing the gamelists, after all).

It should be safe to do this in all cases, so long as you're throwing out / rewriting the index for an xml file whenever the relevant filesize modification time changes (including when you modify the file - though writing an update would be more like (split the file to into temp "before chunk" and "after chunk", write it back with the new chunk in place, reindex). You might be able to get away with updating the indices mathematically post-write, rather than going full reparse, but it depends on how complex your index is.

Fordi commented 9 years ago

Essentially: after you parse the dom, you know you care about certain elements. For a gamelist, the primary thing you care about is the game tag, so instead of dealing with them right there, you store some kind of index to them: something like anew array of (pointer to the filename, a start offset, and an end offsetobtained by node.nextSibling.offset - 1 (or a length obtained similarly)).

Then, when you need the model data, you can get it by grabbing just the part of the file you want, parsing that, and populating the model from it.

What this gets you is not having to either parse the whole file each time you want it, or keeping the whole DOM in memory. If you're really clever, you can work out a dead simple preparser - one that doesn't create a dom, but knows how to find the tags you care about.

As for errors: it's xml. If it's not well-formed and to the schema it's meant to be, it should fail in some graceful fashion. In our case, a minimal pre-parse check would be that the substring starts and ends with a valid tag with the tag name we expect.

Aloshi / EmulationStation

[Improvement request] Large gamelist.xml causes slow in-system navigation #480