Closed tsteven4 closed 11 months ago
This is true and there's always some case where running out of memory is just a thing. I get that GPX files can have mountains ot text in them, but 105,000 waypoints SHOULD BE a walk in the park with a 2GB address space. Struct waypt is 184 in my builds, so I'm sure its dwarfed by the strings it holds. wpt->gcdata is 104 which isn't lightweight, but it's kind of our fundamental structure.
"Big" is only 144MB, right? Why are we running out of address space? Even if we're sloppy and keep 8 versions of it in memory, that's only a gig. The alloc of new waypt comes from the heap in 192 byte pools, so it's easy to spot the "hearbeat" as we loop through the input.
I spent some time poking at this in Instruments. We have some weird things going on. This reader has been with us through expat and a couple of C++ versions, so there are some forced fits at times.
Do you agree that gpx_start isn't doing what the comment says it's doing int he first line? Can't that be cdatastr.clear() and just let that structure grow over time?
The whole "something else" chain on a pocket query just splats on the wall. I tried to instrument what really gets saved by end_something_else and came up with 50,000 unique copies of the string "geocache", for example. We have 27,000 copies of "1.5" which is either a difficulty or a terrain, but we already have this in geocache_data.diff and .terr. Similarly, type and container are there inside a lovely packed bitfield, but we're storing them as unique strings inside this chain_xml nightmare.
Here's the top tokens I could find:
2146 "\n" 2712 "3" 3372 "Write note" 3816 "2.5" 4839 "Regular" 4978 "Small" 6353 "1" 8565 "Didn't find it" 9759 "2" 13534 "Micro" 23587 "Geocache|Traditional Cache" 23587 "Traditional Cache" 26399 "Tennessee" 26399 "United States" 27025 "1.5" 50826 "Geocache" 79203 "\n " 115281 "Found it" 272816 "\n " 412847 "\n " 448783 "\n " 533239 "\n "
Storing unique literal copies of whitespace seems silly.
We should already have the first 70% of groundspeak:cache and groundspeak:logs in struct Waypoint()
gpx_cdata() needed to apped back when expat could chunk things up on us, but doesn't it always deliver a whole 's' for us now?
Most of the tags hitting gpx_cdata() and the resulting sequently scans and allocations/copies should be hitting the passthrough path, caught by the switches for tt_cache_difficulty and friends and kept out of the gpx_cdata/do_something_else paths, right?
So I think we have a few different problems:
1) We like to pretend that memory is unbounded, but that's not reality. So, yes, we should do a high-level catch block on those. But we should also remember that we should normally have a couple GB of VM available and a "big" list 105,000 points isn't really big.
2) The GPX reader is being dumb, making copies of things it shouldn't be.
3) Our GPX writer should be using custom writers to write out cache:diff ... wpt->diff / 10 and so many of those other fields.
4) It would be a good starter project for someone to make gs_mktype/gs_mkcont and their opposites into real C++ type (maybe a QHashMap? I don't recall)
On Thu, Dec 31, 2020 at 9:28 AM tsteven4 notifications@github.com wrote:
As was mentioned in #659 https://github.com/gpsbabel/gpsbabel/issues/659, unexpected exceptions can cause program termination without any error message.
It is possible to fail memory allocation using the 32 bit windows executable with a large number of waypoints. An example is "-i gpx -f big.gpx -o gpx -F xxx.gpx", where big.gpx contains 105,596 waypoints or thereabouts. In this case std::bad_alloc is thrown and uncaught. The program dies without an error message. It would be easy to catch this exception and at least try to print a message.
This issue is mitigated by the 64 bit windows executable as it is more difficult to cause memory allocation to fail.
There are other uncaught exceptions as well, https://en.cppreference.com/w/cpp/error/exception
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/gpsbabel/gpsbabel/issues/661, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC3VADZM76LWCBN3BIGVHSDSXSKBDANCNFSM4VPQHKDA .
Do you agree that gpx_start isn't doing what the comment says it's doing int he first line? Can't that be cdatastr.clear() and just let that structure grow over time?
Yes, the comment is incorrect. However either using the clear() method or assigning a default constructed QString seem to be equivalent (if the QString isn't null). For example, in Qt 5.12.10 clear is defined as the assignment of a default constructed QString to the object if the object isn't null. https://github.com/qt/qtbase/blob/c919fc5f65c1c509370df7e616263ed007931a7a/src/corelib/tools/qstring.h#L955-L956 This comes up often in deinit functions and we aren't consistent in selecting one way or the other. In the null case I suppose using the clear method is superior as it avoids constructed a new QString, and in the non-null case they both construct a new QString.
I tried heaptrack. It did cause a segmentation fault (free(): invalid pointer Aborted (core dumped)) so I don't know how trustworthy the data is. It looks like the big pig is QXmlStreamReader::readNext, followed by GpxFormat::gpx_cdata, followed by GpxFormat::gpx_start. This was with your original big.gpx which had 52,798 waypoints.
I think we could save most of GpxFormat::start_something_else. There are a finite number of element names, and a very limited number of namespaceUris and prefixes. But even if add complexity to take advantage of that it is only about 1/8 of the memory usage.
I think these gpx files with large amounts of groundspeak data are of limited interest.
The OP was complaining about hitting the 2G memory limit converting QStarz bin files to kml. Unfortunately the OP went quiet and hasn't responded to requests for more information.
As was mentioned in #659, unexpected exceptions can cause program termination without any error message.
This issue is mitigated by the 64 bit windows executable as it is more difficult to cause memory allocation to fail.
There are other uncaught exceptions as well, https://en.cppreference.com/w/cpp/error/exception