davea42 / libdwarf-code

Contains source for libdwarf, a library for reading DWARF2 and later DWARF. Contains source to create dwarfdump, a program which prints DWARF2 and later DWARF in readable format. Has a very limited DWARF writer set of functions in libdwarfp (producer library). Builds using GNU configure, meson, or cmake.
Other
175 stars 69 forks source link

strdup is not checked for memory allocation failure #134

Closed hnarkaytis closed 2 years ago

hnarkaytis commented 2 years ago

I reviewed a function mentioned in Valgrind discussion and would like to provide some feedback.

  1. strdup is not checked for a failure. I would expect that strcmp with NULL will core dump everything. I.e. tests with custom memory allocator should produce a core dump in that case. Please double check why it doesn't happened.
  2. It doesn't make sense to make a copy of a string for tfind. You should check if string is in the tree already and make a copy only in case if it was not found. That will save unnecessary allocation. I would expect that it is a historical artifact and originally there were no tfind check and tsearch was used. It is also a good way to save CPU cycles. In this case code should look like (NB! not even checked with compiler):

    retval = dwarf_tsearch(str, &makename_data, value_compare_func);
    if (retval) {
        re = *(VALTYPE *)retval;
        if (re == str)
          {
          /* Our string was just added and we need to replace it on a copy. */
             re = strdup (str);
             if (NULL == re)
               dwarf_tdelete (str, &makename_data, value_compare_func);
            else
               *(VALTYPE *)retval = re;
          }
    
        return re;
    }
  3. makename_data is not implicitly initialized. It will be NULL as a static variable, but I'm not sure that standard says anything about that.
  4. makename_destructor could be marked as __attribut__ ((destructor)), so you don't need to call it explicitly from library teardown function. Will work only with gcc and clang. VC users will need to upgrade Windows to latest Ubuntu.
  5. TRUE/FALSE macros are never used.
davea42 commented 2 years ago

No, I am not going to use gcc/clang attribut ((destructor)). Destructors will be explicit. C90!

hnarkaytis commented 2 years ago

Fair enough. Only hardcore!

davea42 commented 2 years ago

The lack of a check of strdup for failure was a bug. Fixed.

The strdup in dd_makename.c for tfind and tdelete was unnecessary, Fixing that now. Along with print_macro.c missing strdup check. Should be the last changes for 0.4.2. (releasing Sep 15)

makename_data is guaranteed to be initialized to zeros as it is file-global, not function-local. by ISO C 1989/1990 section 6.5.7 (the wording says pointers set to null pointer constant, not zero, but for all machines in actual use today that means all-zero bits).

hnarkaytis commented 2 years ago

One more minor comment on that. Description of the function reads as: "This function can return NULL if memory is exhausted.", but in the first couple "unsuccessful" branches static empty string is returned.

Also I don't like error reporting strait to console. I've seen that you changed stderr, to stdout. As an external consumer of the library I would like to control everything that goes to stdout. As a true unix way engineer I'd like to support pipelining through stdout to some external tool. External tool might be surprised with fancy verbose error from libdwarf. Probably this is a subject for another RFE. I have pretty opinionated view on how to design error reporting subsystem, but I would expect David will come up with his ideas first.

hnarkaytis commented 2 years ago

Another topic is an implementation of tsearch. I made a comment here.

Balanced trees will work good enough for this use case, but will contribute to memory fragmentation. Open hashing will be more performant and less impactful for memory fragmentation.

No action items on this. Just a general consideration.

davea42 commented 2 years ago

The functions that did use stderr and now use stdout are in dwarfdump.

Not in libdwarf. libdwarf never prints an error and never prints anything. The one case of printf from libdwarf is where the caller asks for a printout of the inner details of line data. And the data comes back as strings via a callback from libdwarf. I imagine only dwarfdump would ever request and use this callback.

dwarfdump prints a lot and the last line is a count of errors. I liked the idea of all errors printing (as far as possible) in context, rather than separated in a separate category (stderr).

I used hash tree in libdwarf because it performs better. For dwarfdump I used balanced/avl tree as for small trees it takes less space than the hash tree (as each is implemented here, I mean). And I kind of liked the idea of using a different tree algorithm in dwarfdump.

There is always room for improvement.

davea42 commented 2 years ago

A head record in a balanced tree (distinct from the tree itself, different from the current struct, means shortening the struct used all over, and at that point using low bits of llink (for example) as the balance flag, would reduce the major structure (now) from 48 to 24 bytes. Not trying that now. Would have to wait till after 0.4.2 releases (this week). Now in a quiet period running every test of any possible value.

davea42 commented 2 years ago

I meant 40 to 24.

hnarkaytis commented 2 years ago

I would not use user data pointer for storage of balance bits. This should be treated as an opaque 64/32 bit value (depending on architecture). I.e. there is no guaranty that user will have properly aligned pointer here. For example user could store integer values and tree should work as a set of unique values.

davea42 commented 2 years ago

I did not mean use the user data pointer for the bit. Sorry I was unclear.

Some of my tests indeed store integer values instead of pointers to structures.

Because all strdup are now checked for failure this can be closed. Note that in some cases we just let the NULL from a failure be recorded in a structure. Later code is not surprised by a NULL.