BjarneStroustrup / profiles

site for discussing profiles design
Creative Commons Attribution 4.0 International

Simple idea (and question) #10

Open Shadoware opened 1 month ago

Shadoware commented 1 month ago

I was thinking about memory safety recently. I'm still learning C++ and I'm not an advanced user, but based on the lectures I've watched I had this simple idea, though I'm not sure how useful it could be. Programming languages usually provide memory safety by not allowing memory errors to occur (with a GC or a borrow checker, for example), but C++ currently works with a mix of strategies, like smart pointers, static analyzers, etc. It seems to me that preventing memory errors is not really necessary if the language could just detect all memory errors and related problems and notify the user. Completely preventing memory errors has a cost, either in making programming harder or in runtime performance.

One strategy that could help is to have two versions of memory-unsafe libraries and features. One version would be just for debugging the code, where more verifications related to memory safety could be performed, and the other version would be the production one, to be used after all related memory issues were fixed. This way we could put much more memory-safety checking code inside libraries, like bounds checking in all index operations, std::shared_ptr could perform more checks like detecting cyclic references, etc. In debug mode even the C++ runtime could perform some checks. That would impact program performance, but only in debug mode. After all issues were fixed, it would be just a matter of recompiling the code with a flag like -DNDEBUG and we would have a correct and performant version running. Adding just bounds checking to everything would not be a complicated process, because the duplicated library would not be very different from the current ones in the standard and thus not hard to maintain.

If the code could detect all memory errors, it would be just a matter of fuzzing the program inputs and all errors could be found. I wrote this simple Array example in two versions just to illustrate the idea. Do you think this approach could be useful?

#include <cstdlib>         // exit
#include <iostream>
#include <source_location> // C++20: captures the file and line where a function is called.
                           // The extra defaulted parameter on operator[] below requires C++23.

#define DEBUG

#ifdef DEBUG
    // **Debug version - bounds checking**
    template<typename T, size_t size = 0>
    class Array 
    {
    public:
        constexpr auto operator[] (int index, std::source_location const& location = std::source_location::current()) -> T&
        {
            if(0 <= index && index < static_cast<int>(m_size)) {
                return arr[index];
            }
            else {
                std::cerr << "Error: Array index " << index << " out of bounds at line " << location.line() << '.' << std::endl;
                std::exit(1);
            }
        }
    private:
        T arr[size];
        size_t m_size = size;
    };

#else 
    // **production version - no bounds checking for performance.**
    template<typename T, size_t size = 0>
    class Array 
    {
    public:
        constexpr auto operator[] (size_t index) -> T&
        {
            return arr[index];
        }
    private:
        T arr[size];
        size_t m_size = size;
    };

#endif

auto main() -> int
{
    // There would be no change in syntax
    Array<int, 3> myArray;
    myArray[0] = 11;
    myArray[1] = 12;
    myArray[2] = 13;
    myArray[3] = 14; // error: overflow

    std::cout << "myArray[0]: " << myArray[0] << std::endl;
    std::cout << "myArray[1]: " << myArray[1] << std::endl;
    std::cout << "myArray[2]: " << myArray[2] << std::endl;

    return 0;
}

mikucionisaau commented 1 month ago

C assert macro does what you have in your code:
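A minimal sketch of what the assert approach could look like. This is not the commenter's original snippet; the `CheckedArray` name and its layout are made up for illustration. Note one difference from the example above: `assert` reports the file and line of the check itself, not of the caller.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size array, for illustration only.
template <typename T, std::size_t N>
class CheckedArray {
public:
    constexpr T& operator[](std::size_t index) {
        // Active in debug builds; the check disappears when compiled with -DNDEBUG.
        assert(index < N && "CheckedArray index out of bounds");
        return data_[index];
    }

    constexpr std::size_t size() const { return N; }

private:
    T data_[N]{};
};
```

With this sketch, `CheckedArray<int, 3> a; a[3] = 14;` aborts in a debug build, while a build with `-DNDEBUG` performs no check at all.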

Note that STL containers provide two ways for subscripting (independent from C assertions):
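The two ways are presumably `operator[]` (unchecked) and `at()` (checked, throws `std::out_of_range`). A small sketch of the difference, using two hypothetical helper functions that are not part of the original comment:

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns the element, or nothing if the index is invalid.
std::optional<int> try_get(const std::vector<int>& v, std::size_t index) {
    try {
        return v.at(index);  // checked access: throws std::out_of_range
    } catch (const std::out_of_range&) {
        return std::nullopt;
    }
}

// Hypothetical helper: unchecked access. The caller must guarantee index < v.size().
int get_unchecked(const std::vector<int>& v, std::size_t index) {
    return v[index];  // undefined behavior if index is out of range
}
```

Both are always available with the same library, so the choice between checked and unchecked access is made per call site rather than per build.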

Also C++ provides compile time assertions via static_assert.
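As an illustration of the compile-time option, here is a sketch in which the index is a template argument, so an out-of-range access is rejected by the compiler. The `checked_get` helper is hypothetical and only shows the idea; `std::get<I>` on a `std::array` enforces the same rule.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: the index I is known at compile time, so the bounds
// check costs nothing at run time and cannot be forgotten.
template <std::size_t I, typename T, std::size_t N>
constexpr T& checked_get(std::array<T, N>& a) {
    static_assert(I < N, "index out of bounds");
    return a[I];
}
```

For a `std::array<int, 3> a`, `checked_get<2>(a)` compiles, while `checked_get<3>(a)` fails to compile with the message above. This only helps when the index is a constant, which is why the run-time checks are still needed.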

So we really have a lot of options, we just need to stick to good ones.

Shadoware commented 1 month ago

OK. I actually know that, mikucionisaau. The main point is to have two versions of every library that can cause memory issues. Also, the goal is not only bounds checking but many more memory checks (ideally all of them), without changing the syntax for the end user or slowing down production code. The checks would also be on by default, because the programmer would compile with the -DNDEBUG flag only for the production version. So, a lot of memory checking by default.

I think what the community needs to decide is:

Thanks for the attention.

mikucionisaau commented 1 month ago

Yes, the answer is yes :-) but what specifically would you like to see?

There are many options for sanitizers, see Undefined Behavior Sanitizer, Address Sanitizer, Leak Sanitizer, Thread Sanitizer: https://github.com/google/sanitizers/wiki

Also Stack Smashing Protector: https://wiki.osdev.org/Stack_Smashing_Protector

Very useful. Most are available as built-in options in GCC and Clang, and some are in MSVC.
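To make the sanitizer route concrete, here is a sketch of how the "debug build vs. production build" split can look with ordinary code and compiler flags. The `sum_first` function and the file name are placeholders for illustration; the `-fsanitize=address,undefined` flags are the GCC/Clang options for Address Sanitizer and Undefined Behavior Sanitizer.

```cpp
#include <cstddef>
#include <vector>

// Sums the first `count` elements. Precondition: count <= values.size().
// If a caller violates the precondition, the read past the end is undefined
// behavior. A sanitizer build is meant to report such reads at run time
// instead of letting them pass silently.
//
// Example build lines (GCC or Clang; example.cpp is a placeholder name):
//   g++ -std=c++20 -g -fsanitize=address,undefined example.cpp -o example
//   g++ -std=c++20 -O2 -DNDEBUG example.cpp -o example   (production, no checks)
int sum_first(const std::vector<int>& values, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}
```

The source code is identical in both builds; only the flags change, which is close to the two-version idea without duplicating any library code.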

I just do not like the idea of duplication: I follow the DRY principle, otherwise maintenance is hell.

Shadoware commented 1 month ago

Recently I watched a CppNow video about Attachable Leak Sanitizer by Bojun Seo that seems cool too, because it doesn't slow the code down like Valgrind does. I'm just suggesting that checks in the language are easier to use than external tools, but everything that helps is welcome. Anything new will have to be maintained anyway, but in practical terms it would not really be necessary to duplicate all the libraries, just remove some lines when the code is compiled in "production mode". Today there is market pressure for memory safety and I don't know if C++ can fully provide that yet, so anything that helps is worth taking into consideration. Stroustrup said that everybody should be more concerned with memory safety and do what they can to improve it. That's why I'm making these suggestions.
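A sketch of what "just remove some lines in production mode" could look like: a single class template whose check is compiled out when `NDEBUG` is defined, so nothing is duplicated. The names `kBoundsChecks` and `SingleSourceArray` are hypothetical, and this version reports the index rather than the caller's line; the `std::source_location` parameter from the first example could be added on top.

```cpp
#include <cstddef>
#include <cstdlib>
#include <iostream>

// One switch for the whole library: checks are on unless NDEBUG is defined.
#ifdef NDEBUG
inline constexpr bool kBoundsChecks = false;
#else
inline constexpr bool kBoundsChecks = true;
#endif

// Hypothetical single-source array: the same code serves debug and production.
template <typename T, std::size_t N>
class SingleSourceArray {
public:
    T& operator[](std::size_t index) {
        if constexpr (kBoundsChecks) {
            // This branch is discarded at compile time in production builds.
            if (index >= N) {
                std::cerr << "Error: index " << index
                          << " out of bounds (size " << N << ")\n";
                std::abort();
            }
        }
        return data_[index];
    }

private:
    T data_[N]{};
};
```

Usage is unchanged between modes: `SingleSourceArray<int, 3> a; a[3] = 14;` aborts with a message in a default build and is unchecked when compiled with `-DNDEBUG`.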