deepmodeling / abacus-develop

An electronic structure package based on either plane wave basis or numerical atomic orbitals.
http://abacus.ustc.edu.cn
GNU Lesser General Public License v3.0

Memory record in LCAO code is not correct #3831

Open dyzheng opened 3 months ago

dyzheng commented 3 months ago

Describe the Code Quality Issue

There are several memory-related issues at the moment:

#3789: there are errors in the memory logging.

#3675: memory cost is too large in the LCAO code, and the logging is insufficient.

#3710: memory logging in the PW code is insufficient as well.

#3657: memory cost is too large in NSPIN=4 calculations in the LCAO code.

Additional Context

There are two ways to record memory usage in ABACUS:

  1. Build with cmake -B build -DDEBUG_INFO=1, then check the remaining memory reported at every ModuleBase::TITLE() call.
  2. Check the MEMORY(MB) block in OUT.{suffix}/running_{calculation}.log.

Ideally, these two methods should report the same peak memory cost for a given calculation, but currently they do not.

We should follow the memory records in TITLE to find and add the missing ModuleBase::Memory::record() calls.
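To make the comparison between the two methods concrete, here is a minimal sketch of the bookkeeping involved. MemoryLog and unrecorded_mb are hypothetical names invented for this illustration; this is not the ModuleBase::Memory API, and the measured peak is assumed to come from an external source such as the DEBUG_INFO output.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a manual memory log.
// This is NOT the ABACUS ModuleBase::Memory class, only an illustration.
class MemoryLog
{
  public:
    // Record `count` elements of `elem_size` bytes under `name`.
    // Only the largest value seen for each name is kept.
    void record(const std::string& name, std::size_t count, std::size_t elem_size)
    {
        assert(elem_size > 0);
        const double mb = static_cast<double>(count) * static_cast<double>(elem_size)
                          / 1024.0 / 1024.0;
        double& slot = entries_[name]; // default-initialized to 0.0 on first use
        if (mb > slot)
        {
            slot = mb;
        }
    }

    // Sum of all recorded entries in MB.
    double total_mb() const
    {
        double sum = 0.0;
        for (const auto& kv : entries_)
        {
            sum += kv.second;
        }
        return sum;
    }

  private:
    std::map<std::string, double> entries_;
};

// Memory (in MB) that was measured externally but never recorded in the log.
// A large value hints at missing record calls.
double unrecorded_mb(const MemoryLog& log, double measured_peak_mb)
{
    const double gap = measured_peak_mb - log.total_mb();
    return gap > 0.0 ? gap : 0.0;
}
```

If the gap returned by unrecorded_mb stays large after a calculation, some allocation hotspots are probably not covered by any record call yet.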

Task list for Issue attackers (only for developers)

jieli-matrix commented 3 months ago

As we know, memory usage falls into several categories:

  1. Stack Memory
    • Used to store local variables, function parameters, and return addresses within functions.
    • Stack memory is automatically allocated and freed by the compiler.
  2. Heap Memory
  3. Global/Static Storage, Constant Storage Area and Code Segment/Text Segment

So we should usually focus on the first and second items, especially the second. Furthermore, I think it is better to analyze them separately (stack memory vs. time, heap memory vs. time) rather than analyzing total memory usage (memory vs. time).
I suggest the following tools for profiling stack and heap memory:

  1. Callgrind: Do we really care about the exact stack memory usage? It simply reflects overly deep recursive calls, so we can use Callgrind and related graph tools to show the function call graph.
  2. Massif: Massif shows which parts of the code allocate the most memory, and its report displays a timeline of memory allocations.

Finally, we should define the problem:

jieli-matrix commented 3 months ago

I suggest overloading the new and delete operators to record memory usage. Reference: https://www.cprogramming.com/tutorial/operator_new.html cc: @dyzheng @Religious-J
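A minimal sketch of what this could look like, assuming the global operators are replaced and each block carries a small size header so that delete knows how many bytes to subtract. The memtrack namespace and its counters are hypothetical names for this illustration, not existing ABACUS code; over-aligned allocations are not tracked here.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counters for this sketch; not part of ABACUS.
namespace memtrack
{
std::atomic<std::size_t> current_bytes{0};
std::atomic<std::size_t> peak_bytes{0};

// Header placed in front of every block. Using the maximum fundamental
// alignment keeps the pointer handed to the caller suitably aligned.
constexpr std::size_t header = alignof(std::max_align_t);
static_assert(header >= sizeof(std::size_t), "header must hold the block size");

inline void add(std::size_t n)
{
    const std::size_t now = current_bytes.fetch_add(n) + n;
    std::size_t old = peak_bytes.load();
    // Raise the peak if another thread has not already stored a larger value.
    while (now > old && !peak_bytes.compare_exchange_weak(old, now))
    {
    }
}
} // namespace memtrack

// Replacement global operator new: allocate header + payload, store the size.
void* operator new(std::size_t size)
{
    void* raw = std::malloc(size + memtrack::header);
    if (raw == nullptr)
    {
        throw std::bad_alloc();
    }
    *static_cast<std::size_t*>(raw) = size;
    memtrack::add(size);
    return static_cast<char*>(raw) + memtrack::header;
}

// Replacement global operator delete: read the size back and subtract it.
void operator delete(void* ptr) noexcept
{
    if (ptr == nullptr)
    {
        return;
    }
    void* raw = static_cast<char*>(ptr) - memtrack::header;
    const std::size_t size = *static_cast<std::size_t*>(raw);
    assert(memtrack::current_bytes.load() >= size);
    memtrack::current_bytes.fetch_sub(size);
    std::free(raw);
}

// Sized delete forwards to the unsized version.
void operator delete(void* ptr, std::size_t) noexcept
{
    operator delete(ptr);
}
```

The default array forms (new[] and delete[]) forward to these replacements, so they are counted as well. Every allocation pays for the extra header and two atomic updates, which is the overhead one would have to weigh before enabling something like this.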

dyzheng commented 3 months ago

I don't think so. First, new and delete are called many times, so recording memory inside them would hurt performance. Second, some containers, such as those in the STL, may not use new or delete; they may use malloc or something else. Third, the purpose of memory records in ABACUS is not to record the exact memory cost but to help developers analyze the memory cost of algorithms; only a few classes or functions are hotspots, and we only need to find and record those.

jieli-matrix commented 3 months ago

On the first point, the overloaded operators could be enabled only in a memory-debugging mode rather than globally. On the second point, that's right, and maybe we should look for solutions. On the third point, we need to discuss what tools we should build for developers to analyze the memory cost of algorithms, and in which cases.
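For the container question, one possible direction (only a sketch, not something ABACUS does today as far as this thread shows) is a counting allocator that is passed to the containers of interest. CountingAllocator and the sketch namespace are hypothetical names, and the plain counters below are not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical names for this illustration; not part of ABACUS.
namespace sketch
{
std::size_t live_bytes = 0; // bytes currently held by tracked containers
std::size_t peak_bytes = 0; // largest value live_bytes has reached

// Minimal allocator that counts the bytes requested by a container.
// The counters are plain integers, so this version is not thread-safe.
template <typename T>
struct CountingAllocator
{
    using value_type = T;

    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept
    {
    }

    T* allocate(std::size_t n)
    {
        const std::size_t bytes = n * sizeof(T);
        // malloc(0) may return nullptr, so always request at least one byte.
        T* p = static_cast<T*>(std::malloc(bytes != 0 ? bytes : 1));
        if (p == nullptr)
        {
            throw std::bad_alloc();
        }
        live_bytes += bytes;
        if (live_bytes > peak_bytes)
        {
            peak_bytes = live_bytes;
        }
        return p;
    }

    void deallocate(T* p, std::size_t n) noexcept
    {
        assert(live_bytes >= n * sizeof(T));
        live_bytes -= n * sizeof(T);
        std::free(p);
    }
};

// All instances share the same global counters, so they compare equal.
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept
{
    return true;
}

template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept
{
    return false;
}
} // namespace sketch
```

A container opts in through its allocator parameter, for example std::vector<double, sketch::CountingAllocator<double>>. This fits the hotspot argument above: only the few large containers need the tracked allocator, and everything else stays untouched.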

Fixing the current memory record errors is important; however, in the future we may not be able to rely on developers to find and record memory usage by hand every time.

WHUweiqingzhou commented 3 months ago

Please also pay attention to #3852