SAITPublic / PIMSimulator

Processing-In-Memory (PIM) Simulator

Refresh Logic #17

Open Arash1381-y opened 3 months ago

Arash1381-y commented 3 months ago

Based on the output traces, the refresh cycles neither change the bank states nor depend on them. For example, in the MUL kernel trace, refresh commands are issued to ranks that do not exist, and the rank number behaves like a counter inside the refresh unit.

I want to know whether this is a bug, or whether the number of refresh cycles is somehow precalculated from the trace length and other parameters.

Sample of grep results on the output trace of the MUL kernel with the single-channel config:

601:REF ch0 ra1 @61
976:REF ch0 ra2 @91
1249:REF ch0 ra3 @121
1505:REF ch0 ra4 @151
1761:REF ch0 ra5 @181
2017:REF ch0 ra6 @211
2273:REF ch0 ra7 @241
2529:REF ch0 ra8 @271
2785:REF ch0 ra9 @301
3041:REF ch0 ra10 @331
3297:REF ch0 ra11 @361
3553:REF ch0 ra12 @391
3809:REF ch0 ra13 @421
4065:REF ch0 ra14 @451
4321:REF ch0 ra15 @481
4577:REF ch0 ra16 @511
4833:REF ch0 ra17 @541
5089:REF ch0 ra18 @571
5345:REF ch0 ra19 @601
5601:REF ch0 ra20 @631
5857:REF ch0 ra21 @661
6113:REF ch0 ra22 @691
6369:REF ch0 ra23 @721
6625:REF ch0 ra24 @751
6881:REF ch0 ra25 @781
7137:REF ch0 ra26 @811
7393:REF ch0 ra27 @841
7649:REF ch0 ra28 @871
7905:REF ch0 ra29 @901
8161:REF ch0 ra30 @931
iamshcha commented 3 months ago

A configuration with more than 30 ranks in a single channel is very strange.

If you have difficulty configuring, use the configurations already provided in the test cases.

Arash1381-y commented 3 months ago

I use a predefined single-channel configuration. The problem is that the number of ranks per channel isn't defined by the configuration; instead, it's calculated based on the data size and the sizes of different levels. Consequently, when running a large chunk of data on a small memory, the memory creates virtual ranks, which are only used for refreshing. This issue persists even when I use the default configuration with the default multiplication kernel. As a result, there are refresh requests for ranks that don't exist (in this case, 1 channel per rank, which results in 64 virtual ranks that are only used for refreshing). Below is the grep result using the 64-channel configuration with the default multiplication kernel.

[image: grep result for the 64-channel configuration]

Is this behavior logical? Or would it be logical to fix the number of ranks at 1?

Also, here is the code in MemorySystem.cpp that determines the number of ranks:

    uint64_t bytePerRank = static_cast<uint64_t>(
        (getConfigParam(UINT64, "NUM_ROWS") *
         (getConfigParam(UINT, "NUM_COLS") * getConfigParam(UINT, "DEVICE_WIDTH")) *
         getConfigParam(UINT, "NUM_BANKS") *
         (getConfigParam(UINT64, "JEDEC_DATA_BUS_BITS") / getConfigParam(UINT, "DEVICE_WIDTH"))) /
        8);
    uint64_t megsOfStoragePerRank = bytePerRank >> 20;

    num_ranks_ = (megsOfMemory / Byte2MB(bytePerRank));

    // If this is set, effectively override the number of ranks
    if (megsOfMemory != 0)
    {
        num_ranks_ = (num_ranks_ > 0) ? num_ranks_ : 1;
        if (num_ranks_ == 0)
        {
            PRINT("WARNING: Cannot create memory system with "
                  << megsOfMemory << "MB, defaulting to minimum size of " << megsOfStoragePerRank
                  << "MB");
        }
        setDevConfigParam(UINT, "NUM_RANKS", num_ranks_);
        // FIXME, need to synchronize between ConfigurationDB and Configuration
        config.NUM_RANKS = num_ranks_;
    }
iamshcha commented 3 months ago

The simulator calculates the memory size per rank from the parameters defined in the configuration file, then dynamically determines the number of ranks from the memory size given as input. As mentioned above, we recommend using the configurations provided with the test cases.