bespoke-silicon-group / bsg_manycore

Tile based architecture designed for computing efficiency, scalability and generality

Best method for selectively putting Read-only data in scratchpad #590

Open drichmond opened 2 years ago

drichmond commented 2 years ago

Filing for tracking...

As we write more code, we're frequently seeing something that looks like this:

  1. User writes kernel using C std library code
  2. User profiles kernel and finds unexpected DRAM accesses
  3. User determines that the unexpected DRAM accesses are to std library rodata that was placed in DRAM
  4. User cannot move rodata to scratchpad, so they write their own solution

This has happened when using logf/expf, which use _clztab to count leading zeros, and cosf/sinf functions, which use look-up tables.

What's the "right way" to move rodata into scratchpad, selectively?

taylor-bsg commented 2 years ago

I think for a small number of constants, like those used by logf and expf, the short-term answer may be to hardcode the library C file so that it places them in the dmem. Longer term, maybe have li.f so the I-cache can manage it :-)

drichmond commented 2 years ago

Yeah, I agree.

For posterity, the way I did this was to generate a custom bsg_link.ld by editing bsg_manycore_link_gen.

Original:

    section_map = [
      # Format:
      # <output section>: [<input sections>]
      ['.text.dram'        , ['.text.interrupt', '.crtbegin','.text','.text.startup','.text.*']],
      # bsg-tommy: 8 bytes are allocated in .dmem.interrupt for interrupt handler to spill registers.
      ['.dmem'             , ['.dmem.interrupt', '.dmem','.dmem.*']],
      ['.data'             , ['.data','.data*']],
      ['.sdata'            , ['.sdata','.sdata.*','.sdata*','.sdata*.*',
                              '.gnu.linkonce.s.*']],
      ['.sbss'             , ['.sbss','.sbss.*','.gnu.linkonce.sb.*','.scommon']],
      ['.bss'              , ['.bss','.bss*']],
      ['.tdata'            , ['.tdata','.tdata*']],
      ['.tbss'             , ['.tbss','.tbss*']],
      ['.striped.data.dmem', ['.striped.data']],
      ['.eh_frame.dram'    , ['.eh_frame','.eh_frame*']],
      ['.rodata.dram'      , ['.rodata','.rodata*','.srodata.cst16','.srodata.cst8',
                              '.srodata.cst4', '.srodata.cst2','.srodata*']],
      ['.dram'             , ['.dram','.dram.*']],
      ]

After modifications (the .rodata and .srodata input sections have been appended to the .dmem entry, and the .rodata.dram entry has been removed):

    section_map = [
      # Format:
      # <output section>: [<input sections>]
      ['.text.dram'        , ['.text.interrupt', '.crtbegin','.text','.text.startup','.text.*']],
      # bsg-tommy: 8 bytes are allocated in .dmem.interrupt for interrupt handler to spill registers.
      ['.dmem'             , ['.dmem.interrupt', '.dmem','.dmem.*', '.rodata','.rodata*','.srodata.cst16','.srodata.cst8',
                              '.srodata.cst4', '.srodata.cst2','.srodata*']],
      ['.data'             , ['.data','.data*']],
      ['.sdata'            , ['.sdata','.sdata.*','.sdata*','.sdata*.*',
                              '.gnu.linkonce.s.*']],
      ['.sbss'             , ['.sbss','.sbss.*','.gnu.linkonce.sb.*','.scommon']],
      ['.bss'              , ['.bss','.bss*']],
      ['.tdata'            , ['.tdata','.tdata*']],
      ['.tbss'             , ['.tbss','.tbss*']],
      ['.striped.data.dmem', ['.striped.data']],
      ['.eh_frame.dram'    , ['.eh_frame','.eh_frame*']],
      ['.dram'             , ['.dram','.dram.*']],
      ]
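
Note that the edit above moves all rodata into the scratchpad, not a selected subset. One possible way to make it selective, sketched below, is to leave the `.rodata.dram` entry alone and append only specific input-section names to the `.dmem` entry. This is a sketch under assumptions, not part of `bsg_manycore_link_gen`: the helper `add_input_sections` and the section name `.rodata.my_lut` are hypothetical, it assumes the object of interest lives in its own input section (for example, code built with GCC's `-fdata-sections`), and it assumes the generated linker script lists output sections in the same order as `section_map`, so that the more specific name under `.dmem` is matched before the `.rodata*` wildcard.

```python
import copy


def add_input_sections(section_map, output_section, patterns):
    """Return a copy of section_map with `patterns` appended to the
    input-section list of `output_section`. Existing entries are kept
    and duplicates are skipped. (Hypothetical helper for illustration.)"""
    new_map = copy.deepcopy(section_map)
    for name, inputs in new_map:
        if name == output_section:
            for pattern in patterns:
                if pattern not in inputs:
                    inputs.append(pattern)
            return new_map
    raise KeyError(output_section)


# Trimmed-down stand-in for the real section_map shown above.
section_map = [
    ['.dmem',        ['.dmem.interrupt', '.dmem', '.dmem.*']],
    ['.rodata.dram', ['.rodata', '.rodata*', '.srodata*']],
]

# '.rodata.my_lut' is a hypothetical per-object section name.
selective_map = add_input_sections(section_map, '.dmem', ['.rodata.my_lut'])

for name, inputs in selective_map:
    print(name, inputs)
```

Whether this works for precompiled library tables depends on how the library was built: if its look-up tables were all emitted into a single `.rodata` section, there is no per-object section name to select, and the table would need to be re-placed at the source level instead.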