This goes slightly against what I said in #37, but something that would be quite useful
would be a bulk zero-initialisation. There are lots of zero-holes in memory maps,
whether that is because of a zero-initialised data section, or because there is a long
run of zeros within a sparsely initialised data section.

An extra boot command that can zero-initialise arbitrary segments of RAM using
a single packet would reduce the amount of message traffic needed at start-up,
particularly when we have MBs of data that are mostly zeros.
E.g. something like this:

    else if (cmd == StoreZeroCmd) {
      // Store zeros to data memory, starting at addrReg
      // Size ***in bytes*** to zero (saves an instruction); assumed to be a
      // multiple of 4, otherwise the last word store runs past the end
      uint32_t n = msgIn->args[0];
      uint32_t addrEnd = addrReg + n;
      while (addrReg < addrEnd) {
        *(uint32_t*) addrReg = 0;
        addrReg += 4;
      }
    }
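
To try the loop on a host machine, here is a minimal sketch that runs the same logic over an ordinary array instead of raw addresses. `store_zero` and `mem` are made-up names for illustration, and treating the address as a byte offset into `mem` is an assumption, not how the bootloader addresses memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Zero `bytes` bytes of `mem`, starting at byte offset `addr`, one 32-bit
 * word at a time, mirroring the loop in the proposed command.
 * `addr` is assumed to be word-aligned.
 * Returns the updated address (the first byte offset not written). */
uint32_t store_zero(uint32_t *mem, uint32_t addr, uint32_t bytes)
{
  uint32_t end = addr + bytes;
  while (addr < end) {
    mem[addr / 4] = 0;  /* one word store per iteration */
    addr += 4;
  }
  return addr;
}
```

Because the loop stores whole words, a size that is not a multiple of 4 is effectively rounded up to the next word, so the host would need to send word-aligned sizes (or the command would need to document the rounding).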
I estimate a total burden of 10-ish instructions added to the bootloader,
and it should be able to fill at about 1 word per 5-ish instructions; presumably
it would end up being DRAM-bandwidth limited.
This is assuming that:

- DRAM is not already zero-initialised (I assume it isn't?).
- Bandwidth from host to boards is much less than total bandwidth to the DRAMs. We've
  got one PCI Express link at ~1GB/s, but even with Aesop we have 6 DRAMs,
  which offer 12GB/s * 6 = 72GB/s.

So for a system which is loading multi-GB sections on to DRAM this could
reduce the serial cost quite a bit.
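
As a back-of-envelope illustration of that saving (not a measurement): let $S$ be the section size, $z$ the fraction of it that is zeros, $B_{host} \approx 1$ GB/s and $B_{dram} \approx 72$ GB/s. The values $S = 4$ GB and $z = 0.9$ are assumptions picked purely for the example, and the estimate assumes the zero fill reaches the full aggregate DRAM bandwidth with negligible per-command overhead.

$$
T_{without} \approx \frac{S}{B_{host}} = \frac{4}{1} = 4\ \text{s}
$$

$$
T_{with} \approx \frac{(1-z)\,S}{B_{host}} + \frac{z\,S}{B_{dram}} = \frac{0.4}{1} + \frac{3.6}{72} = 0.45\ \text{s}
$$

Under those assumptions the host link only carries the non-zero 10% of the section, so load time is dominated by real data rather than zeros.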
Note that I'm aware that a lot can already be done to support faster loading,
e.g. using multiple threads per DRAM to load, and packing multiple words
into each packet. However, a memset-style command would be easy to integrate
into the existing hostlink loaders without adding much complexity, and would also
make more sophisticated loaders faster.
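
As a rough sketch of what the loader side could look like: scan the image for long runs of zero words and plan one zero-fill segment per run, sending everything else as ordinary data. `Segment`, `plan_segments` and `min_zero_words` are hypothetical names for illustration and are not part of hostlink; the threshold exists only because very short zero runs are probably cheaper to leave inside a data segment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment descriptor (not part of hostlink). */
typedef struct {
  size_t offset;   /* word offset into the image */
  size_t words;    /* segment length in words */
  int    is_zero;  /* 1: one zero-fill command; 0: send the data itself */
} Segment;

/* Split an image into data segments and zero segments.
 * Zero runs shorter than min_zero_words stay in the surrounding data
 * segment. Returns the number of segments written (at most max_out). */
size_t plan_segments(const uint32_t *image, size_t words,
                     size_t min_zero_words,
                     Segment *out, size_t max_out)
{
  size_t count = 0;
  size_t data_start = 0;  /* start of the data segment not yet emitted */
  size_t i = 0;

  while (i < words) {
    if (image[i] != 0) { i++; continue; }

    /* Measure the zero run [i, j). */
    size_t j = i;
    while (j < words && image[j] == 0) j++;

    if (j - i >= min_zero_words) {
      /* Emit the data before the run, then the run itself. */
      if (i > data_start) {
        if (count == max_out) return count;
        out[count++] = (Segment){ data_start, i - data_start, 0 };
      }
      if (count == max_out) return count;
      out[count++] = (Segment){ i, j - i, 1 };
      data_start = j;
    }
    i = j;
  }

  /* Emit any trailing data. */
  if (words > data_start && count < max_out)
    out[count++] = (Segment){ data_start, words - data_start, 0 };
  return count;
}
```

An existing loader could then walk the returned segments, issuing its normal store packets for data segments and a single zero-fill command (with the size converted to bytes) for each zero segment.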
-
_Flagrantly not using the PEP system I literally only just proposed because I don't
have time right now - this is more a reminder to turn this into one if it makes sense._