Wallacoloo / Raspberry-Pi-DMA-Example

Simplest example of copying memory from one region to another using DMA in userland
The Unlicense

Null output? #1

Closed Kroltez closed 8 years ago

Kroltez commented 8 years ago

Running the program unmodified outputs: Destination was initially ' ', Destination reads ' '

srcArray is not used after it is initialised? I can't see srcArray referenced anywhere else in the code. Is it supposed to output Destination reads 'hello world'?

Kroltez commented 8 years ago

Is it possible that the addresses have changed for the Raspberry Pi 3, which uses the BCM2837 rather than the BCM2835?

Kroltez commented 8 years ago

UPDATE: it works on Pi versions 2 and 3 if all addresses are set to start with 3F000000 rather than 20000000.
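A minimal sketch of the address change described above, assuming the peripheral offsets inside the window stay the same and only the base moves. The enum and function names are hypothetical (the repository selects the base through Make targets instead); the 0x7000 DMA offset matches the 0x20007000 DMA base used on the Pi 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model selector, for illustration only. */
enum pi_model { PI_MODEL_1, PI_MODEL_2, PI_MODEL_3 };

/* Offset of the DMA controller inside the peripheral window. */
#define DMA_OFFSET 0x00007000u

/* Physical base of the peripheral window as seen by the ARM core. */
uint32_t peripheral_base(enum pi_model model)
{
    switch (model) {
    case PI_MODEL_1:
        return 0x20000000u;
    case PI_MODEL_2:
    case PI_MODEL_3:
        return 0x3F000000u;
    }
    assert(!"unknown Pi model");
    return 0;
}

/* Physical address to mmap() from /dev/mem to reach the DMA registers. */
uint32_t dma_base(enum pi_model model)
{
    return peripheral_base(model) + DMA_OFFSET;
}
```

With this, `dma_base(PI_MODEL_1)` gives 0x20007000 and `dma_base(PI_MODEL_3)` gives 0x3F007000.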

Wallacoloo commented 8 years ago

Thanks so much for providing a solution @Kroltez! I've had a number of people report this issue in a sister project, and have long-since suspected the issue was an address change, but I couldn't find definitive info.

I went ahead and made new Make targets for rpi2 and rpi3 based on this info (https://github.com/Wallacoloo/Raspberry-Pi-DMA-Example/commit/3f076a01b61ece9eb71fec8a347f12d402287a11). I wouldn't be surprised if more changes need to be made for the Pi v3, since its CPU is 64-bit but the DMA structs use 32-bit address fields. Look out for latent bugs on that front unless you can find something conclusive in the documentation.

If you get a chance, confirmation that the new code works would be much appreciated, but I understand if you're unable to test it for any reason.

Kroltez commented 8 years ago

You couldn't give me a hand, could you? I'm trying to do a DMA transfer to and from SPI, say every 5 ms. At the moment I'm having trouble even doing two different DMA transfers by running your program twice in the same code. Do you know how to do multiple DMA transmissions? Is there a way of doing a "wait for" or "check done" rather than the sleep command?

Kroltez commented 8 years ago

[Screenshot: 2016-09-06-092041_1920x1080_scrot]

Ideally for now I would like it to count up in a loop.

Any help would be appreciated.

regards Kieron

Wallacoloo commented 8 years ago

On the original Pi, there is no elegant way to do what you want without sleeping/spinning, though it is possible to do it in an ugly way. On the Pi 2/3, there may be a better solution which I don't know of. Anyway:

What most people did for the Pi 1 is to use a "paced DMA transfer". Certain peripherals, specifically the PWM and PCM peripherals, consume data at a specific (and usually configurable) rate when active, and maintain their own 4-entry buffers. If you attempt to feed them data while their buffer is full, they will stall the DMA transfer until they consume the next entry.

Furthermore, the Pi's DMA engine allows chaining DMA control blocks (via the NEXTCONBK control block field). i.e. I can create one DMA control block ("block B") that copies data from the SPI and writes it somewhere, and then another one ("block A") which stalls for, say, 5 ms via transferring some amount of data into the PWM/PCM peripherals. Then I can configure block A such that block B is run immediately after completion. The result is that exactly 5 ms after initiating the DMA sequence, the SPI peripheral is read. If I wanted this to be done repeatedly and indefinitely, I would have block B trigger block A upon completion.
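The chaining described above can be sketched as follows. This is only an illustration under stated assumptions: the struct follows the eight-word control block layout in the BCM2835 datasheet, but the struct name, lowercase field names, and `chain_blocks` are hypothetical, and the bus addresses of the two blocks are assumed to have been obtained elsewhere.

```c
#include <stdint.h>

/* One DMA control block: eight 32-bit words, 32-byte aligned. */
struct dma_cb {
    _Alignas(32) uint32_t ti; /* "Transfer Info" flags */
    uint32_t source_ad;       /* bus address to read from */
    uint32_t dest_ad;         /* bus address to write to */
    uint32_t txfr_len;        /* transfer length */
    uint32_t stride;          /* 2D stride, unused here */
    uint32_t nextconbk;       /* bus address of the next block, 0 = stop */
    uint32_t reserved[2];
};

_Static_assert(sizeof(struct dma_cb) == 32, "control block must be 32 bytes");

/*
 * Link block A (the paced stall) to block B (the SPI read).
 * bus_a and bus_b are the bus addresses of the blocks themselves.
 * With repeat != 0, B points back at A so the pair runs indefinitely.
 */
void chain_blocks(struct dma_cb *a, uint32_t bus_a,
                  struct dma_cb *b, uint32_t bus_b, int repeat)
{
    a->nextconbk = bus_b;
    b->nextconbk = repeat ? bus_a : 0u;
}
```

The DMA channel would then be started with its control block address pointing at block A; that register write is left out here because it needs the mapped peripheral memory.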

The technical detail for creating the block that stalls 5 ms is to configure either PWM or PCM to read at some frequency F, and then create a control block where DEST_AD=<PWM/PCM FIFO address>, SRC_AD=<any *valid* physical address>, TXFR_LEN = 5 ms/1000 ms * F, and the TI field ("Transfer Info") is such that DEST_INC=0, so that the write address remains at the buffer's head, and DEST_DREQ is set to the appropriate value based on whether PWM or PCM or some other timer-based peripheral is pacing the transfer. These values are found on page 61 here.
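A rough sketch of how the TI word and the transfer length for such a stall block might be computed. The bit positions and the PWM DREQ number (5) are taken from the BCM2835 peripherals datasheet as I read it; the function names are hypothetical. One assumption to flag loudly: this sketch treats TXFR_LEN as a byte count and each FIFO entry as one 32-bit word, so the entry count from the formula above is multiplied by 4. Check that against the datasheet before relying on it.

```c
#include <stdint.h>

/* TI ("Transfer Info") bits used for a paced write. */
#define TI_DEST_INC   (1u << 4)  /* left clear: keep writing to the FIFO address */
#define TI_DEST_DREQ  (1u << 6)  /* gate each write on the peripheral's DREQ */
#define TI_PERMAP(n)  (((uint32_t)(n) & 0x1Fu) << 16) /* which peripheral paces */

#define DREQ_PWM 5u /* peripheral mapping number for PWM */

/* TI word for a block that only stalls: no destination increment. */
uint32_t pacing_ti(uint32_t dreq)
{
    return TI_DEST_DREQ | TI_PERMAP(dreq);
}

/*
 * Bytes to push into the FIFO so the block takes stall_us microseconds,
 * given that the peripheral drains fifo_hz entries per second.
 * ASSUMPTION: 4 bytes per FIFO entry, TXFR_LEN counted in bytes.
 */
uint32_t pacing_txfr_len(uint32_t fifo_hz, uint32_t stall_us)
{
    uint64_t entries = (uint64_t)fifo_hz * stall_us / 1000000u;
    return (uint32_t)(entries * 4u);
}
```

For example, a peripheral draining 10000 entries per second needs 50 entries for a 5 ms stall, which this sketch turns into a length of 200.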

Check out dma-gpio.c. It uses PWM to pace the transfers in much the same way as I just described.

There are some obvious downsides to this method that could warrant more research for the Pi 2/3 though: paced transfers prevent any other application (e.g. userland audio) from using whichever peripheral you choose to do your pacing (PWM/PCM). The DMA channel you use is continually active, potentially slowing down other DMA operations (network, screenbuffer, SD card operations) and wasting electrical energy. The latter is probably not much of an issue for you since you can easily drive the PWM/PCM module at its minimum clock rate.

Kroltez commented 8 years ago

dma-example.txt

This code is essentially your code running in a loop so that it says hi world:1, hi world:2, etc. However, it prints hi world:1 and then never changes. Is it because the page is cached? Or am I not resetting something correctly?

Kroltez commented 8 years ago

dma-examplev1.txt

Or, more simply, here I have just copied and pasted the relevant code below, so now it should say hello world and then hello earth. But the second time it outputs hello world again.

Wallacoloo commented 8 years ago

@Kroltez I think you're right about it being a cache problem, as what you have looks fine to me. Perhaps you can try using the makeUncachedMemView function from dma-gpio.c and writing to srcArray / reading from destArray through the uncached views?

Kroltez commented 8 years ago

I'm new to low-level programming. What do you mean by writing and reading through the uncached views? Copy the function over and call it using destArray = makeUncachedMemView(srcArray, PAGE_SIZE, memfd, pagemapfd); ?

Kroltez commented 8 years ago

OK. By the way, I've just been told by someone that makeVirtPhysPage() doesn't work very well on the Raspberry Pi 2 and 3, and he thinks that might be the problem.

Wallacoloo commented 8 years ago

@Kroltez Sorry for the delay. That's indeed what I meant though. Any data that both your C code and the DMA engine are going to be reading/writing should be addressed in a way that bypasses the L1 cache.

In the Pi 1, the addressing scheme is such that any address < 0x20000000 corresponds to a request from the L1 cache, which will query the actual RAM on your behalf. Then the various peripherals are given addresses between 0x20000000-0x40000000. 0x40000000-0x60000000 corresponds to the L2 cache, then there are some more memory-mapped devices, and finally the physical RAM is accessed directly by 0xc0000000-0xe0000000.

The DMA engine cannot communicate with the L1 cache, but it can use the L2 cache. Therefore, all the addresses that need to be used by your C code and the DMA engine should be modified to be in the range 0x40000000-0x60000000. Because the operating system uses virtual addresses, you need to allocate virtual memory for both srcArray and destArray (e.g. with malloc()), determine their physical address (virtToPhys()), add 0x40000000 to get a new physical address that bypasses L1, give that physical address to the DMA engine, and then map it to another virtual memory page using mmap() so that your C code can use it (since all memory accesses in your program have to be to virtual memory). This is essentially what makeUncachedMemView() does.
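The address arithmetic in those steps can be sketched in isolation. This is not the repository's virtToPhys() or makeUncachedMemView(); the function names here are hypothetical, and a 4096-byte page size is assumed. It relies on the documented Linux /proc/self/pagemap format: one 64-bit entry per virtual page (found at file offset `(virt / page_size) * 8`), with bit 63 meaning "page present" and bits 0-54 holding the page frame number. Reading real frame numbers generally requires root.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES   4096u                 /* assumed page size */
#define PAGEMAP_PRESENT   (1ULL << 63)          /* page is in RAM */
#define PAGEMAP_PFN_MASK  ((1ULL << 55) - 1)    /* bits 0-54: frame number */
#define L2_ALIAS          0x40000000u           /* Pi 1 alias that bypasses L1 */

/*
 * Turn one pagemap entry plus the original virtual address into a
 * physical address. Returns 0 on success, -1 if the page is not present.
 */
int pagemap_to_phys(uint64_t entry, uintptr_t virt, uint64_t *phys)
{
    if (!(entry & PAGEMAP_PRESENT)) {
        return -1;
    }
    uint64_t pfn = entry & PAGEMAP_PFN_MASK;
    *phys = pfn * PAGE_SIZE_BYTES + (virt % PAGE_SIZE_BYTES);
    return 0;
}

/* Address to hand to the DMA engine (and to mmap() for the uncached view). */
uint32_t phys_to_uncached(uint32_t phys)
{
    return phys | L2_ALIAS;
}
```

The resulting address is what would go into the control block, and also what would be mapped back into the process through /dev/mem so that the C code reads and writes the same uncached view.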

Given that the bus addresses of all the peripherals changed in the Pi 2/3/zero, the bus address of the L1 cache probably also changed and may be responsible for the failures seen. This is especially likely in the Pi 3, since it comes with double the RAM. If I were in your shoes, I would track down the bus address of the L1 cache and alter the virtToUncachedPhys function so that it adds that value to the physical address (with L1 previously located at 0x40000000, I was able to just bitwise-or the value, but if the new bus address isn't a power of two, you'll have to add that offset).
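The or-versus-add remark can be checked with small numbers. OR gives the same result as addition only when the offset and the physical address have no set bits in common; the 0x50000000 offset below is purely an illustrative value, not a known address on any Pi, and the function names are hypothetical.

```c
#include <stdint.h>

/* Apply an alias offset by OR: only valid if no bits overlap. */
uint32_t alias_by_or(uint32_t phys, uint32_t offset)
{
    return phys | offset;
}

/* Apply an alias offset by addition: valid for any offset. */
uint32_t alias_by_add(uint32_t phys, uint32_t offset)
{
    return phys + offset;
}

/* True when OR and addition are guaranteed to agree for this pair. */
int or_matches_add(uint32_t phys, uint32_t offset)
{
    return (phys & offset) == 0u;
}
```

With offset 0x40000000 and any physical address below 0x40000000, both functions agree. With offset 0x50000000 and physical address 0x10001000, OR yields 0x50001000 while addition yields 0x60001000, so only addition is correct there.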

If you can't find this information in any documentation, you can potentially brute-force solve it by writing a value to some physical ram address, addr, flushing the cache, and then seeing if a read from addr+0x50000000 gives the same value, then test addr+0x60000000, and so on.

Kroltez commented 8 years ago

I got it working here: https://github.com/Kroltez/PI-DMA-Counter

I just need to connect it to SPI. I have set up the TI on the control block to do a DREQ on channel 6, but that's where I'm at. I have a program that polls the CPU and gets a pretty accurate tick, so I am using that rather than the PWM whatsit you suggested.
