devkitPro / nds-examples

Examples for Nintendo DS using devkitARM, calico, libnds, libdvm, maxmod, dswifi

Unsupported instructions #14

Closed Kuratius closed 1 week ago

Kuratius commented 11 months ago

Bug Report

What's the issue you encountered?

I tried to use an ARM assembly function in the hello world program in this project. The function is written using special instructions that are only available on the ARM9 core of the DSi, and it comes from http://www.coranac.com/2009/07/sines/

sin.s
/home/dk/nds-examples/hello_world/source/sin.s: Assembler messages:
/home/dk/nds-examples/hello_world/source/sin.s:7: Error: selected processor does not support `smulwt r1,r0,r0' in ARM mode
/home/dk/nds-examples/hello_world/source/sin.s:10: Error: selected processor does not support `smulwt r0,r2,r1' in ARM mode
/home/dk/nds-examples/hello_world/source/sin.s:13: Error: selected processor does not support `smulwb r0,r1,r0' in ARM mode
make[1]: *** [/opt/devkitpro/devkitARM/base_rules:52: sin.o] Error 1
make: *** [Makefile:95: build] Error 2

NDS special

The assembly version given above uses standard ARM instructions, but one of the interesting things is that the NDS' ARM9 core has special multiplication instructions. In particular, there is the SMULWx instruction, which does a word*halfword multiplication, where the halfword can be either the top or bottom halfword of operand 2. The main result is 32×16→48 bits long, of which only the top 32 bits are put in the destination register. Effectively it's like a*b>>16 without overflow problems. As a bonus, it's also slightly faster than the standard MUL. By slightly changing the parameters, the down-shift factors r and s can be made 16, fitting perfectly with this instruction, although the internal accuracy is made slightly worse. Additionally, careful placement of each instruction can avoid the interlock cycle that happens for multiplications.
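To make the quoted description concrete, here is a minimal host-side sketch in C of what SMULWB and SMULWT compute. The helper names `smulwb` and `smulwt` are made up for illustration and are not part of libnds or any toolchain header. The sketch assumes a compiler where right-shifting a negative value is an arithmetic shift, which holds for GCC and Clang.

```c
#include <stdint.h>

/* Emulates SMULWB: multiply the 32-bit value a by the signed BOTTOM
 * halfword of b, then keep bits 47..16 of the 48-bit product.
 * Assumption: >> on a negative int64_t is an arithmetic shift. */
static int32_t smulwb(int32_t a, int32_t b)
{
    int16_t half = (int16_t)(uint16_t)((uint32_t)b & 0xFFFFu);
    return (int32_t)(((int64_t)a * half) >> 16);
}

/* Emulates SMULWT: same as above, but uses the signed TOP halfword of b. */
static int32_t smulwt(int32_t a, int32_t b)
{
    int16_t half = (int16_t)(uint16_t)((uint32_t)b >> 16);
    return (int32_t)(((int64_t)a * half) >> 16);
}
```

For example, with a = 0x40000000 and b = 0x20004000, `smulwb` uses the bottom halfword 0x4000 and yields 2^28, while `smulwt` uses the top halfword 0x2000 and yields 2^27. A plain 32-bit `a * b >> 16` would overflow for the same inputs, which is the problem the instruction avoids.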

How can the issue be reproduced?

Put this into a file called sin.s:

    .align
    .global isin_S4a9
isin_S4a9:
    movs    r0, r0, lsl #(31-13)    @ r0=x%2 <<31       ; carry=x/2
    sub     r0, r0, #1<<31          @ r0 -= 1.0         ; sin <-> cos
    smulwt  r1, r0, r0              @ r1 = x*x          ; Q31*Q15/Q16=Q30

    ldr     r2,=14016               @ C = (1-pi/4)<<16
    smulwt  r0, r2, r1              @ C*x^2>>16         ; Q16*Q14/Q16 = Q14
    add     r2, r2, #1<<16          @ B = C+1
    rsb     r0, r0, r2, asr #2      @ B - C*x^2         ; Q14
    smulwb  r0, r1, r0              @ x^2 * (B-C*x^2)   ; Q30*Q14/Q16 = Q28
    mov     r1, #1<<12
    sub     r0, r1, r0, asr #16     @ 1 - x^2 * (B-C*x^2)
    rsbcs   r0, r0, #0              @ Flip sign for odd semi-circles.

    bx      lr

Then try to compile the following using the make command:

/*---------------------------------------------------------------------------------

    $Id: main.cpp,v 1.13 2008-12-02 20:21:20 dovoto Exp $

    Simple console print demo
    -- dovoto

---------------------------------------------------------------------------------*/
#include <nds.h>

#include <stdio.h>

volatile int frame = 0;

//---------------------------------------------------------------------------------
void Vblank() {
//---------------------------------------------------------------------------------
    frame++;
}

extern signed long isin_S4a9(signed long);

//---------------------------------------------------------------------------------
int main(void) {
//---------------------------------------------------------------------------------
    touchPosition touchXY;

    irqSet(IRQ_VBLANK, Vblank);

    consoleDemoInit();

    iprintf("      Hello DS dev'rs\n");
    iprintf("     \x1b[32mwww.devkitpro.org\n");
    iprintf("   \x1b[32;1mwww.drunkencoders.com\x1b[39m");

    while(1) {

        swiWaitForVBlank();
        scanKeys();
        int keys = keysDown();
        if (keys & KEY_START) break;

        touchRead(&touchXY);

        // print at using ansi escape sequence \x1b[line;columnH 
        iprintf("\x1b[10;0HFrame = %d",frame);
        iprintf("\x1b[16;0HTouch x = %04X, %04X\n", touchXY.rawx, touchXY.px);
        iprintf("Touch y = %04X, %04X\n", touchXY.rawy, touchXY.py);
        iprintf("Sin y= %ld\n", isin_S4a9(55));

    }

    return 0;
}
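For checking the value printed by the `Sin y` line once the assembly builds, the routine in sin.s can be mirrored in portable C on the host. This is only a sketch that follows the assembly register by register. The names `isin_S4a9_ref`, `mulw`, `top_half` and `bottom_half` are hypothetical, and the code assumes two's complement conversions and arithmetic right shifts as provided by GCC and Clang. It has not been compared against the routine running on hardware.

```c
#include <stdint.h>

/* 32x16 signed multiply keeping bits 47..16 of the product, like SMULWx.
 * Assumption: >> on a negative value is an arithmetic shift. */
static int32_t mulw(int32_t a, int16_t half)
{
    return (int32_t)(((int64_t)a * half) >> 16);
}

static int16_t top_half(int32_t v)
{
    return (int16_t)(v >> 16);
}

static int16_t bottom_half(int32_t v)
{
    return (int16_t)(uint16_t)((uint32_t)v & 0xFFFFu);
}

/* Host-side mirror of isin_S4a9 (hypothetical name).
 * Input: angle with 2^15 units per full circle. Output: sine in Q12. */
int32_t isin_S4a9_ref(int32_t x)
{
    /* movs r0, r0, lsl #18: the last bit shifted out (bit 14) is the carry */
    int carry = (int)(((uint32_t)x >> 14) & 1u);
    /* sub r0, r0, #1<<31 is the same as flipping the top bit */
    int32_t r0 = (int32_t)(((uint32_t)x << 18) ^ 0x80000000u);

    int32_t r1 = mulw(r0, top_half(r0));   /* smulwt r1, r0, r0 : x*x     */
    int32_t r2 = 14016;                    /* C = (1-pi/4)<<16            */
    r0 = mulw(r2, top_half(r1));           /* smulwt r0, r2, r1 : C*x^2   */
    r2 += 1 << 16;                         /* B = C+1                     */
    r0 = (r2 >> 2) - r0;                   /* rsb r0, r0, r2, asr #2      */
    r0 = mulw(r1, bottom_half(r0));        /* smulwb r0, r1, r0           */
    r0 = (1 << 12) - (r0 >> 16);           /* 1 - x^2 * (B - C*x^2)       */
    return carry ? -r0 : r0;               /* rsbcs r0, r0, #0            */
}
```

With this reference, an input of 0 gives 0, 0x2000 (a quarter circle) gives 4096, and 0x6000 gives -4096. The input 55 used in the program above gives 43, which is in line with sin(2π·55/32768)·4096 ≈ 43.2.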

Environment?

Kuratius commented 11 months ago

This may be an issue with these instructions not being supported in devkitPro, but it could also be an issue with the Makefile. I reused the hello_world Makefile from this project, which seems to target the ARM9 CPU.

Kuratius commented 11 months ago

I talked to someone else; apparently the issue is that the Makefile for this example is missing -march=armv5te in the ARCH section. In addition, the assembly needs to declare that the label is a function label via .type LABELNAME, %function.

WinterMute commented 1 week ago

Sorry to take a while getting back round to this. We've had a big refactor in the works for a while that we had hoped to make public much earlier. Amongst the many fixes and refactors we did was moving the architecture flags to the ARCH variable in the Makefiles where it probably should have been in the first place. See https://github.com/devkitPro/nds-examples/commit/bd322e3a5eabcfb52384f897e34684f83b7b00ce

We also provide a BEGIN_ASM_FUNC macro in libnds, which supplies the necessary type and puts the function in its own section so the linker can strip unused functions from the final binary. https://github.com/devkitPro/libnds/blob/master/include/nds/asminc.h