RobTillaart / float16

Arduino library to implement float16 data type
MIT License

Error in Float16 #10

Closed alecelular closed 4 months ago

alecelular commented 4 months ago

I found an error when converting integers to float16: values in the range 32760 to 32767 (and their negatives) erroneously give 16384. Can this problem be solved? I used an ESP8266. I attached an example of how I got that error. Thank you.

Alejandro F. Fernández, Buenos Aires, Argentina

#include "float16.h"

// Error -> 32760 / 32767

float16 f16;

void setup()
{
  delay(500);
  Serial.begin(115200);
  while (!Serial) delay(1);
  Serial.println();
  Serial.println(__FILE__);
  Serial.print("FLOAT16_LIB_VERSION: ");
  Serial.println(FLOAT16_LIB_VERSION);

  f16.setDecimals(6);

  for (uint32_t x = 32740; x < 32790; x++)
  {
    f16 = x;
    Serial.print(x);
    Serial.print("\t");
    Serial.print(f16);
    Serial.print("\t");
    Serial.println();
    yield();
  }
  Serial.println("\ndone");
}

void loop()
{
}
// -- END OF FILE --

OUTPUT

float16_error.ino
FLOAT16_LIB_VERSION: 0.1.8
32740    32736.0000
32741    32736.0000
32742    32736.0000
32743    32736.0000
32744    32752.0000
32745    32752.0000
32746    32752.0000
32747    32752.0000
32748    32752.0000
32749    32752.0000
32750    32752.0000
32751    32752.0000
32752    32752.0000
32753    32752.0000
32754    32752.0000
32755    32752.0000
32756    32752.0000
32757    32752.0000
32758    32752.0000
32759    32752.0000

32760    16384.0000  *
32761    16384.0000  *
32762    16384.0000  *
32763    16384.0000  *
32764    16384.0000  *
32765    16384.0000  *
32766    16384.0000  *
32767    16384.0000  *

32768    32768.0000
32769    32768.0000
32770    32768.0000
32771    32768.0000
32772    32768.0000
32773    32768.0000
32774    32768.0000
32775    32768.0000
32776    32768.0000
32777    32768.0000
32778    32768.0000
32779    32768.0000
32780    32768.0000
32781    32768.0000
32782    32768.0000
32783    32768.0000
32784    32800.0000
32785    32800.0000
32786    32800.0000
32787    32800.0000
32788    32800.0000
32789    32800.0000

(updated post for syntax highlighting)

alecelular commented 4 months ago

Also with:

8190    6C00    4096.0000
8191    6C00    4096.0000

RobTillaart commented 4 months ago

Thanks for reporting this issue. Will investigate later, might take a few days.

alecelular commented 4 months ago

Thank you very much.


RobTillaart commented 4 months ago

@alecelular

One thing to understand is that float16 trades accuracy (and range) for size. A float16 uses 2 bytes, and of the 16 bits only 11 count towards the mantissa (10 stored bits plus the implicit leading 1). With 11 bits one can represent only about 3 decimal digits. That is why the numbers jump.

(from readme.md)

Specifications

| attribute | value        | notes                                  |
|:----------|:-------------|:---------------------------------------|
| size      | 2 bytes      | layout s eeeee mmmmmmmmmm (1,5,10)     |
| sign      | 1 bit        |                                        |
| exponent  | 5 bit        |                                        |
| mantissa  | 11 bit       | ~ 3 digits  <<<                        |
| minimum   | 5.96046 E−8  | smallest positive number.              |
|           | 1.0009765625 | 1 + 2^−10 = smallest nr larger than 1. |
| maximum   | 65504        |                                        |

OK, that said, that does not explain why it jumps to 16384 as in your post.

I will try to see if I can recreate your problem so I can investigate. (using an UNO)
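
The size of the expected jumps follows from the 10 stored mantissa bits: within each power-of-two interval, adjacent float16 values are 2^(exponent - 10) apart. A minimal sketch of that arithmetic, assuming a positive value in the normal float16 range; `halfSpacing` is a hypothetical helper name, not part of the library:

```cpp
#include <cassert>
#include <cmath>

// Gap between adjacent float16 values around a positive, normal-range value.
// Hypothetical helper for illustration only.
float halfSpacing(float value)
{
  assert(value > 0.0f);
  int e;
  std::frexp(value, &e);             // value = m * 2^e with m in [0.5, 1)
  return std::ldexp(1.0f, e - 11);   // 2^(e-1) / 1024
}
```

For values just below 32768 this gives 16 (32736, 32752, ...), and from 32768 upwards 32 (32768, 32800, ...), which matches the steps in the reported output. It does not explain the drop to 16384.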

RobTillaart commented 4 months ago

@alecelular

Recreated the problem here, so I can start investigating.

Test 1

The first step is to see where it happens. As all numbers are in increasing order, the internal representation of every next (positive) number should be equal to or larger than that of the previous one.

Found only the two places you already found.

| DEC   | internal | internal previous |
|:------|:---------|:------------------|
| 8190  | 6C00     | 6FFF              |
| 32760 | 7400     | 77FF              |
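
A scan like Test 1 can be sketched as follows. This is an illustration, not the code used in the thread: `firstDecrease` is a hypothetical helper, and `encode` stands for any callable that maps a positive number to its 16-bit internal pattern.

```cpp
#include <cassert>
#include <cstdint>

// Returns the first x in (lo, hi] whose encoding is smaller than the
// encoding of x - 1, or 0 if the encodings never decrease.
// Only meaningful for positive inputs, where a larger value must never
// get a smaller bit pattern.
template <typename Encoder>
uint32_t firstDecrease(Encoder encode, uint32_t lo, uint32_t hi)
{
  assert(lo <= hi);
  uint16_t previous = encode(lo);
  for (uint32_t x = lo + 1; x <= hi; x++)
  {
    uint16_t current = encode(x);
    if (current < previous) return x;   // encoding went backwards
    previous = current;
  }
  return 0;
}
```

In a sketch, `encode` could be a lambda that assigns `x` to a `float16` and returns `getBinary()`, the accessor used later in this thread.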

Test 2

The example float16_test_all.ino sets all internal representations and converts them correctly to increasing floats.
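
For reference, the decoding direction checked in Test 2 can be written down directly from the (1,5,10) layout. This is a minimal sketch under the IEEE 754 half-precision rules; `halfToFloat` is a hypothetical name and not necessarily how the library implements it.

```cpp
#include <cmath>
#include <cstdint>

// Decodes an IEEE 754 half-precision bit pattern into a float.
float halfToFloat(uint16_t h)
{
  float sign = (h & 0x8000) ? -1.0f : 1.0f;
  int exponent = (h >> 10) & 0x1F;
  int mantissa = h & 0x03FF;

  if (exponent == 0)                  // zero and subnormals: mantissa * 2^-24
    return sign * std::ldexp((float)mantissa, -24);
  if (exponent == 31)                 // INF (mantissa 0) or NAN
    return mantissa ? NAN : sign * INFINITY;
  // normal numbers: (1 + mantissa/1024) * 2^(exponent - 15)
  return sign * std::ldexp((float)(mantissa + 1024), exponent - 25);
}
```

With this, 0x77FF decodes to 32752, 0x7800 to 32768, and the faulty patterns 0x7400 and 0x6C00 to 16384 and 4096.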

First conclusion

Given the two tests above the problem happens in the conversion to internal format.

Next Actions

  1. Investigate if there is a failure in the negative range too.
  2. Investigate the function f32tof16() as that is the workhorse for converting to internal format.

RobTillaart commented 4 months ago

Test negative numbers

| DEC    | internal | internal previous |
|:-------|:---------|:------------------|
| -8190  | EC00     | EFFF              |
| -32760 | F400     | F7FF              |

So they "fail" at the same spot.

alecelular commented 4 months ago

So the problem is not the representation of the number in two bytes, since if I put in the correct values, they are represented well. Are you saying that the problem is the f32tof16() function that converts them wrong?

alecelular commented 4 months ago

It seems that it is the values adjacent to the powers of two that need to be reviewed.

RobTillaart commented 4 months ago

> Are you saying that the problem is the f32tof16() function that converts them wrong?

Exactly that.

> It seems that it is the adjacencies to the powers of two that need to be reviewed

I have isolated the problem: the mantissa overflows. I have made a patch and will release it in a develop branch a.s.a.p., as I am testing it right now.
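
One plausible way to get exactly the reported bit patterns is a rounding carry that is dropped instead of being added to the exponent. The sketch below is an assumption made for illustration, not the library's f32tof16(): `encodeNormalHalf` and its `propagateCarry` switch are hypothetical, and only values in the normal float16 range are handled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper, NOT the library's f32tof16().
// Subnormals, INF and NAN are deliberately left out.
uint16_t encodeNormalHalf(float f, bool propagateCarry)
{
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));                  // raw IEEE 754 bits

  uint16_t sign = (uint16_t)((bits >> 16) & 0x8000);
  int exponent = (int)((bits >> 23) & 0xFF) - 127 + 15;  // rebias 8 -> 5 bits
  assert(exponent > 0 && exponent < 31);                 // normal range only

  // Round the 23-bit fraction to 10 bits (round half up).
  uint32_t mantissa = ((bits & 0x007FFFFF) + 0x1000) >> 13;

  if (mantissa == 0x400)              // rounding overflowed the 10-bit field
  {
    mantissa = 0;
    if (propagateCarry) exponent++;   // the carry belongs in the exponent
  }
  return (uint16_t)(sign | ((uint32_t)exponent << 10) | mantissa);
}
```

With `propagateCarry` set to false, 32760 encodes as 0x7400 (16384), 8190 as 0x6C00 (4096) and -32760 as 0xF400, the same patterns as in the tables above. With the carry propagated, 32760 becomes 0x7800 (32768).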

alecelular commented 4 months ago

Thank you very much.

RobTillaart commented 4 months ago

Note: In theory the mantissa overflow can happen with non-integers too. Action: devise a test to check all numbers...

RobTillaart commented 4 months ago

I have pushed a preliminary 0.2.0 version to the develop branch so you can check.

I still have to do

alecelular commented 4 months ago

Now I'll try it. Thank you.

alecelular commented 4 months ago

I did some simple tests and it worked. Thank you so much!

  for (uint32_t x = 32752; x < 32770; x++)
  {
    f16 = x;
    Serial.print(x);
    Serial.print("\t");
    Serial.print(f16.getBinary(),HEX);
    Serial.print("\t");
    Serial.print(f16);
    Serial.print("\t");
    Serial.println();
    yield();
  }

OUTPUT

32752   77FF    32752.0000  
32753   77FF    32752.0000  
32754   77FF    32752.0000  
32755   77FF    32752.0000  
32756   77FF    32752.0000  
32757   77FF    32752.0000  
32758   77FF    32752.0000  
32759   77FF    32752.0000  
32760   7800    32768.0000  
32761   7800    32768.0000  
32762   7800    32768.0000  
32763   7800    32768.0000  
32764   7800    32768.0000  
32765   7800    32768.0000  
32766   7800    32768.0000  
32767   7800    32768.0000  
32768   7800    32768.0000  
32769   7800    32768.0000  

RobTillaart commented 4 months ago

Good to hear that part works.

In my search for non-integer bugs I found serious problems with the subnormal numbers (values < 0.000061035...). So I need to investigate and reimplement that part.
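
For context: subnormal float16 values have exponent field 0 and represent mantissa * 2^-24, so everything below 2^-14 (0.000061035...) is stored in steps of 2^-24. A minimal sketch of that encoding, assuming a non-negative input below 2^-14; `encodeSubnormalHalf` is a hypothetical name and not the library's implementation:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Encodes a non-negative value below 2^-14 as a float16 bit pattern.
// Hypothetical helper for illustration only.
uint16_t encodeSubnormalHalf(float f)
{
  assert(f >= 0.0f && f < std::ldexp(1.0f, -14));
  // Scale by 2^24 and round to the nearest integer mantissa.
  long mantissa = std::lround(std::ldexp(f, 24));
  // A result of 1024 is the pattern 0x0400, the smallest normal number,
  // so a rounding carry lands in the exponent field by itself.
  return (uint16_t)mantissa;
}
```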

RobTillaart commented 4 months ago

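@alecelular I think I have fixed the subnormal numbers now too,

  • in conversion from internal to float
  • in conversion from float to internal

I added an example float16_issue_10.ino that tests all possible numbers. From internal to float to internal, all values are the same. Two issues are remaining:

  • zero versus minus zero (0 != -0): if a float16 x = -0; should it keep the sign?
  • extend unit test with what is learned today

Enough progress for today ;)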
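
The round-trip idea (internal to float to internal) can be sketched generically. This is not the contents of float16_issue_10.ino; `countRoundTripMismatches` is a hypothetical helper, and `decode` and `encode` stand for whatever conversion pair is under test.

```cpp
#include <cassert>
#include <cstdint>

// Counts the bit patterns in [first, last] that do not survive
// decode-then-encode unchanged.
template <typename Decode, typename Encode>
uint32_t countRoundTripMismatches(Decode decode, Encode encode,
                                  uint16_t first, uint16_t last)
{
  assert(first <= last);
  uint32_t mismatches = 0;
  for (uint32_t h = first; h <= last; h++)   // 32-bit counter: no wrap at 0xFFFF
  {
    if (encode(decode((uint16_t)h)) != h) mismatches++;
  }
  return mismatches;
}
```

NAN patterns need not round-trip bit for bit, so it may be sensible to restrict the range to the finite patterns (0x0000 to 0x7BFF and 0x8000 to 0xFBFF).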


PS found this on wikipedia

ARM processors support (via a floating point control register bit) an "alternative half-precision" format, which does away with the special case for an exponent value of 31 (11111 in binary).[10] It is almost identical to the IEEE format, but there is no encoding for infinity or NaNs; instead, an exponent of 31 encodes normalized numbers in the range 65536 to 131008.

Might be interesting to implement too, as it has an even "larger" range.
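
Going by the Wikipedia description above, decoding this alternative format differs from IEEE only in that exponent 31 is treated like any other normal exponent. A minimal sketch under that assumption; `altHalfToFloat` is a hypothetical name:

```cpp
#include <cmath>
#include <cstdint>

// Decodes a 16-bit pattern in the ARM "alternative half-precision" format:
// same layout as IEEE half, but exponent 31 has no special meaning.
float altHalfToFloat(uint16_t h)
{
  float sign = (h & 0x8000) ? -1.0f : 1.0f;
  int exponent = (h >> 10) & 0x1F;
  int mantissa = h & 0x03FF;

  if (exponent == 0)                  // zero and subnormals, as in IEEE
    return sign * std::ldexp((float)mantissa, -24);
  // exponents 1..31 are all normal: (1 + mantissa/1024) * 2^(exponent - 15)
  return sign * std::ldexp((float)(mantissa + 1024), exponent - 25);
}
```

Pattern 0x7C00, which is +INF in IEEE half precision, decodes to 65536 here, and 0x7FFF decodes to 131008.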

alecelular commented 4 months ago

I would agree to expand the range. Maybe define positive infinity as the largest positive number, and the same for negative. Furthermore, it seems incorrect to me that there is a number that is not a number.


RobTillaart commented 4 months ago

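> Furthermore, it seems incorrect to me that there is a number that is not a number.

Some examples of calculations that hit a singularity and produce INF or NAN (Not A Number):

float f = 1 / 0.0;       // +INF: division by zero
float g = tan(PI / 2);   // mathematically undefined; in floating point a huge finite value, as PI / 2 is not exact
float h = log(-1);       // NAN: no real result

> I would agree to expand the range. Maybe define positive infinity as the largest positive number.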

If I would implement it, it would be according to spec, although I like your idea to use the largest for infinity.

alecelular commented 4 months ago

Then you could use negative 0 to represent "not a number". If you want to do things according to standards, that's fine. But you could offer an optional configuration that changes this and expands the numerical range, which I think would be much more useful than spending so many bit combinations on it.


RobTillaart commented 4 months ago

The ARM "half float" format uses all possible values and thus has a maximum range and has no NAN or INF in its standard. So mapping INF to the largest negative or positive value is a logical option. Mapping NAN to -0 gives me second thoughts, as -0 can still be seen as a valid value by many users. So I would rather map NAN to the largest positive value too.

Note: for the range 65536-131071 there are only 1024 actual values, so they are 64 apart, which is quite a bit in an absolute sense. In a relative sense these numbers still have 3+ significant digits.

I need to find a good name for this ARM variant, something like float16ARM or float16ext.
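
The mapping proposed here could be applied before encoding, roughly as below. This only illustrates the idea discussed in this comment and is not necessarily what float16ext does; `clampForAltHalf` and `ALT_HALF_MAX` are hypothetical names.

```cpp
#include <cmath>

// Largest magnitude in the ARM alternative half format: 2047 * 2^6.
const float ALT_HALF_MAX = 131008.0f;

// Maps values the format cannot represent onto its extremes:
// NAN and anything above the maximum (including +INF) -> +max,
// anything below -max (including -INF) -> -max.
float clampForAltHalf(float f)
{
  if (std::isnan(f)) return ALT_HALF_MAX;
  if (f > ALT_HALF_MAX) return ALT_HALF_MAX;
  if (f < -ALT_HALF_MAX) return -ALT_HALF_MAX;
  return f;
}
```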

alecelular commented 4 months ago

I prefer Float16ext. Have you already made it, or is it a new version?


RobTillaart commented 4 months ago

Float16ext it will be. No, I haven't written it; I am not that fast. I just added it to my list of 50+ other ideas for new libraries. Although I can reuse a lot, I have to go through 100% of the code, documentation, examples etc. Given that this fix has taken ~6 hours so far, writing float16ext would take maybe 10-15 hours or so. First I will finalize this one.

alecelular commented 4 months ago

Thank you very much for the effort! What you have done is a great contribution.


RobTillaart commented 4 months ago

Merged the develop branch and released 0.2.0. Build is running.

RobTillaart commented 4 months ago

Had an insight about -0 and 0.

If a float is positive and too small to represent as float16, it will be 0 (sort of +0). If a float is negative and too small to represent as float16, it makes some sense to name it -0.

just a thought.
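
As a side note, an ordinary IEEE 754 32-bit float already behaves this way: a negative result that is too small to represent underflows to -0.0, which compares equal to 0.0 but keeps its sign bit. A small sketch for detecting it; `isNegativeZero` is a hypothetical helper name:

```cpp
#include <cmath>

// True only for -0.0: the value compares equal to zero, yet the sign bit is set.
bool isNegativeZero(float f)
{
  return f == 0.0f && std::signbit(f);
}
```

For example, the product of -1e-30f and 1e-30f is far below the smallest float and becomes -0.0.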

alecelular commented 4 months ago

I understand what you are saying; it is like limits in mathematics, where a value tends to zero from one side or the other. But for calculations on a processor, I imagine it makes no difference whether zero is positive or negative. On the other hand, I do consider positive zero important, because it is the only value that can be detected easily, with all its bits set to zero. I still think negative zero could be used to indicate "not a number", leaving the positive maximum as infinity, to take advantage of the combinations above 65536, and likewise the negative maximum (or rather the negative minimum) as negative infinity. In those cases all the mantissa bits would be 1 and easy to detect.

RobTillaart commented 4 months ago

Made an initial version of float16ext (stripped-down version). Seems to work pretty well. Will make a repo asap.

intern  f.toDouble()    print(f)
0   0.0000000000    0.0000000000
1   0.0000000596    0.0000000596
2   0.0000001192    0.0000001192
3   0.0000001788    0.0000001788
....
65528   -130559.9062500000  -130559.9062500000
65529   -130623.9062500000  -130623.9062500000
65530   -130687.9062500000  -130687.9062500000
65531   -130751.9062500000  -130751.9062500000
65532   -130815.9062500000  -130815.9062500000
65533   -130879.9062500000  -130879.9062500000
65534   -130943.9062500000  -130943.9062500000
65535   -131007.9062500000  -131007.9062500000

RobTillaart commented 4 months ago

@alecelular develop branch - https://github.com/RobTillaart/float16ext

RobTillaart commented 4 months ago

Discussion continues - https://github.com/RobTillaart/float16ext/issues/2

alecelular commented 4 months ago

Fantastic. When you finish it, I will use it. I think it's very good that you agreed to do that.


RobTillaart commented 4 months ago

just released 0.1.0 version

RobTillaart commented 4 months ago

very experimental 😎

alecelular commented 4 months ago

I'll try that code instead of float16 in a few hours. Then I'll tell you how it worked.
