Float16 Lib - Githubissues

johannes-777 commented 2 years ago

Hi Rob, I have your Float16 lib but couldn't find it in your repos. I found two issues:

f32tof16 should use

f16_bin = (sgn ? 0x8000 : 0x0000) | exp | man;

instead of

f16_bin = sgn ? 0x8000 : 0x0000 | exp | man;
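
The difference between the two lines is operator precedence: in C and C++ the conditional operator binds more loosely than `|`, so the unparenthesized form parses as `sgn ? 0x8000 : (0x0000 | exp | man)` and drops exponent and mantissa whenever the sign bit is set. A minimal sketch of the effect follows; the names `pack_buggy` and `pack_fixed` are made up for illustration and are not part of the library.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not the library's API.
// exp is assumed to be already shifted into bits 10..14, man sits in bits 0..9.

// Parses as: sgn ? 0x8000 : (0x0000 | exp | man)
constexpr uint16_t pack_buggy(bool sgn, uint16_t exp, uint16_t man)
{
  return static_cast<uint16_t>(sgn ? 0x8000 : 0x0000 | exp | man);
}

// Sign, exponent and mantissa are always combined.
constexpr uint16_t pack_fixed(bool sgn, uint16_t exp, uint16_t man)
{
  return static_cast<uint16_t>((sgn ? 0x8000 : 0x0000) | exp | man);
}
```

For -4.0 (sign set, exponent field 0x4400, mantissa 0) the first form returns 0x8000, which is negative zero, while the second returns 0xC400. For positive inputs both forms give the same result.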

A test over the whole f16 number space revealed issues with certain numbers, where the error relative to f32 is 50%.
Examples:
- F32 is -32764.726562, F16 is -16384.000000 -> error is -50.00%! F16 encoding: 0xF400
- F32 is -31.996500, F16 is -16.000000 -> error is -49.99%! F16 encoding: 0xCC00
- F32 is -7.999200, F16 is -4.000000 -> error is -49.99%! F16 encoding: 0xC400
- F32 is -1.999800, F16 is -1.000000 -> error is -49.99%! F16 encoding: 0xBC00
- F32 is -0.499900, F16 is -0.250000 -> error is -49.99%! F16 encoding: 0xB400
- F32 is -0.499900, F16 is -0.250000 -> error is -49.99%! F16 encoding: 0xB400
- F32 is -0.499900, F16 is -0.250000 -> error is -49.99%! F16 encoding: 0xB400
- F32 is 7.999300, F16 is 4.000000 -> error is 50.00%! F16 encoding: 0x4400
- F32 is 31.992901, F16 is 16.000000 -> error is 49.99%! F16 encoding: 0x4C00
- F32 is 511.911591, F16 is 256.000000 -> error is 49.99%! F16 encoding: 0x5C00
- F32 is 8190.818359, F16 is 4096.000000 -> error is 49.99%! F16 encoding: 0x6C00
- F32 is 32763.277344, F16 is 16384.000000 -> error is 49.99%! F16 encoding: 0x7400
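
The encodings in this list can be checked by decoding them by hand. The sketch below is a stand-alone decoder for normal binary16 values only; `f16_to_float` is a hypothetical name and is not taken from the library.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper, not the library's function: decodes a *normal*
// IEEE 754 binary16 bit pattern (exponent field 1..30) to float.
// Subnormals, infinities and NaN are deliberately not handled.
float f16_to_float(uint16_t bits)
{
  int sign     = (bits >> 15) & 0x1;
  int exponent = (bits >> 10) & 0x1F;   // 5 bits, biased by 15
  int mantissa = bits & 0x3FF;          // 10 fraction bits

  // value = (1 + mantissa / 2^10) * 2^(exponent - 15)
  float value = std::ldexp(1.0f + mantissa / 1024.0f, exponent - 15);
  return sign ? -value : value;
}
```

Decoding the reported values this way shows that every failing encoding has a zero mantissa (an exact power of two) and an odd biased exponent: 0xB400 has 13, 0x3C00 and 0xBC00 have 15, 0x4400 and 0xC400 have 17, and so on up to 29 for 0x7400 and 0xF400.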

Do you have any ideas? I am using an ESP32.

johannes-777 commented 2 years ago

Just ported to GCC and found one more:

F32 is 1.999700, F16 is 1.000000 -> error is 49.99%! F16 encoding: 0x3C00

The code is the same as on the ESP32, so it seems to be a rounding issue with the float implementation on both ESP32 and AMD64.

RobTillaart commented 2 years ago

Hi @johannes-777

That float16 library was an experiment and was never finished, properly tested or published on GitHub. I think you got it from the Arduino forum.

A quick look in my library folder shows that the date of the .h file is March 2015 (6+ years ago), so it is too long ago for me to have an educated guess.

That said, I notice that all the failing F16 values are powers of 2, so I expect there is a single underlying bug. My gut feeling says the exponent is set 1 too high or too low, which might point at a < that should be a <= or similar.

I can set up a repo for this library, labeled experimental. As I already have quite some work, it would be low in priority. I vaguely recall it was not the easiest code I have worked with, so it needs focus (== time). Having a repo would at least bring development into a structured process that allows others to help get it working.

Would that be an idea?

johannes-777 commented 2 years ago

Hi Rob, I absolutely agree, this lib requires quite some attention. I found the problem here:

exp <<= 10;
//man++; -> Leads to rounding issues (e.g. 7.999300, 31.992901, 511.911591 ...) : https://github.com/RobTillaart/Arduino/issues/176
man >>= 1;
f16_bin = (sgn ? 0x8000 : 0x0000) | exp | man;

With man++ commented out, the maximum error relative to f32 is 0.096%, which is acceptable for a 10-bit mantissa (fraction), though it could be improved by smarter rounding.
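
One possible reading of why the `man++` line misbehaved, inferred from the snippet and the failing values rather than confirmed anywhere in this thread: when the 11-bit mantissa is all ones, `man++; man >>= 1;` yields 0x400, which no longer fits in the 10-bit fraction field. OR-ing the fields cannot carry that bit into the exponent, so whenever the lowest exponent bit is already set the result collapses to the power of two below the input. A hedged sketch of rounding that adds the fields instead, so the carry propagates, could look like this. All function names and the argument layout are assumptions for illustration, not the library's code.

```cpp
#include <cstdint>

// Hypothetical sketch, not the library's code.
//   sgn:   sign bit
//   exp:   biased 5-bit exponent of a normal number (1..30), NOT yet shifted
//   man11: top 11 bits of the float32 fraction (one extra bit for rounding)

// Rounding combined with OR: the overflow bit 0x400 cannot carry.
constexpr uint16_t pack_or_rounded(bool sgn, uint16_t exp, uint16_t man11)
{
  return static_cast<uint16_t>((sgn ? 0x8000 : 0x0000) | (exp << 10) | ((man11 + 1) >> 1));
}

// Truncation, as in the workaround above: drop the extra bit.
constexpr uint16_t pack_truncated(bool sgn, uint16_t exp, uint16_t man11)
{
  return static_cast<uint16_t>((sgn ? 0x8000 : 0x0000) | (exp << 10) | (man11 >> 1));
}

// Round half up, combined with +: a mantissa overflow bumps the exponent.
constexpr uint16_t pack_rounded(bool sgn, uint16_t exp, uint16_t man11)
{
  uint16_t magnitude = static_cast<uint16_t>((exp << 10) + ((man11 + 1) >> 1));
  return static_cast<uint16_t>((sgn ? 0x8000 : 0x0000) | magnitude);
}
```

For 7.9993 (biased f16 exponent 17, 11-bit mantissa 0x7FF) the OR variant gives 0x4400 (4.0, matching the report), truncation gives 0x47FF (about 7.996), and the additive variant gives 0x4800 (8.0). With an even exponent such as 16 the OR variant happens to produce the right answer, which would explain why only some powers of two fail. Truncating to a 10-bit fraction bounds the relative error by 2^-10, roughly 0.098%, which is consistent with the 0.096% measured. With exponent 30 a carry in `pack_rounded` produces the infinity encoding 0x7C00.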

RobTillaart commented 2 years ago

I will create a repo. Do you have the version number of the version you used as a starting point?

RobTillaart commented 2 years ago

https://github.com/RobTillaart/float16

will fill it with build-CI files so we can have some automated tests and examples

RobTillaart commented 2 years ago

update

created first pull request
based upon a quickly reworked last version I had.
two examples
added build-CI to compile those examples

There is still a lot to do, but we have a starting point.

I will transfer this issue to the float16 repo.

RobTillaart commented 2 years ago

@johannes-777 Merged the first pull request to have a new starting point. (did not make a release 0.1.4 yet)

Can you check if this version still has the bug?
Can you write a minimal sketch that shows the problem? (can be used in the unit test)

RobTillaart commented 2 years ago

started with some unit tests ...
and a lot of other testing ...
fixed negative numbers

merged into master

RobTillaart commented 2 years ago

@johannes-777

A test over the whole f16 numberspace revealed issues with certain numbers, where the error to f32 is 50%. Examples:
F32 is -32764.726562, F16 is -16384.000000 -> error is -50.00%! F16 encoding: 0xF400
F32 is -31.996500, F16 is -16.000000 -> error is -49.99%! F16 encoding: 0xCC00
...

Can you please check if the last master branch still shows this issue? (or share the sketch so I can verify)

RobTillaart commented 2 years ago

0.1.4 will be released asap, so it can be published and tested by a broader audience.

Done: added to the Arduino Library Manager and published on https://platformio.org/lib/show/13206/float16

RobTillaart commented 2 years ago

@johannes-777 0.1.5 in PR

added basic math + some helper functions.
optimized the compare operators
more examples

RobTillaart commented 4 months ago

@johannes-777

FYI, today I reworked the float16 library to fix a bug (issue #10) and to correct the implementation of subnormal numbers (< 0.00006...). Just to let you know: the new version 0.2.0 will be released later this week.

RobTillaart / float16

Float16 Lib #2