EightAndAHalfTails / eee3-imgtec-fpu


No worse than chaining definition? #150

Open reginalio opened 10 years ago

reginalio commented 10 years ago

In some cases of the 2D and 3D dot product, the definition is very vague: +inf, -inf and NaN can all be argued to be acceptable answers for the same inputs.

One 2D dot product case is illustrated below. We have (+inf)(+inf) + (+large norm)(-norm). If we compute the two products separately, the second product overflows to -inf, so we get (+inf) + (-inf), which results in NaN. However, since the second product is finite at infinite precision, our test entity gives +inf. In my opinion both results should pass in this case, but there are other cases, especially in the 3D dot product, where the situation is a lot more awkward, as described further down.

Warning: 2D dot product of 0:11111111:00000000000000000000000, 0:11111111:00000000000000000000000, 0:11100001:10010010101111000100001 and 1:10011110:00110010000011101111001 gives 0:11111111:00000000000000000000000 which is incorrect. Correct answer is NAN
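To make the warning easier to check by hand, here is a minimal Python sketch that decodes the four bit patterns and evaluates the dot product both ways. The helper names `f32` and `decode` are made up for illustration and are not part of the testbench. Double precision stands in for "infinite precision" here, which is enough to classify inf/NaN outcomes but is not a last-bit-accurate reference model.

```python
import math
import struct

def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # Finite but too large for single precision: overflow to +/-inf.
        return math.copysign(math.inf, x)

def decode(bits):
    """Turn a 'sign:exponent:mantissa' string into the float it encodes."""
    word = int(bits.replace(":", ""), 2)
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]

a = decode("0:11111111:00000000000000000000000")  # +inf
b = decode("0:11111111:00000000000000000000000")  # +inf
c = decode("0:11100001:10010010101111000100001")  # large positive normal
d = decode("1:10011110:00110010000011101111001")  # negative normal

# c*d is exact in double: two 24-bit significands fit in 53 bits.
second = c * d

# Chained: round each product to single precision, then add.
chained = f32(f32(a * b) + f32(second))

# Single rounding: keep the second product unrounded until the end.
single_rounding = f32(a * b + second)

print("c*d unrounded:  ", second)
print("c*d rounded:    ", f32(second))
print("chained:        ", chained)
print("single rounding:", single_rounding)
```

The second product is finite before rounding but overflows to -inf once rounded, which is exactly where the two answers diverge.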

Our 3D dot product is currently built from a 2D dot product sub-entity and a multacc sub-entity, i.e. with inputs a, b, c, d, e, f, we compute ef + (ab + cd).

Take a case where (ab, cd, ef) = (-1e50, 1, 1e100), i.e. (-inf from overflow, norm, +inf from overflow):

At infinite precision, ab + cd + ef gives +inf

With our chained operation using dot2 and multacc, ef + (ab + cd) gives -inf, since (ab + cd) is rounded to -inf while ef is large but still finite inside the multacc.

If we perform the operation as three multiplications and two additions, (ab) + (cd) + (ef) = NaN, since we end up adding (-inf) to (+inf).

So which results should the testbench pass? All three, or just the first two?
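The three outcomes above can be reproduced with a rough Python sketch. Everything here is an assumption for illustration: the helper `f32`, the operand values, and the use of double precision as a stand-in for infinite precision. Note that 1e100 cannot actually be the product of two single-precision operands (the largest possible product is about 1.2e77), so ef is about 1e70 in this sketch; it still overflows single precision, which is all the argument needs.

```python
import math
import struct

def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # Finite but too large for single precision: overflow to +/-inf.
        return math.copysign(math.inf, x)

# Hypothetical single-precision operands chosen to mimic the case above.
a, b = f32(-1e25), f32(1e25)   # ab is about -1e50, overflows to -inf
c, d = f32(1.0), f32(1.0)      # cd = 1
e, f = f32(1e35), f32(1e35)    # ef is about 1e70, overflows to +inf

# Unrounded products (exact in double for single-precision inputs).
ab, cd, ef = a * b, c * d, e * f

# 1) Round once at the end.
single_rounding = f32(ab + cd + ef)

# 2) dot2 then multacc: (ab + cd) is rounded, ef is added unrounded.
dot2 = f32(ab + cd)
dot2_multacc = f32(ef + dot2)

# 3) Three multiplies and two adds, rounding after every step.
fully_chained = f32(f32(f32(ab) + f32(cd)) + f32(ef))

print("single rounding:", single_rounding)
print("dot2 + multacc: ", dot2_multacc)
print("three mul, two add:", fully_chained)
```

The same inputs give +inf, -inf and NaN depending only on where the rounding happens.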

EightAndAHalfTails commented 10 years ago

I think the "correct answer" should be the one obtained by doing the computations at infinite precision, only rounding at the end of the operation (that is, +inf for the first and +inf for the second). Whether it's feasible to have our entities return these values every time is another question, however. :P

reginalio commented 10 years ago

But the specification states "no worse than chaining", which means the testbench should pass the less "correct" results as well...? And if reducing precision gives lower latency, I think we should go for the reduced-precision method.
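One possible reading of "no worse than chaining" for the 2D case, sketched in Python: the check passes a result if it matches either the single-rounding answer or the fully chained answer. This is only an illustration of the lenient interpretation, not what the specification or our testbench actually does; `f32`, `same` and `dot2_passes` are hypothetical names, and double precision again stands in for infinite precision.

```python
import math
import struct

def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # Finite but too large for single precision: overflow to +/-inf.
        return math.copysign(math.inf, x)

def same(x, y):
    """Equality that also treats NaN as equal to NaN."""
    return (math.isnan(x) and math.isnan(y)) or x == y

def dot2_passes(result, a, b, c, d):
    """Accept the single-rounding answer or the chained answer."""
    single_rounding = f32(a * b + c * d)
    chained = f32(f32(a * b) + f32(c * d))
    return same(result, single_rounding) or same(result, chained)

# Inputs shaped like the warning: (+inf)(+inf) + (large)(-norm),
# where the second product overflows only after rounding.
inputs = (math.inf, math.inf, 1.5 * 2.0**98, -1.5 * 2.0**31)

print(dot2_passes(math.inf, *inputs))
print(dot2_passes(math.nan, *inputs))
print(dot2_passes(-math.inf, *inputs))
```

Under this reading both +inf and NaN pass for the warning-style inputs, while -inf is still rejected.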