cuda-mode / ring-attention

[info] test results for ring-flash-attention #9

Closed by Iron-Bound 6 months ago

Iron-Bound commented 7 months ago

Results from the dual RTX A5000 box for ring-flash-attention

test_qkvpacked_func

    ############################## forward: ##############################
    out: max 4.3125, mean 0.04052734375
    lse: max 8.985279083251953, mean 7.747061729431152
    out diff:
    [0] max 0.0, mean 0.0
    [1] max 0.0009765625, mean 4.863739013671875e-05
    lse diff:
    [0] max 0.0, mean 0.0
    [1] max 1.9073486328125e-06, mean 2.862522592295136e-07
    ############################## backward: ##############################
    load_dq:
    [0] max 2.34375, mean 0.0537109375
    [1] max 0.32421875, mean 0.0247802734375
    dq diff:
    [0] max 0.0009765625, mean 4.5693013817071915e-09
    [1] max 0.001953125, mean 4.363059997558594e-05
    load_dk:
    [0] max 3.328125, mean 0.050537109375
    [1] max 0.2294921875, mean 0.011962890625
    dk diff:
    [0] max 0.015625, mean 8.0108642578125e-05
    [1] max 0.00048828125, mean 5.692243576049805e-06
    load_dv:
    [0] max 3.921875, mean 0.052978515625
    [1] max 0.1904296875, mean 0.0120849609375
    dv diff:
    [0] max 0.015625, mean 8.153915405273438e-05
    [1] max 0.00048828125, mean 6.938353180885315e-08

#######################################################

test_varlen_qkvpacked_func

    ############################## forward: ##############################
    out: max 3.296875, mean 0.057373046875
    out diff:
    [0] max 0.0, mean 0.0
    [1] max 0.00390625, mean 6.961822509765625e-05
    lse: max 5.656599521636963, mean 4.309932708740234
    lse diff:
    [0] max 0.0, mean 0.0
    [1] max 4.76837158203125e-07, mean 1.7325083945252118e-07
    lse: max 7.727584362030029, mean 6.5278754234313965
    lse diff:
    [0] max 0.0, mean 0.0
    [1] max 9.5367431640625e-07, mean 1.954694113237565e-07
    lse: max 8.730499267578125, mean 7.501138687133789
    lse diff:
    [0] max 0.0, mean 0.0
    [1] max 1.9073486328125e-06, mean 2.5631595690356335e-07
    ############################## backward: ##############################
    load_dq:
    [0] max 3.0625, mean 0.07177734375
    [1] max 1.0859375, mean 0.035400390625
    dq diff:
    [0] max 0.00048828125, mean 2.051820047199726e-09
    [1] max 0.00390625, mean 6.389617919921875e-05
    load_dk:
    [0] max 3.484375, mean 0.0693359375
    [1] max 1.0390625, mean 0.0169677734375
    dk diff:
    [0] max 0.015625, mean 0.00011157989501953125
    [1] max 0.00390625, mean 1.0073184967041016e-05
    load_dv:
    [0] max 5.9375, mean 0.07373046875
    [1] max 0.94921875, mean 0.0172119140625
    dv diff:
    [0] max 0.03125, mean 0.00011348724365234375
    [1] max 0.00048828125, mean 6.379559636116028e-08
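
A rough way to read the backward diffs, in case it helps: the largest `[0]` values for dk and dv are exact powers of two, which is what a single rounding step in a low-precision format tends to look like. The sketch below assumes the tensors are bfloat16 (the log does not state the dtype) and compares each reported max diff against the bfloat16 spacing at the largest reported gradient magnitude. `bf16_ulp` is a hypothetical helper written for this note, not something from the test scripts, and the max diff need not occur at the max-magnitude element, so this is only an upper-bound sanity check.

```python
import math

def bf16_ulp(x: float) -> float:
    """Spacing between adjacent bfloat16 values near x (normal range only)."""
    # abs(x) = m * 2**e with 0.5 <= m < 1, so the binary exponent is e - 1.
    _, e = math.frexp(abs(x))
    # bfloat16 keeps 7 explicit mantissa bits (assumed dtype, see note above).
    return 2.0 ** (e - 1 - 7)

# (largest gradient magnitude, largest diff) for the [0] entries,
# copied from the backward sections of the two logs above.
reported = {
    "dk, qkvpacked": (3.328125, 0.015625),
    "dv, qkvpacked": (3.921875, 0.015625),
    "dk, varlen": (3.484375, 0.015625),
    "dv, varlen": (5.9375, 0.03125),
}

# Express each max diff in units of the bfloat16 spacing at that magnitude.
ratios = {name: diff / bf16_ulp(mag) for name, (mag, diff) in reported.items()}

for name, ratio in ratios.items():
    print(f"{name}: max diff = {ratio} ULP at the largest magnitude")
```

Under that assumption, none of these max diffs exceeds one bfloat16 step at the largest gradient value, which would be consistent with rounding noise rather than a logic error.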