Closed samsucik closed 2 years ago
class | support | f1-score | confused_with |
---|---|---|---|
macro avg | 2517 | 0.7380734636619108 | N/A |
weighted avg | 2517 | 0.7971686356558857 | N/A |
faq | 643 | 0.7582501918649271 | inquire-ask_clarification-offsets(23), estimate_emissions(20) |
inform | 616 | 0.9454841334418227 | faq(23), estimate_emissions(3) |
affirm | 255 | 0.8808080808080808 | faq(10), inform(6) |
inquire-ask_clarification-offsets | 124 | 0.6935483870967742 | faq(21), estimate_emissions(5) |
estimate_emissions | 73 | 0.6134969325153374 | faq(9), inquire-ask_clarification-offsets(7) |
deny | 69 | 0.7857142857142857 | faq(6), buy_offsets(2) |
insult | 63 | 0.6250000000000001 | faq(11), vulgar(4) |
greet | 63 | 0.8943089430894310 | faq(4), thank(2) |
why | 59 | 0.7118644067796610 | faq(8), inquire-ask_clarification(5) |
inform_notunderstanding | 58 | 0.5123966942148760 | faq(15), affirm(3) |
farewell | 57 | 0.7999999999999999 | faq(5), insult(4) |
thank | 54 | 0.8771929824561403 | insult(2), inform_notunderstanding(1) |
express_positive-emo | 48 | 0.7741935483870969 | affirm(3), faq(2) |
vulgar | 46 | 0.6265060240963856 | faq(10), insult(10) |
express_surprise | 43 | 0.6506024096385542 | faq(7), estimate_emissions(2) |
express_uncertainty | 43 | 0.6493506493506493 | faq(7), affirm(3) |
inquire-ask_clarification | 38 | 0.5569620253164557 | faq(9), why(3) |
buy_offsets | 35 | 0.7105263157894737 | faq(3), affirm(2) |
how_calculated | 29 | 0.7450980392156864 | faq(5), estimate_emissions(3) |
deny_flying | 28 | 0.6808510638297872 | faq(6), inform_notunderstanding(3) |
express_negative-emo | 25 | 0.6956521739130435 | inform_notunderstanding(3), affirm(1) |
restart | 18 | 0.9230769230769230 | N/A |
meta_inform_problem_bad-link | 12 | 0.9600000000000001 | N/A |
SCENARIO | 10 | 0.6666666666666665 | faq(1), why(1) |
help | 8 | 0.7142857142857143 | faq(3) |

entity | support | f1-score | precision | recall |
---|---|---|---|---|
micro avg | 926 | 0.79717237629146290 | 0.8028477546549836 | 0.7915766738660908 |
macro avg | 926 | 0.69649595855245200 | 0.7933685576580973 | 0.6566126087353127 |
weighted avg | 926 | 0.79255564242657520 | 0.7995220363346290 | 0.7915766738660908 |
city | 384 | 0.86043533930857890 | 0.8463476070528967 | 0.8750000000000000 |
city.to | 182 | 0.75135135135135130 | 0.7393617021276596 | 0.7637362637362637 |
city.from | 149 | 0.69039145907473300 | 0.7348484848484849 | 0.6510067114093959 |
travel_flight_class | 95 | 0.93814432989690720 | 0.9191919191919192 | 0.9578947368421052 |
iata | 76 | 0.69387755102040820 | 0.7183098591549296 | 0.6710526315789473 |
iata.to | 19 | 0.70270270270270270 | 0.7222222222222222 | 0.6842105263157895 |
iata.from | 16 | 0.36363636363636365 | 0.6666666666666666 | 0.2500000000000000 |
number | 5 | 0.57142857142857150 | 1.0000000000000000 | 0.4000000000000000 |
@kedz once the CI has finished and you're reviewing this PR: You can compare the fresh NLU cross-validation numbers to:
When I compared the previous run on this PR to the one from June 3 (linked above), I didn't observe concerning regressions -- most F1s went up, a few dropped by ~0.03, and only two went down by much more (0.07, 0.09): the `insult` and `vulgar` intents. Overall, the aggregated F1s went up. If the new CI run produces similar changes in F1s, I think we can merge this into the main branch without big concerns.
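For anyone who wants to repeat this kind of comparison without eyeballing two tables, here is a rough sketch of how the per-intent F1 changes could be computed from two of the posted reports. It is not part of the CI; the function names and the 0.05 threshold are made up for illustration. The sample rows are copied from the two intent tables in this thread, and treating the second table as the baseline is only an assumption for the example, so the printed deltas are not the figures quoted above.

```python
from typing import Dict, List

def parse_f1_table(markdown: str) -> Dict[str, float]:
    """Map each row label (first column) to its f1-score (third column)."""
    scores: Dict[str, float] = {}
    for line in markdown.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) < 3:
            continue
        try:
            scores[cells[0]] = float(cells[2])
        except ValueError:
            continue  # header row or '---' separator row
    return scores

def f1_deltas(old: Dict[str, float], new: Dict[str, float]) -> Dict[str, float]:
    """F1 change (new minus old) for every label present in both reports."""
    return {name: new[name] - old[name] for name in old if name in new}

def regressions(old: Dict[str, float], new: Dict[str, float],
                threshold: float = 0.05) -> List[str]:
    """Labels whose F1 dropped by more than `threshold`, worst drop first."""
    deltas = f1_deltas(old, new)
    dropped = [name for name, delta in deltas.items() if delta < -threshold]
    return sorted(dropped, key=lambda name: deltas[name])

# Assumption: the second table in the thread plays the role of the baseline.
BASELINE_REPORT = """
class | support | f1-score | confused_with |
---|---|---|---|
greet | 63 | 0.8943089430894310 | faq(4), insult(2) |
insult | 63 | 0.6764705882352942 | faq(8), vulgar(3) |
vulgar | 46 | 0.7407407407407407 | faq(9), insult(5) |
"""

CANDIDATE_REPORT = """
class | support | f1-score | confused_with |
---|---|---|---|
insult | 63 | 0.6250000000000001 | faq(11), vulgar(4) |
greet | 63 | 0.8943089430894310 | faq(4), thank(2) |
vulgar | 46 | 0.6265060240963856 | faq(10), insult(10) |
"""

baseline = parse_f1_table(BASELINE_REPORT)
candidate = parse_f1_table(CANDIDATE_REPORT)

for name, delta in sorted(f1_deltas(baseline, candidate).items()):
    print(f"{name}: {delta:+.4f}")
print("dropped by more than 0.05:", regressions(baseline, candidate))
```

Matching rows by label rather than by position matters here because the reports sort intents by support, and intents with equal support (for example `greet` and `insult`) can swap places between runs.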
class | support | f1-score | confused_with |
---|---|---|---|
macro avg | 2517 | 0.7260712310393109 | N/A |
weighted avg | 2517 | 0.8010170814324005 | N/A |
faq | 643 | 0.7779456193353473 | inform(16), estimate_emissions(16) |
inform | 616 | 0.9392446633825944 | faq(20), estimate_emissions(7) |
affirm | 255 | 0.8599605522682445 | inform(7), faq(7) |
inquire-ask_clarification-offsets | 124 | 0.7394957983193277 | faq(28), estimate_emissions(3) |
estimate_emissions | 73 | 0.5844155844155845 | faq(12), inquire-ask_clarification-offsets(5) |
deny | 69 | 0.7482014388489210 | buy_offsets(5), faq(5) |
greet | 63 | 0.8943089430894310 | faq(4), insult(2) |
insult | 63 | 0.6764705882352942 | faq(8), vulgar(3) |
why | 59 | 0.7058823529411764 | faq(5), inquire-ask_clarification(2) |
inform_notunderstanding | 58 | 0.5585585585585585 | faq(11), why(4) |
farewell | 57 | 0.8301886792452831 | faq(4), deny(3) |
thank | 54 | 0.8928571428571429 | faq(1), affirm(1) |
express_positive-emo | 48 | 0.7058823529411765 | affirm(4), faq(3) |
vulgar | 46 | 0.7407407407407407 | faq(9), insult(5) |
express_surprise | 43 | 0.6511627906976745 | faq(10), affirm(1) |
express_uncertainty | 43 | 0.7246376811594203 | deny(4), faq(4) |
inquire-ask_clarification | 38 | 0.4687500000000000 | faq(14), why(2) |
buy_offsets | 35 | 0.6666666666666666 | faq(8), inquire-ask_clarification-offsets(2) |
how_calculated | 29 | 0.7619047619047619 | faq(2), inquire-ask_clarification-offsets(1) |
deny_flying | 28 | 0.6666666666666666 | faq(5), deny(2) |
express_negative-emo | 25 | 0.6521739130434783 | inform_notunderstanding(2), why(1) |
restart | 18 | 0.9729729729729730 | N/A |
meta_inform_problem_bad-link | 12 | 1.0000000000000000 | N/A |
SCENARIO | 10 | 0.6250000000000000 | thank(2), inquire-ask_clarification-offsets(1) |
help | 8 | 0.3076923076923077 | faq(4), why(1) |

entity | support | f1-score | precision | recall |
---|---|---|---|---|
micro avg | 926 | 0.79360000000000000 | 0.7839831401475237 | 0.8034557235421166 |
macro avg | 926 | 0.65065308450789790 | 0.6440655595213791 | 0.6666966700133950 |
weighted avg | 926 | 0.79331163985922330 | 0.7862977658768394 | 0.8034557235421166 |
city | 384 | 0.85570890840652450 | 0.8256658595641646 | 0.8880208333333334 |
city.to | 182 | 0.79683377308707120 | 0.7664974619289340 | 0.8296703296703297 |
city.from | 149 | 0.73170731707317070 | 0.7608695652173914 | 0.7046979865771812 |
travel_flight_class | 95 | 0.89130434782608700 | 0.9213483146067416 | 0.8631578947368421 |
iata | 76 | 0.62857142857142860 | 0.6875000000000000 | 0.5789473684210527 |
iata.to | 19 | 0.61538461538461540 | 0.6000000000000000 | 0.6315789473684210 |
iata.from | 16 | 0.39999999999999997 | 0.3684210526315789 | 0.4375000000000000 |
number | 5 | 0.28571428571428570 | 0.2222222222222222 | 0.4000000000000000 |
Previously, the CI was running on training stories because it couldn't find the test story file. This is now fixed.
Additionally, CI didn't report the NLU cross-validation results when a new PR was opened -- one had to add further commits in order to get the report posted as a comment on the PR. This is now fixed as well.
Also, DIET was using a config option that's no longer supported (`weight_sparsity`); I've changed this to use `connection_density`.
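To illustrate the kind of change involved, below is a minimal sketch that rewrites a pipeline entry from `weight_sparsity` to `connection_density`. It rests on an assumption that is not stated in this PR: that the two options are related by `connection_density = 1 - weight_sparsity`, as I recall from the Rasa migration notes, so please check the docs for the Rasa version in use. The helper name, the pipeline contents, and the values `0.8` and `100` are all hypothetical, not taken from this repository's config.

```python
from copy import deepcopy
from typing import Any, Dict, List

Pipeline = List[Dict[str, Any]]

def migrate_weight_sparsity(pipeline: Pipeline) -> Pipeline:
    """Return a copy of the pipeline with weight_sparsity replaced.

    Assumption: connection_density = 1 - weight_sparsity.
    An explicitly set connection_density is left untouched.
    """
    migrated = deepcopy(pipeline)
    for component in migrated:
        if "weight_sparsity" in component:
            sparsity = component.pop("weight_sparsity")
            # Round to hide floating-point noise such as 0.19999999999999996.
            component.setdefault("connection_density", round(1.0 - sparsity, 10))
    return migrated

# Hypothetical pipeline, written as the Python equivalent of a config.yml section.
old_pipeline: Pipeline = [
    {"name": "WhitespaceTokenizer"},
    {"name": "DIETClassifier", "epochs": 100, "weight_sparsity": 0.8},
]

new_pipeline = migrate_weight_sparsity(old_pipeline)
print(new_pipeline[1])
```

The function works on a copy so that the original config can still be diffed against the migrated one before anything is written back to disk.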