RasaHQ / carbon-bot

Apache License 2.0

Fix CI bugs and update unsupported config option #45

Closed samsucik closed 2 years ago

samsucik commented 2 years ago

Previously, the CI was running the story tests on the training stories because it couldn't find the test story file. This is now fixed.
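
For reference, the story-testing step needs to point explicitly at the test stories rather than the training data. A minimal sketch of such a step (the job layout and the path `tests/test_stories.yml` are illustrative, not necessarily this repo's exact workflow):

```yaml
steps:
  # Evaluate the trained model on the test stories, not the training stories.
  # The path tests/test_stories.yml is an assumption for illustration.
  - name: Run story tests
    run: rasa test core --stories tests/test_stories.yml
```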

Additionally, the CI didn't report the NLU cross-validation results when a new PR was opened -- one had to push further commits to get the report posted as a comment on the PR. This is now fixed as well.
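
For context, a `pull_request` trigger that explicitly includes the `opened` event (alongside `synchronize`) makes the workflow -- and therefore the results comment -- run as soon as a PR is created; the actual change in our workflow may differ in detail:

```yaml
on:
  pull_request:
    # Run (and post the results comment) when a PR is opened,
    # not only when further commits are pushed.
    types: [opened, synchronize, reopened]
```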

Also, DIET was using a config option that's no longer supported (`weight_sparsity`); I've changed this to use `connection_density` instead.
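
The relevant part of the pipeline config now looks roughly like this (the epoch count and the exact density value are illustrative, not the repo's literal config); `connection_density` is roughly the complement of the old `weight_sparsity`:

```yaml
pipeline:
  - name: DIETClassifier
    epochs: 100                # illustrative value
    # previously: weight_sparsity: 0.8  (no longer supported)
    connection_density: 0.2    # roughly 1 - weight_sparsity
```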

github-actions[bot] commented 2 years ago

Intent Cross-Validation Results (5 folds)

| class | support | f1-score | confused_with |
|---|---|---|---|
| macro avg | 2517 | 0.7380734636619108 | N/A |
| weighted avg | 2517 | 0.7971686356558857 | N/A |
| faq | 643 | 0.7582501918649271 | inquire-ask_clarification-offsets(23), estimate_emissions(20) |
| inform | 616 | 0.9454841334418227 | faq(23), estimate_emissions(3) |
| affirm | 255 | 0.8808080808080808 | faq(10), inform(6) |
| inquire-ask_clarification-offsets | 124 | 0.6935483870967742 | faq(21), estimate_emissions(5) |
| estimate_emissions | 73 | 0.6134969325153374 | faq(9), inquire-ask_clarification-offsets(7) |
| deny | 69 | 0.7857142857142857 | faq(6), buy_offsets(2) |
| insult | 63 | 0.6250000000000001 | faq(11), vulgar(4) |
| greet | 63 | 0.8943089430894310 | faq(4), thank(2) |
| why | 59 | 0.7118644067796610 | faq(8), inquire-ask_clarification(5) |
| inform_notunderstanding | 58 | 0.5123966942148760 | faq(15), affirm(3) |
| farewell | 57 | 0.7999999999999999 | faq(5), insult(4) |
| thank | 54 | 0.8771929824561403 | insult(2), inform_notunderstanding(1) |
| express_positive-emo | 48 | 0.7741935483870969 | affirm(3), faq(2) |
| vulgar | 46 | 0.6265060240963856 | faq(10), insult(10) |
| express_surprise | 43 | 0.6506024096385542 | faq(7), estimate_emissions(2) |
| express_uncertainty | 43 | 0.6493506493506493 | faq(7), affirm(3) |
| inquire-ask_clarification | 38 | 0.5569620253164557 | faq(9), why(3) |
| buy_offsets | 35 | 0.7105263157894737 | faq(3), affirm(2) |
| how_calculated | 29 | 0.7450980392156864 | faq(5), estimate_emissions(3) |
| deny_flying | 28 | 0.6808510638297872 | faq(6), inform_notunderstanding(3) |
| express_negative-emo | 25 | 0.6956521739130435 | inform_notunderstanding(3), affirm(1) |
| restart | 18 | 0.9230769230769230 | N/A |
| meta_inform_problem_bad-link | 12 | 0.9600000000000001 | N/A |
| SCENARIO | 10 | 0.6666666666666665 | faq(1), why(1) |
| help | 8 | 0.7142857142857143 | faq(3) |

Entity Cross-Validation Results (5 folds)

| entity | support | f1-score | precision | recall |
|---|---|---|---|---|
| micro avg | 926 | 0.79717237629146290 | 0.8028477546549836 | 0.7915766738660908 |
| macro avg | 926 | 0.69649595855245200 | 0.7933685576580973 | 0.6566126087353127 |
| weighted avg | 926 | 0.79255564242657520 | 0.7995220363346290 | 0.7915766738660908 |
| city | 384 | 0.86043533930857890 | 0.8463476070528967 | 0.8750000000000000 |
| city.to | 182 | 0.75135135135135130 | 0.7393617021276596 | 0.7637362637362637 |
| city.from | 149 | 0.69039145907473300 | 0.7348484848484849 | 0.6510067114093959 |
| travel_flight_class | 95 | 0.93814432989690720 | 0.9191919191919192 | 0.9578947368421052 |
| iata | 76 | 0.69387755102040820 | 0.7183098591549296 | 0.6710526315789473 |
| iata.to | 19 | 0.70270270270270270 | 0.7222222222222222 | 0.6842105263157895 |
| iata.from | 16 | 0.36363636363636365 | 0.6666666666666666 | 0.2500000000000000 |
| number | 5 | 0.57142857142857150 | 1.0000000000000000 | 0.4000000000000000 |
samsucik commented 2 years ago

@kedz once the CI has finished and you're reviewing this PR, you can compare the fresh NLU cross-validation numbers to:

When I compared the previous run on this PR to the one from June 3 (linked above), I didn't observe concerning regressions -- most F1s went up, a few dropped by ~0.03, and only two went down by much more (0.07, 0.09): the insult and vulgar intents. Overall, the aggregated F1s went up. If the new CI run produces similar changes in F1s, I think we can merge this into the main branch without big concerns.

github-actions[bot] commented 2 years ago

Intent Cross-Validation Results (5 folds)

| class | support | f1-score | confused_with |
|---|---|---|---|
| macro avg | 2517 | 0.7260712310393109 | N/A |
| weighted avg | 2517 | 0.8010170814324005 | N/A |
| faq | 643 | 0.7779456193353473 | inform(16), estimate_emissions(16) |
| inform | 616 | 0.9392446633825944 | faq(20), estimate_emissions(7) |
| affirm | 255 | 0.8599605522682445 | inform(7), faq(7) |
| inquire-ask_clarification-offsets | 124 | 0.7394957983193277 | faq(28), estimate_emissions(3) |
| estimate_emissions | 73 | 0.5844155844155845 | faq(12), inquire-ask_clarification-offsets(5) |
| deny | 69 | 0.7482014388489210 | buy_offsets(5), faq(5) |
| greet | 63 | 0.8943089430894310 | faq(4), insult(2) |
| insult | 63 | 0.6764705882352942 | faq(8), vulgar(3) |
| why | 59 | 0.7058823529411764 | faq(5), inquire-ask_clarification(2) |
| inform_notunderstanding | 58 | 0.5585585585585585 | faq(11), why(4) |
| farewell | 57 | 0.8301886792452831 | faq(4), deny(3) |
| thank | 54 | 0.8928571428571429 | faq(1), affirm(1) |
| express_positive-emo | 48 | 0.7058823529411765 | affirm(4), faq(3) |
| vulgar | 46 | 0.7407407407407407 | faq(9), insult(5) |
| express_surprise | 43 | 0.6511627906976745 | faq(10), affirm(1) |
| express_uncertainty | 43 | 0.7246376811594203 | deny(4), faq(4) |
| inquire-ask_clarification | 38 | 0.4687500000000000 | faq(14), why(2) |
| buy_offsets | 35 | 0.6666666666666666 | faq(8), inquire-ask_clarification-offsets(2) |
| how_calculated | 29 | 0.7619047619047619 | faq(2), inquire-ask_clarification-offsets(1) |
| deny_flying | 28 | 0.6666666666666666 | faq(5), deny(2) |
| express_negative-emo | 25 | 0.6521739130434783 | inform_notunderstanding(2), why(1) |
| restart | 18 | 0.9729729729729730 | N/A |
| meta_inform_problem_bad-link | 12 | 1.0000000000000000 | N/A |
| SCENARIO | 10 | 0.6250000000000000 | thank(2), inquire-ask_clarification-offsets(1) |
| help | 8 | 0.3076923076923077 | faq(4), why(1) |

Entity Cross-Validation Results (5 folds)

| entity | support | f1-score | precision | recall |
|---|---|---|---|---|
| micro avg | 926 | 0.79360000000000000 | 0.7839831401475237 | 0.8034557235421166 |
| macro avg | 926 | 0.65065308450789790 | 0.6440655595213791 | 0.6666966700133950 |
| weighted avg | 926 | 0.79331163985922330 | 0.7862977658768394 | 0.8034557235421166 |
| city | 384 | 0.85570890840652450 | 0.8256658595641646 | 0.8880208333333334 |
| city.to | 182 | 0.79683377308707120 | 0.7664974619289340 | 0.8296703296703297 |
| city.from | 149 | 0.73170731707317070 | 0.7608695652173914 | 0.7046979865771812 |
| travel_flight_class | 95 | 0.89130434782608700 | 0.9213483146067416 | 0.8631578947368421 |
| iata | 76 | 0.62857142857142860 | 0.6875000000000000 | 0.5789473684210527 |
| iata.to | 19 | 0.61538461538461540 | 0.6000000000000000 | 0.6315789473684210 |
| iata.from | 16 | 0.39999999999999997 | 0.3684210526315789 | 0.4375000000000000 |
| number | 5 | 0.28571428571428570 | 0.2222222222222222 | 0.4000000000000000 |