samsucik commented 2 years ago

Previously, the CI was running on training stories because it couldn't find the test story file. This is now fixed.

Additionally, CI didn't report the NLU cross-validation results when a new PR was opened -- one had to add further commits in order to get the report posted as a comment on the PR. This is now fixed as well.

Also, DIET was using a config option that's no longer supported (weight_sparsity). I've changed this to use connection_density instead.
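For readers who need to make the same change elsewhere, here is a minimal sketch of the option swap. It is not the diff from this PR: the helper name `migrate_diet_config` and the example values (`epochs: 100`, `weight_sparsity: 0.8`) are hypothetical. It also assumes that the new option is the complement of the old one (connection_density = 1 - weight_sparsity), which should be checked against the Rasa migration notes for the version in use.

```python
def migrate_diet_config(component: dict) -> dict:
    """Return a copy of a pipeline component config that uses
    connection_density instead of the unsupported weight_sparsity."""
    migrated = dict(component)  # shallow copy; the input is left untouched
    if "weight_sparsity" in migrated:
        sparsity = migrated.pop("weight_sparsity")
        # Assumption: density is the complement of sparsity.
        # An explicitly set connection_density is never overwritten.
        migrated.setdefault("connection_density", round(1.0 - sparsity, 10))
    return migrated


# Hypothetical DIET entry, as it might appear after loading config.yml.
old = {"name": "DIETClassifier", "epochs": 100, "weight_sparsity": 0.8}
new = migrate_diet_config(old)
print(new)
```

The rounding only hides floating-point noise such as 0.19999999999999996; drop it if exact arithmetic on the configured value is preferred.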

github-actions[bot] commented 2 years ago

Intent Cross-Validation Results (5 folds)

class	support	f1-score	confused_with
macro avg	2517	0.7380734636619108	N/A
weighted avg	2517	0.7971686356558857	N/A
faq	643	0.7582501918649271	inquire-ask_clarification-offsets(23), estimate_emissions(20)
inform	616	0.9454841334418227	faq(23), estimate_emissions(3)
affirm	255	0.8808080808080808	faq(10), inform(6)
inquire-ask_clarification-offsets	124	0.6935483870967742	faq(21), estimate_emissions(5)
estimate_emissions	73	0.6134969325153374	faq(9), inquire-ask_clarification-offsets(7)
deny	69	0.7857142857142857	faq(6), buy_offsets(2)
insult	63	0.6250000000000001	faq(11), vulgar(4)
greet	63	0.8943089430894310	faq(4), thank(2)
why	59	0.7118644067796610	faq(8), inquire-ask_clarification(5)
inform_notunderstanding	58	0.5123966942148760	faq(15), affirm(3)
farewell	57	0.7999999999999999	faq(5), insult(4)
thank	54	0.8771929824561403	insult(2), inform_notunderstanding(1)
express_positive-emo	48	0.7741935483870969	affirm(3), faq(2)
vulgar	46	0.6265060240963856	faq(10), insult(10)
express_surprise	43	0.6506024096385542	faq(7), estimate_emissions(2)
express_uncertainty	43	0.6493506493506493	faq(7), affirm(3)
inquire-ask_clarification	38	0.5569620253164557	faq(9), why(3)
buy_offsets	35	0.7105263157894737	faq(3), affirm(2)
how_calculated	29	0.7450980392156864	faq(5), estimate_emissions(3)
deny_flying	28	0.6808510638297872	faq(6), inform_notunderstanding(3)
express_negative-emo	25	0.6956521739130435	inform_notunderstanding(3), affirm(1)
restart	18	0.9230769230769230	N/A
meta_inform_problem_bad-link	12	0.9600000000000001	N/A
SCENARIO	10	0.6666666666666665	faq(1), why(1)
help	8	0.7142857142857143	faq(3)

Entity Cross-Validation Results (5 folds)

entity	support	f1-score	precision	recall
micro avg	926	0.79717237629146290	0.8028477546549836	0.7915766738660908
macro avg	926	0.69649595855245200	0.7933685576580973	0.6566126087353127
weighted avg	926	0.79255564242657520	0.7995220363346290	0.7915766738660908
city	384	0.86043533930857890	0.8463476070528967	0.8750000000000000
city.to	182	0.75135135135135130	0.7393617021276596	0.7637362637362637
city.from	149	0.69039145907473300	0.7348484848484849	0.6510067114093959
travel_flight_class	95	0.93814432989690720	0.9191919191919192	0.9578947368421052
iata	76	0.69387755102040820	0.7183098591549296	0.6710526315789473
iata.to	19	0.70270270270270270	0.7222222222222222	0.6842105263157895
iata.from	16	0.36363636363636365	0.6666666666666666	0.2500000000000000
number	5	0.57142857142857150	1.0000000000000000	0.4000000000000000

samsucik commented 2 years ago

@kedz, once the CI has finished and you're reviewing this PR, you can compare the fresh NLU cross-validation numbers to:

- the ones from the previous run on this PR (nothing has changed since then that could affect the F1 numbers, so any differences will be due to random variation)
- the latest available report from the main branch (well, rather, from a PR that got merged into the main branch) which is here

When I compared the previous run on this PR to the one from June 3 (linked above), I didn't observe concerning regressions -- most F1s went up, a few dropped by ~0.03, and only two went down by much more (0.07, 0.09): the insult and vulgar intents. Overall, the aggregated F1s went up. If the new CI run produces similar changes in F1s, I think we can merge this into the main branch without big concerns.

github-actions[bot] commented 2 years ago

Intent Cross-Validation Results (5 folds)

class	support	f1-score	confused_with
macro avg	2517	0.7260712310393109	N/A
weighted avg	2517	0.8010170814324005	N/A
faq	643	0.7779456193353473	inform(16), estimate_emissions(16)
inform	616	0.9392446633825944	faq(20), estimate_emissions(7)
affirm	255	0.8599605522682445	inform(7), faq(7)
inquire-ask_clarification-offsets	124	0.7394957983193277	faq(28), estimate_emissions(3)
estimate_emissions	73	0.5844155844155845	faq(12), inquire-ask_clarification-offsets(5)
deny	69	0.7482014388489210	buy_offsets(5), faq(5)
greet	63	0.8943089430894310	faq(4), insult(2)
insult	63	0.6764705882352942	faq(8), vulgar(3)
why	59	0.7058823529411764	faq(5), inquire-ask_clarification(2)
inform_notunderstanding	58	0.5585585585585585	faq(11), why(4)
farewell	57	0.8301886792452831	faq(4), deny(3)
thank	54	0.8928571428571429	faq(1), affirm(1)
express_positive-emo	48	0.7058823529411765	affirm(4), faq(3)
vulgar	46	0.7407407407407407	faq(9), insult(5)
express_surprise	43	0.6511627906976745	faq(10), affirm(1)
express_uncertainty	43	0.7246376811594203	deny(4), faq(4)
inquire-ask_clarification	38	0.4687500000000000	faq(14), why(2)
buy_offsets	35	0.6666666666666666	faq(8), inquire-ask_clarification-offsets(2)
how_calculated	29	0.7619047619047619	faq(2), inquire-ask_clarification-offsets(1)
deny_flying	28	0.6666666666666666	faq(5), deny(2)
express_negative-emo	25	0.6521739130434783	inform_notunderstanding(2), why(1)
restart	18	0.9729729729729730	N/A
meta_inform_problem_bad-link	12	1.0000000000000000	N/A
SCENARIO	10	0.6250000000000000	thank(2), inquire-ask_clarification-offsets(1)
help	8	0.3076923076923077	faq(4), why(1)

Entity Cross-Validation Results (5 folds)

entity	support	f1-score	precision	recall
micro avg	926	0.79360000000000000	0.7839831401475237	0.8034557235421166
macro avg	926	0.65065308450789790	0.6440655595213791	0.6666966700133950
weighted avg	926	0.79331163985922330	0.7862977658768394	0.8034557235421166
city	384	0.85570890840652450	0.8256658595641646	0.8880208333333334
city.to	182	0.79683377308707120	0.7664974619289340	0.8296703296703297
city.from	149	0.73170731707317070	0.7608695652173914	0.7046979865771812
travel_flight_class	95	0.89130434782608700	0.9213483146067416	0.8631578947368421
iata	76	0.62857142857142860	0.6875000000000000	0.5789473684210527
iata.to	19	0.61538461538461540	0.6000000000000000	0.6315789473684210
iata.from	16	0.39999999999999997	0.3684210526315789	0.4375000000000000
number	5	0.28571428571428570	0.2222222222222222	0.4000000000000000
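The run-to-run comparison discussed in this thread can also be done mechanically. The following is a rough sketch, not the CI's own reporting code: it parses two tab-separated reports in the format posted by the bot and lists the classes whose F1 dropped by at least a chosen threshold. The function names and the 0.05 threshold are assumptions made for illustration; the sample rows are copied from the two intent reports above.

```python
from typing import Dict, List, Tuple


def parse_report(text: str) -> Dict[str, float]:
    """Map each class/entity name to its f1-score (third tab-separated column)."""
    scores = {}
    for line in text.strip().splitlines():
        fields = line.split("\t")
        if len(fields) < 3 or fields[0] in ("class", "entity"):
            continue  # skip the header row and malformed lines
        scores[fields[0]] = float(fields[2])
    return scores


def regressions(old: Dict[str, float], new: Dict[str, float],
                threshold: float = 0.05) -> List[Tuple[str, float]]:
    """Return (name, delta) pairs whose F1 dropped by at least `threshold`,
    largest drop first. Names missing from either report are ignored."""
    deltas = {name: new[name] - old[name] for name in old if name in new}
    dropped = [(name, d) for name, d in deltas.items() if d <= -threshold]
    return sorted(dropped, key=lambda pair: pair[1])


# A few rows from the first and second intent reports in this thread.
FIRST_RUN = (
    "class\tsupport\tf1-score\tconfused_with\n"
    "insult\t63\t0.6250000000000001\tfaq(11), vulgar(4)\n"
    "vulgar\t46\t0.6265060240963856\tfaq(10), insult(10)\n"
    "inquire-ask_clarification\t38\t0.5569620253164557\tfaq(9), why(3)\n"
    "help\t8\t0.7142857142857143\tfaq(3)\n"
)
SECOND_RUN = (
    "class\tsupport\tf1-score\tconfused_with\n"
    "insult\t63\t0.6764705882352942\tfaq(8), vulgar(3)\n"
    "vulgar\t46\t0.7407407407407407\tfaq(9), insult(5)\n"
    "inquire-ask_clarification\t38\t0.4687500000000000\tfaq(14), why(2)\n"
    "help\t8\t0.3076923076923077\tfaq(4), why(1)\n"
)

dropped = regressions(parse_report(FIRST_RUN), parse_report(SECOND_RUN))
for name, delta in dropped:
    print(f"{name}: {delta:+.3f}")
```

On these rows the sketch flags help and inquire-ask_clarification. Both have small support (8 and 38 examples), so a drop of this size between runs may well be random variation rather than a real regression.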

RasaHQ / carbon-bot

Fix CI bugs and update unsupported config option #45
