RasaHQ / carbon-bot

Apache License 2.0

Redeploy carbon bot on Facebook #50

Closed. samsucik closed this 2 years ago

samsucik commented 2 years ago

I've added the 3 secrets needed to connect the bot to Facebook Messenger. Usually, the credentials would go into credentials.yml, as described in the docs. Here, though, they're saved as GitHub Actions secrets on this repo (as a repo admin, I can do this) and then plugged into the Helm chart.
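For context, here is a minimal sketch of how three secrets could be turned into the `facebook` block that Rasa's Facebook Messenger channel expects in credentials.yml (keys `verify`, `secret`, `page-access-token`). The environment variable names and the function are hypothetical; the actual secret names and the Helm wiring used in this repo are not shown in this thread.

```python
import json

# Hypothetical variable names (assumption): the real secret names are not given here.
ENV_TO_KEY = {
    "FB_VERIFY": "verify",
    "FB_SECRET": "secret",
    "FB_PAGE_ACCESS_TOKEN": "page-access-token",
}


def render_facebook_credentials(env):
    """Return credentials.yml text for the facebook channel from a mapping of secrets."""
    missing = [name for name in ENV_TO_KEY if not env.get(name)]
    if missing:
        raise KeyError("missing secrets: " + ", ".join(missing))
    lines = ["facebook:"]
    for name, key in ENV_TO_KEY.items():
        # json.dumps emits a double-quoted string, which is also a valid YAML scalar.
        lines.append(f"  {key}: {json.dumps(env[name])}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values only; in CI these would come from the secret store.
    demo = {
        "FB_VERIFY": "placeholder-verify",
        "FB_SECRET": "placeholder-secret",
        "FB_PAGE_ACCESS_TOKEN": "placeholder-token",
    }
    print(render_facebook_credentials(demo))
```

Failing fast on a missing value is a deliberate choice in this sketch: a deployment with an empty token would otherwise start up and only fail once Facebook tries to reach it.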

samsucik commented 2 years ago

@kedz some more context is here.

github-actions[bot] commented 2 years ago

Intent Cross-Validation Results (5 folds)

| class | support | f1-score | confused_with |
| --- | --- | --- | --- |
| macro avg | 2517 | 0.72512739671103490 | N/A |
| weighted avg | 2517 | 0.80362214355672990 | N/A |
| faq | 643 | 0.78398791540785500 | inform(22), inquire-ask_clarification-offsets(16) |
| inform | 616 | 0.94060211554109040 | faq(15), deny_flying(5) |
| affirm | 255 | 0.86821705426356580 | faq(7), estimate_emissions(5) |
| inquire-ask_clarification-offsets | 124 | 0.70689655172413780 | faq(30), why(3) |
| estimate_emissions | 73 | 0.60759493670886080 | faq(8), affirm(4) |
| deny | 69 | 0.72131147540983610 | faq(8), affirm(5) |
| insult | 63 | 0.72058823529411780 | faq(11), inform(1) |
| greet | 63 | 0.87022900763358780 | faq(4), inform_notunderstanding(1) |
| why | 59 | 0.69421487603305790 | faq(6), inquire-ask_clarification(5) |
| inform_notunderstanding | 58 | 0.54901960784313730 | faq(11), affirm(4) |
| farewell | 57 | 0.83018867924528310 | faq(5), insult(2) |
| thank | 54 | 0.92592592592592590 | greet(1), faq(1) |
| express_positive-emo | 48 | 0.76000000000000000 | SCENARIO(3), affirm(3) |
| vulgar | 46 | 0.65753424657534230 | faq(10), insult(10) |
| express_surprise | 43 | 0.74418604651162780 | faq(6), estimate_emissions(2) |
| express_uncertainty | 43 | 0.73170731707317080 | faq(6), greet(1) |
| inquire-ask_clarification | 38 | 0.49315068493150680 | faq(11), why(4) |
| buy_offsets | 35 | 0.67532467532467530 | faq(4), affirm(3) |
| how_calculated | 29 | 0.80769230769230760 | faq(5), estimate_emissions(3) |
| deny_flying | 28 | 0.66666666666666650 | faq(5), estimate_emissions(1) |
| express_negative-emo | 25 | 0.62222222222222220 | insult(2), inform_notunderstanding(2) |
| restart | 18 | 0.88235294117647060 | faq(2), affirm(1) |
| meta_inform_problem_bad-link | 12 | 0.87999999999999990 | faq(1) |
| SCENARIO | 10 | 0.56000000000000000 | faq(1), express_positive-emo(1) |
| help | 8 | 0.42857142857142855 | faq(3), inquire-ask_clarification(2) |
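To make the two summary rows easier to read: the macro average is the plain mean of the per-intent F1 scores, while the weighted average weights each intent by its support. The sketch below recomputes both from the per-intent rows above; the data layout and function names are illustrative, not part of the CI workflow. The results should land close to the reported 0.725 and 0.804.

```python
# (support, f1) pairs copied from the per-intent rows of the table above.
ROWS = {
    "faq": (643, 0.78398791540785500),
    "inform": (616, 0.94060211554109040),
    "affirm": (255, 0.86821705426356580),
    "inquire-ask_clarification-offsets": (124, 0.70689655172413780),
    "estimate_emissions": (73, 0.60759493670886080),
    "deny": (69, 0.72131147540983610),
    "insult": (63, 0.72058823529411780),
    "greet": (63, 0.87022900763358780),
    "why": (59, 0.69421487603305790),
    "inform_notunderstanding": (58, 0.54901960784313730),
    "farewell": (57, 0.83018867924528310),
    "thank": (54, 0.92592592592592590),
    "express_positive-emo": (48, 0.76000000000000000),
    "vulgar": (46, 0.65753424657534230),
    "express_surprise": (43, 0.74418604651162780),
    "express_uncertainty": (43, 0.73170731707317080),
    "inquire-ask_clarification": (38, 0.49315068493150680),
    "buy_offsets": (35, 0.67532467532467530),
    "how_calculated": (29, 0.80769230769230760),
    "deny_flying": (28, 0.66666666666666650),
    "express_negative-emo": (25, 0.62222222222222220),
    "restart": (18, 0.88235294117647060),
    "meta_inform_problem_bad-link": (12, 0.87999999999999990),
    "SCENARIO": (10, 0.56000000000000000),
    "help": (8, 0.42857142857142855),
}


def macro_f1(rows):
    """Unweighted mean of per-class F1: every intent counts equally."""
    return sum(f1 for _, f1 in rows.values()) / len(rows)


def weighted_f1(rows):
    """Support-weighted mean of per-class F1: frequent intents dominate."""
    total = sum(support for support, _ in rows.values())
    return sum(support * f1 for support, f1 in rows.values()) / total


if __name__ == "__main__":
    print("total support:", sum(s for s, _ in ROWS.values()))
    print("macro avg:", macro_f1(ROWS))
    print("weighted avg:", weighted_f1(ROWS))
```

The gap between the two averages reflects the many low-support intents (for example `help` and `inquire-ask_clarification`) that score poorly but carry little weight in the weighted figure.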

Entity Cross-Validation Results (5 folds)

| entity | support | f1-score | precision | recall |
| --- | --- | --- | --- | --- |
| micro avg | 926 | 0.8224500809498112 | 0.8220064724919094 | 0.8228941684665226 |
| macro avg | 926 | 0.7300728364723079 | 0.8033952924949558 | 0.7026989067354110 |
| weighted avg | 926 | 0.8210772453561229 | 0.8233933888922362 | 0.8228941684665226 |
| city | 384 | 0.8725361366622865 | 0.8806366047745358 | 0.8645833333333334 |
| city.to | 182 | 0.7903225806451614 | 0.7736842105263158 | 0.8076923076923077 |
| city.from | 149 | 0.7752442996742670 | 0.7531645569620253 | 0.7986577181208053 |
| travel_flight_class | 95 | 0.9285714285714285 | 0.9009900990099010 | 0.9578947368421052 |
| iata | 76 | 0.7083333333333334 | 0.7500000000000000 | 0.6710526315789473 |
| iata.to | 19 | 0.6341463414634148 | 0.5909090909090909 | 0.6842105263157895 |
| iata.from | 16 | 0.5600000000000000 | 0.7777777777777778 | 0.4375000000000000 |
| number | 5 | 0.5714285714285715 | 1.0000000000000000 | 0.4000000000000000 |
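As a reading aid for the entity table: each per-entity F1 score is the harmonic mean of that row's precision and recall. The sketch below checks that relation against the per-entity rows above; the names are illustrative and this is not the evaluation code that produced the report.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (reported f1, precision, recall) copied from the per-entity rows above.
ENTITIES = {
    "city": (0.8725361366622865, 0.8806366047745358, 0.8645833333333334),
    "city.to": (0.7903225806451614, 0.7736842105263158, 0.8076923076923077),
    "city.from": (0.7752442996742670, 0.7531645569620253, 0.7986577181208053),
    "travel_flight_class": (0.9285714285714285, 0.9009900990099010, 0.9578947368421052),
    "iata": (0.7083333333333334, 0.7500000000000000, 0.6710526315789473),
    "iata.to": (0.6341463414634148, 0.5909090909090909, 0.6842105263157895),
    "iata.from": (0.5600000000000000, 0.7777777777777778, 0.4375000000000000),
    "number": (0.5714285714285715, 1.0000000000000000, 0.4000000000000000),
}

if __name__ == "__main__":
    for name, (reported, precision, recall) in ENTITIES.items():
        recomputed = f1_score(precision, recall)
        print(f"{name}: reported={reported:.4f} recomputed={recomputed:.4f}")
```

The relation is not expected to hold for the `macro avg` row, which averages the per-entity F1 scores rather than combining the averaged precision and recall. Rows such as `iata.from` and `number` show how low recall pulls F1 down even when precision is high.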

samsucik commented 2 years ago

@kedz thank you 🙂 I totally get that this is outside your usual scope. That's why I'm requesting reviews from you and Thomas -- this is a low-stakes situation (we can't break some important live deployment, so a bulletproof review from an expert isn't 100% needed) and I think it's useful for you to at least somewhat know what changes I'm making, so that the things I'm learning around CI/CD and FB integration don't stay completely within my own personal silo.

As for testing these particular changes before deploying them: I'm sure there is a way, but there's no obvious one. I guess if the bot were a high-stakes one, we'd set up separate environments for staging and production deployments. But to test the connection to FB, you'd sooner or later need to deploy the thing and just see if it works. I guess a bulletproof approach would then be to have two FB bots set up, one connected to the production deployment and one used for testing the staging deployments 🤔
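One piece that can be exercised without a deployment is the webhook verification handshake: when a webhook is registered, Facebook sends a GET request with `hub.mode`, `hub.verify_token` and `hub.challenge`, and expects the challenge echoed back if the token matches. The sketch below models that check as a plain function. It is an illustration under that assumption, with a hypothetical function name, not the connector code the bot runs, and it does not replace an end-to-end check against a real FB page.

```python
def verify_webhook(params, expected_token):
    """Return (status, body) for a webhook verification GET request.

    `params` is the mapping of query parameters; `expected_token` is the
    verify token configured for the deployment.
    """
    mode_ok = params.get("hub.mode") == "subscribe"
    token_ok = params.get("hub.verify_token") == expected_token
    if mode_ok and token_ok:
        # Echo the challenge so the platform accepts the webhook.
        return 200, params.get("hub.challenge", "")
    return 403, "verification failed"


if __name__ == "__main__":
    # Placeholder values standing in for a staging bot's verify token.
    request = {
        "hub.mode": "subscribe",
        "hub.verify_token": "staging-token",
        "hub.challenge": "12345",
    }
    print(verify_webhook(request, "staging-token"))
    print(verify_webhook(request, "production-token"))
```

With two FB bots, the same check would run once per environment with that environment's token, which catches a swapped or mistyped verify token before any traffic is involved.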