samsucik closed this 2 years ago

class | support | f1-score | confused_with |
---|---|---|---|
macro avg | 2517 | 0.72512739671103490 | N/A |
weighted avg | 2517 | 0.80362214355672990 | N/A |
faq | 643 | 0.78398791540785500 | inform(22), inquire-ask_clarification-offsets(16) |
inform | 616 | 0.94060211554109040 | faq(15), deny_flying(5) |
affirm | 255 | 0.86821705426356580 | faq(7), estimate_emissions(5) |
inquire-ask_clarification-offsets | 124 | 0.70689655172413780 | faq(30), why(3) |
estimate_emissions | 73 | 0.60759493670886080 | faq(8), affirm(4) |
deny | 69 | 0.72131147540983610 | faq(8), affirm(5) |
insult | 63 | 0.72058823529411780 | faq(11), inform(1) |
greet | 63 | 0.87022900763358780 | faq(4), inform_notunderstanding(1) |
why | 59 | 0.69421487603305790 | faq(6), inquire-ask_clarification(5) |
inform_notunderstanding | 58 | 0.54901960784313730 | faq(11), affirm(4) |
farewell | 57 | 0.83018867924528310 | faq(5), insult(2) |
thank | 54 | 0.92592592592592590 | greet(1), faq(1) |
express_positive-emo | 48 | 0.76000000000000000 | SCENARIO(3), affirm(3) |
vulgar | 46 | 0.65753424657534230 | faq(10), insult(10) |
express_surprise | 43 | 0.74418604651162780 | faq(6), estimate_emissions(2) |
express_uncertainty | 43 | 0.73170731707317080 | faq(6), greet(1) |
inquire-ask_clarification | 38 | 0.49315068493150680 | faq(11), why(4) |
buy_offsets | 35 | 0.67532467532467530 | faq(4), affirm(3) |
how_calculated | 29 | 0.80769230769230760 | faq(5), estimate_emissions(3) |
deny_flying | 28 | 0.66666666666666650 | faq(5), estimate_emissions(1) |
express_negative-emo | 25 | 0.62222222222222220 | insult(2), inform_notunderstanding(2) |
restart | 18 | 0.88235294117647060 | faq(2), affirm(1) |
meta_inform_problem_bad-link | 12 | 0.87999999999999990 | faq(1) |
SCENARIO | 10 | 0.56000000000000000 | faq(1), express_positive-emo(1) |
help | 8 | 0.42857142857142855 | faq(3), inquire-ask_clarification(2) |

entity | support | f1-score | precision | recall |
---|---|---|---|---|
micro avg | 926 | 0.8224500809498112 | 0.8220064724919094 | 0.8228941684665226 |
macro avg | 926 | 0.7300728364723079 | 0.8033952924949558 | 0.7026989067354110 |
weighted avg | 926 | 0.8210772453561229 | 0.8233933888922362 | 0.8228941684665226 |
city | 384 | 0.8725361366622865 | 0.8806366047745358 | 0.8645833333333334 |
city.to | 182 | 0.7903225806451614 | 0.7736842105263158 | 0.8076923076923077 |
city.from | 149 | 0.7752442996742670 | 0.7531645569620253 | 0.7986577181208053 |
travel_flight_class | 95 | 0.9285714285714285 | 0.9009900990099010 | 0.9578947368421052 |
iata | 76 | 0.7083333333333334 | 0.7500000000000000 | 0.6710526315789473 |
iata.to | 19 | 0.6341463414634148 | 0.5909090909090909 | 0.6842105263157895 |
iata.from | 16 | 0.5600000000000000 | 0.7777777777777778 | 0.4375000000000000 |
number | 5 | 0.5714285714285715 | 1.0000000000000000 | 0.4000000000000000 |
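
A note for reading the entity table: the rows look consistent with the f1-score column being the usual harmonic mean of precision and recall. That is an assumption about how the report was generated, not something stated in this thread. The sketch below recomputes three rows from the table under that assumption. It also shows why `number` ends up at roughly 0.57 despite a precision of 1.0: its recall is only 0.4.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1-score), copied from the entity table above.
ROWS = {
    "number": (1.0, 0.4, 0.5714285714285715),
    "iata.from": (0.7777777777777778, 0.4375, 0.56),
    "travel_flight_class": (0.900990099009901, 0.9578947368421052, 0.9285714285714285),
}


def max_deviation(rows: dict) -> float:
    """Largest absolute gap between the recomputed and the reported f1-score."""
    return max(abs(f1_score(p, r) - reported) for p, r, reported in rows.values())
```

If the assumption holds, `max_deviation(ROWS)` should be on the order of floating-point rounding error.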

@kedz thank you 🙂 I totally get that this is outside your usual scope. That's why I'm requesting reviews from you and Thomas: this is a low-stakes situation (we can't break any important live deployment, so a bulletproof review from an expert isn't strictly needed), and I think it's useful for you to know at least roughly what changes I'm making, so that the things I'm learning about CI/CD and FB integration don't stay entirely within my own personal silo.

As for testing these particular changes before deploying them: I'm sure there is a way, but there's no obvious one. If the bot were a high-stakes one, we'd set up separate environments for staging and production deployments. But to test the connection to FB, sooner or later you'd need to deploy the thing and just see if it works. I guess a bulletproof approach would then be to have two FB bots set up, one connected to the production deployment and one used for testing the staging deployments 🤔

I've added the 3 secrets that are needed to connect the bot to Facebook Messenger. Usually, the creds would go into `credentials.yml`, as described in the docs. Here, though, they're instead saved as Actions secrets on this repo (as a repo admin, I can do this) and then plugged into the Helm chart.
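
To make the secrets-to-config idea concrete, here is a minimal sketch of rendering the Facebook section of `credentials.yml` from values injected at deploy time. It is not how the Helm chart in this PR does it. It assumes the bot is a Rasa bot, whose Facebook Messenger channel takes the keys `verify`, `secret` and `page-access-token`. The environment variable names are hypothetical, since the actual secret names are not given in this thread.

```python
import json

# Hypothetical variable names (assumption): the real secret names on this repo
# are not stated. The right-hand keys assume Rasa's Facebook channel config.
ENV_TO_KEY = {
    "FB_VERIFY": "verify",
    "FB_SECRET": "secret",
    "FB_PAGE_ACCESS_TOKEN": "page-access-token",
}


def render_facebook_credentials(env: dict) -> str:
    """Build the `facebook:` block of credentials.yml from a mapping of secrets."""
    missing = [name for name in ENV_TO_KEY if not env.get(name)]
    if missing:
        # Fail early rather than writing a config with empty credentials.
        raise KeyError("missing secrets: " + ", ".join(missing))
    lines = ["facebook:"]
    for name, key in ENV_TO_KEY.items():
        # json.dumps yields a double-quoted, escaped string, which is also
        # a valid YAML double-quoted scalar.
        lines.append(f"  {key}: {json.dumps(env[name])}")
    return "\n".join(lines) + "\n"
```

In a deployment step this could be called as `render_facebook_credentials(dict(os.environ))` (after `import os`) and the result written to `credentials.yml`, so the secrets themselves never have to be committed to the repo.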