Does your code work with long-term (delayed) results?

Hi, I am new to using multi-armed bandits. Recently, we ran an A/B test to see how people convert if we contact them (group A) versus if we don't contact them (group B). Now we are thinking of running a new test, but instead of a plain A/B test we would like to use a multi-armed bandit. The problem is that after we call a person, it takes 6 months to find out whether that student converted. Do multi-armed bandits also work when the results only arrive that much later?

My data includes the treatment (call or not) and the outcome (converted or not). I want an approach that does better than A/B testing by shifting toward the better choice over time, but since our results only come in after 6 months, I am not sure how to do that.

We were considering active learning: using the A/B test data, we train separate models for group A and group B, and for each student the difference between the conversion probabilities predicted by Model A and Model B gives us the uplift. We then want to run a new A/B test on the students for whom the difference between Model A and Model B was small. But to do that, we would again have to wait 6 months to see how the students react to the call.

Is there a way to use a multi-armed bandit approach with results that take this long to arrive? Is it possible to use your code for this? If so, how can I do it?
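The two-model uplift idea described above (often called a "T-learner") can be sketched roughly as follows. This is a minimal illustration, assuming a plain logistic regression as the per-group model and synthetic student features; the function names `fit_logistic`, `two_model_uplift`, and `lowest_uplift` are hypothetical and not part of space-bandits. It also assumes that "small difference" means a small absolute uplift.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Plain logistic regression fitted by batch gradient descent (no regularization)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w

def predict_proba(w, X):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def two_model_uplift(X, treated, converted, X_new):
    """Model A is trained on called students, Model B on not-called students.
    Returns P(convert | call) - P(convert | no call) for each row of X_new."""
    w_a = fit_logistic(X[treated == 1], converted[treated == 1])
    w_b = fit_logistic(X[treated == 0], converted[treated == 0])
    return predict_proba(w_a, X_new) - predict_proba(w_b, X_new)

def lowest_uplift(uplift, k):
    """Indices of the k students whose absolute uplift is smallest."""
    return np.argsort(np.abs(uplift))[:k]

# Synthetic A/B data (hypothetical): calling only helps students with feature 0 > 0.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
treated = rng.integers(0, 2, size=n)  # 1 = called (group A), 0 = not called (group B)
logit = -1.0 + 0.5 * X[:, 1] + treated * 1.5 * (X[:, 0] > 0)
converted = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

uplift = two_model_uplift(X, treated, converted, X)
candidates = lowest_uplift(uplift, k=100)  # students to include in a follow-up test
```

Any classifier could replace the logistic regression; the uplift is simply the difference of the two models' predicted probabilities for the same student.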
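One common way a bandit can cope with delayed outcomes is to make each decision using only the outcomes observed so far, keep a record of pending decisions, and update the model when their results finally arrive. Below is a minimal sketch of that idea, assuming Beta-Bernoulli Thompson sampling (a standard bandit algorithm, not necessarily what space-bandits implements). The class `DelayedBetaBernoulliTS`, the monthly cohort of 200 students, and the conversion rates are all hypothetical.

```python
import numpy as np

class DelayedBetaBernoulliTS:
    """Beta-Bernoulli Thompson sampling where each reward arrives after a delay."""

    def __init__(self, n_arms, seed=0):
        self.alpha = np.ones(n_arms)  # 1 + observed conversions per arm
        self.beta = np.ones(n_arms)   # 1 + observed non-conversions per arm
        self.pending = {}             # decision_id -> arm, still waiting for an outcome
        self.rng = np.random.default_rng(seed)
        self._next_id = 0

    def choose(self):
        # Sample a plausible conversion rate per arm from the current posterior.
        samples = self.rng.beta(self.alpha, self.beta)
        arm = int(np.argmax(samples))
        decision_id = self._next_id
        self._next_id += 1
        self.pending[decision_id] = arm
        return decision_id, arm

    def record_outcome(self, decision_id, converted):
        # Called months later, once the student's outcome is known.
        arm = self.pending.pop(decision_id)
        if converted:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1

# Hypothetical simulation: arm 0 = no call, arm 1 = call.
true_rate = [0.10, 0.15]
delay = 6  # months until an outcome is observed
bandit = DelayedBetaBernoulliTS(n_arms=2, seed=1)
outcome_rng = np.random.default_rng(2)
queue = []       # (month the outcome becomes known, decision_id, converted)
call_share = []  # fraction of each monthly cohort that was called

for month in range(24):
    # 1) Fold in outcomes that have matured by this month.
    ready = [q for q in queue if q[0] <= month]
    queue = [q for q in queue if q[0] > month]
    for _, decision_id, converted in ready:
        bandit.record_outcome(decision_id, converted)
    # 2) Assign this month's cohort using only what has been observed so far.
    arms = []
    for _ in range(200):
        decision_id, arm = bandit.choose()
        converted = outcome_rng.random() < true_rate[arm]
        queue.append((month + delay, decision_id, converted))
        arms.append(arm)
    call_share.append(sum(arms) / len(arms))
```

In this sketch, nothing has been observed during the first 6 months, so the bandit behaves like a 50/50 A/B test; allocation only starts shifting toward the better arm once outcomes begin to arrive. One possible adjustment (an assumption, not something tested here) is to initialize `alpha` and `beta` from the counts of the earlier A/B test so the first cohorts already benefit from that data.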

Many thanks in advance,

LaunchpadAI / space-bandits

Does your code work with long-term (delayed) results? #16