Closed by sugarcrm-aorso 4 years ago
Sorry for the slow answer.
I think essentially what you want to do is to compute the probability of conversion _conditional_ on conversion not having happened at any t < t_0.
This should be doable by just computing something like
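def conditional_conversion(model, x, t_0, large_t=1000):
    # large_t: some large enough number that it represents the "final" conversion rate
    p = model.predict(x, t=t_0)      # probability of having converted by t_0
    q = model.predict(x, t=large_t)  # approximates the final conversion rate
    return 1 - (1 - q) / (1 - p)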
Note that p < q and that p converges from 0 towards q as t_0 gets larger, so the quantity 1 - (1 - q) / (1 - p) will start at q and drop towards 0.
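To make that behaviour concrete, here is a minimal sketch that can be run without a fitted model. The function toy_cdf is a hypothetical stand-in for model.predict(x, t) (an exponential curve levelling off at a 30% final conversion rate), and remaining_conversion_probability is just an illustrative name; neither is part of Convoys.

```python
import math


def toy_cdf(t, final_rate=0.3, scale=10.0):
    """Hypothetical stand-in for model.predict(x, t): P(converted by time t)."""
    return final_rate * (1.0 - math.exp(-t / scale))


def remaining_conversion_probability(cdf, t_0, large_t=1000.0):
    """P(converts eventually | not converted by t_0) = 1 - (1 - q) / (1 - p)."""
    p = cdf(t_0)      # probability of having converted by t_0
    q = cdf(large_t)  # approximation of the final conversion rate
    return 1.0 - (1.0 - q) / (1.0 - p)


# Score the same observation at several ages: the value starts at q and decays.
for t_0 in (0, 5, 20, 100):
    print(t_0, round(remaining_conversion_probability(toy_cdf, t_0), 4))
```

With a real fitted model, the cdf argument would presumably be something like lambda t: model.predict(x, t=t), assuming predict behaves as used above.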
I have a visual proof in my head for why this works, but it's a bit hard to share on GitHub. I think there's an elementary proof using probability theory, but I always struggle to get the notation right, so I'll skip it at this point.
Hope this helps!
Thanks for the response Erik. Was able to get the result you mentioned by applying Bayes' Theorem.
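For reference, a sketch of how that derivation can go. The notation is my own assumption: T is the conversion time (T = ∞ if the observation never converts), p = P(T ≤ t_0), and q = P(T < ∞), which the large-t prediction approximates.

```latex
\begin{aligned}
P(T < \infty \mid T > t_0)
  &= \frac{P(T > t_0 \mid T < \infty)\, P(T < \infty)}{P(T > t_0)}
     && \text{(Bayes' theorem)} \\
  &= \frac{\dfrac{q - p}{q} \cdot q}{1 - p}
   = \frac{q - p}{1 - p} \\
  &= \frac{(1 - p) - (1 - q)}{1 - p}
   = 1 - \frac{1 - q}{1 - p}.
\end{aligned}
```

The second line uses P(T > t_0 | T < ∞) = (q - p) / q, since of the fraction q that ever converts, a fraction p has already done so by t_0.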
I'm trying to do some modeling where I have a large time lag for conversion, and I am interested in getting updated conversion-likelihood predictions for a single observation over its lifetime (at no specified interval, just whenever someone is interested and wants to look). Intuitively I'd expect the likelihood of conversion to be highest for the first couple of days or weeks; past a certain point the observation essentially isn't going to convert because it's just too old.
I was looking at Cox Proportional Hazards models when I came across Convoys and it seemed to address my problem more directly, though many of the examples involve groups and aggregate conversion rates. I know there are regression classes and I was playing with those:
but I was curious whether I'm interpreting the output correctly for real-time scoring (i.e., an observation is scored at time t, and the result is the likelihood of conversion given that the observation has not converted by then). Similarly, if my features are time-dependent (e.g., they may be null at creation, but I learn more about them over time), can that be factored in? After reading the docs more thoroughly, I've seen this mentioned under future directions using an RNN; do you have any papers you can point me to?
Thank you in advance.