Closed korenmiklos closed 3 years ago
Actually, let me give it a try.
Is there a guide somewhere for reading formulas written in Markdown? I have not found one so far, so I am a bit unsure about the formula in the comment. I am also unsure about the notation: do I understand correctly that `p` is `share` and `x` is `sample_share` in this case? `n` is still the number of balls, I guess.
Oh, ok, thanks, then I will have a look at it, to better understand the formula.
@zaveczgergo Please save the data for this simulation in a .csv file.
partner_country,year,shipments1,shipments2,...,shipments97
RU,2017,250,120,...,1
Shipments should be integers. For each (d,o,t), first round up shipments, then sum across origin countries (in this order):
generate shipments = ceil(trade_volume / shipment_size)
collapse (sum) shipments, by(partner_country year product)
reshape wide shipments, i(partner_country year) j(product)
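
To see why the order matters (round up first, then sum), here is a minimal Python sketch of the same three steps. The rows, the `origin` column, and the `shipment_size` values are made up for illustration; only the ceil, sum, and reshape logic mirrors the Stata lines above.

```python
import csv
import io
import math
from collections import defaultdict

# Hypothetical input rows (not real data):
# (partner_country, origin, year, product, trade_volume, shipment_size)
rows = [
    ("RU", "HU", 2017, 1, 1000.0, 300.0),
    ("RU", "DE", 2017, 1, 500.0, 300.0),
    ("RU", "HU", 2017, 2, 90.0, 100.0),
]

# Steps 1 and 2: round up per (destination, origin, year, product),
# then sum across origin countries.
shipments = defaultdict(int)
for partner, origin, year, product, volume, size in rows:
    shipments[(partner, year, product)] += math.ceil(volume / size)

# Step 3: reshape wide, one row per (partner_country, year).
wide = defaultdict(dict)
for (partner, year, product), count in shipments.items():
    wide[(partner, year)][f"shipments{product}"] = count

# Write the wide table as CSV text; products with no trade get 0.
products = sorted({product for (_, _, product) in shipments})
columns = [f"shipments{k}" for k in products]
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["partner_country", "year"] + columns)
for (partner, year), counts in sorted(wide.items()):
    writer.writerow([partner, year] + [counts.get(c, 0) for c in columns])
print(buffer.getvalue())
```

In this toy input, product 1 gets ceil(1000/300) + ceil(500/300) shipments; summing the volumes first and rounding afterwards would give ceil(1500/300), which is one fewer.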
I can estimate the Dirichlet MN from this.
I did a bit of math with KLD and the multinomial. A better measure of distance between {p} (the base distribution) and {x} (the actual distribution) could be the log-likelihood function. It is similar to KLD, but has several correction terms:

$$ \ln L = \ln n! - \sum_k \ln x_k! + \sum_k x_k \ln p_k $$

@zaveczgergo please compute `logL` divided by `n` in the simulated data and plot it against log n to see if there is a size bias in this measure.
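
A rough sketch of that check in plain Python, in case it helps. It assumes `x_k` are integer counts summing to `n` (the factorials need that) and that every `p_k` is strictly positive. The base shares `p`, the sample sizes, and the number of replications are placeholders, not the values from the actual simulation. The `kld` helper is only there to show how `logL` relates to KLD: with q = x/n, the formula rearranges exactly to ln L = -n KLD(q, p) + ln n! - n ln n - sum_k (ln x_k! - x_k ln x_k).

```python
import math
import random

def multinomial_loglik(x, p):
    """ln L = ln n! - sum_k ln x_k! + sum_k x_k ln p_k, for counts x and shares p."""
    n = sum(x)
    out = math.lgamma(n + 1)  # lgamma(m + 1) equals ln m!
    for xk, pk in zip(x, p):
        out -= math.lgamma(xk + 1)
        if xk > 0:
            out += xk * math.log(pk)
    return out

def kld(x, p):
    """KL divergence of the sample shares x/n from the base shares p."""
    n = sum(x)
    return sum((xk / n) * math.log((xk / n) / pk) for xk, pk in zip(x, p) if xk > 0)

def draw_counts(n, p, rng):
    """Throw n balls into len(p) bins with probabilities p."""
    counts = [0] * len(p)
    for k in rng.choices(range(len(p)), weights=p, k=n):
        counts[k] += 1
    return counts

rng = random.Random(0)
p = [0.5, 0.3, 0.15, 0.05]  # placeholder base shares (assumption)
results = []
for n in [10, 100, 1000, 10000]:
    values = [multinomial_loglik(draw_counts(n, p, rng), p) / n for _ in range(100)]
    results.append((math.log(n), sum(values) / len(values)))

for log_n, mean_ll in results:
    print(f"log n = {log_n:.2f}, mean logL/n = {mean_ll:.4f}")
```

Plotting the pairs in `results` (log n on the horizontal axis) is the picture asked for above; if the mean of `logL / n` drifts with log n even though every sample is drawn from `p` itself, the measure has a size bias.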