cppisking / ffrk-inspector

Issues tracker for FFRK Inspector
10 stars 8 forks source link

Database comments / documentation on columns #55

Closed krissco closed 9 years ago

krissco commented 9 years ago

Thanks for your hard work! I would not consider installing this without the source code, however since you provide database credentials I am interested in writing adhoc queries or reports based on the item drop data.

I was not certain as to the meaning of the histo* columns. Everything else is self-evident. Is this how you would get the # of drops per battle per item? I had originally tried a simple count(*) but that didn't pass my sanity check (the least # of records per item/battle in per_battle_drops was 2).

select   x.world,
         x.dungeon,
         x.difficulty,
         x.battle,
         x.stamina,
         x.samples,
         x.item,
         s.name as series,
         x.rarity,
         x.drops         
from    (select   w.name as world,
                  d.name as dungeon,
                  d.difficulty,
                  b.name as battle,
                  b.stamina,
                  b.samples,
                  i.name as item,
                  i.rarity,
                  i.series,
                  sum(pbd.histo_value) as drops
         from     ffrki.worlds w,
                  ffrki.dungeons d,
                  ffrki.battles b,
                  ffrki.per_battle_drops pbd,         
                  ffrki.items i
         where    w.id       = d.world
         and      d.id       = b.dungeon 
         and      b.id       = pbd.battleid
         and      pbd.itemid = i.id
         group by w.name,
                  d.name,
                  d.difficulty,
                  b.name,
                  b.stamina,
                  b.samples,
                  i.name,
                  i.series,
                  i.rarity) x
        left join ffrki.series s 
          on x.series = s.id
order by drops desc

PS - I almost never work in MySQL (Oracle only) so forgive my funky join syntax.

krissco commented 9 years ago

Also, if I understand it correctly, the probability of an item drop would then be determined by drops / samples correct?

cppisking commented 9 years ago

It used to be the case that battles.samples and battles.histo_samples could contain different values. In the early days of the alpha I only had the samples column. Then I would add +1 to the samples when you finished a battle, and +n to each item's drop count. And the row with the drop count referenced the battles table so you could figure out the number of samples. By the way, this is all now simplified by using the dungeon_drops view, so you should get a bunch of rows with everything you need. So that big join you have, instead just select from dungeon_drops and it hides the join for you.

In any case, I wanted to calculate more than just a simple average. Like standard deviation. So that's when I added the histo_samples column. It really just means "number of samples that have occurred since introducing the change to record full histograms". So now, ever since I started recording drop samples in histogram format, both samples and histo_samples always increase by 1 when I finish a battle. In other words, samples is a legacy column.

For a while it went on where the two columns could contain different values, but eventually I found a problem that invalidated some of the data because I had combined some of the samples from the old format (pre-histogram) with samples from the new format (with histogram) in a way that wasn't statistically sound, and so it forced me to do a partial rollback of all data that was affected by this inconsistency (which wasn't everything, luckily).

After that "purge", samples and histo_samples should always contain identical values, and the only data that remains is data that has only ever been recorded using histogram format. So you can use either samples or histo_samples. Eventually I will delete the histo_samples column and just use samples.

Now, onto the actual drop counts. Think of what a histogram looks like. You've got bars spread along the x-axis, and each one has a different height. To represent drop statistics this way, you would have one bar for each unique drop count witnessed, and the height of the bar would be the number of times you witnessed that many drops. These correspond to the histo_bucket and histo_value columns, respectively. In other words, a histo_bucket of 4 means that the row you're looking at contains information about the number of times someone obtained exactly 4 copies of that item in a single run of the dungeon. And the histo_value column is the number of times it happens. A histo_bucket of -1 is a special value which corresponds to the "global" bucket, which means the total number of that item that has ever been received, all aggregated together.

If all you want is to compute average drops / run, you only need to look at the row where histo_bucket = -1 and you can just compute histo_value / samples. But if you want to compute a standard deviation, you will need the other rows. Since, for the other rows, histo_bucket is the number of drops on a single run and histo_value is the number of times that event occurred, you can compute the total number of drops received for that combination of (item, battle) by using n = SUMPRODUCT(histo_bucket, histo_value) for all rows where histo_bucket != -1. In fact, it should be an invariant for each combination of (itemid, battleid), that n = histo_value where histo_bucket=-1. Note that if you want to compute the number of times 0 of an item dropped, then this is samples - SUM(histo_value where histo_bucket != -1)

Regarding your other comment about source code, I totally understand your position and it's good to be cautious. I'm not sure if it will sway your opinion or not (and if it doesn't it's equally fine), but here's a screenshot I just took using .NET Reflector, which you can download for free, which shows the source code of the method that intercepts proxy requests. capture

As you can see, if the response is not from ffrk.denagames.com, it is discarded. As far as I'm concerned, the ability to use .NET Reflector is as good as having source, and arguably even better than having source. Even if I publish source, you still have to trust that the binary I release is built against the same version of source that I make available. .NET Reflector removes any doubt from the equation. It even has an option to Export Source code, so it's better than the real thing IMO.

krissco commented 9 years ago

Thanks for the detailed response.