chrismbryant / amazon-confidence-interval

A browser extension which adds Bayesian visualizations to Amazon ratings.
MIT License
31 stars 4 forks source link

Feature/beta-viz #19

Closed chrismbryant closed 4 years ago

chrismbryant commented 4 years ago

This is a first attempt at producing beta distribution visualizations to go with each product review. I've removed the 95% confidence interval calculations from the display for now until we decide what user experience we want to have. To make the visualization, I ended up using D3, which really increased the size of the build (maybe there's a way to explicitly import just the stuff I use like with the stdlib imports?). Also, right now, the SVGs that hold the visualization are statically sized, but we probably would want them to dynamically resize with the div that contains them.

aeciorc commented 4 years ago

This looks nice. I don't know enough about data science to properly review this, so I just left a suggestion for the bundle size issue.

chrismbryant commented 4 years ago

@aeciorc Thanks for the suggestion; I tried doing that, importing just what I needed and putting some color maps into an object, but it doesn't seem to have changed the bundle size (still about 4.5 Mb)...not sure if I'm missing something

aeciorc commented 4 years ago

@aeciorc Thanks for the suggestion; I tried doing that, importing just what I needed and putting some color maps into an object, but it doesn't seem to have changed the bundle size (still about 4.5 Mb)...not sure if I'm missing something

Ok, will investigate this tomorrow

musicin3d commented 4 years ago

On first pass...

Having never taken a course on statistics or numerical analysis, I don't know what to do with the graph. But if it helps you do your thing then coolio. Where can I learn about it?

I would suggest that we use let instead of var. It is a newer language feature that has more straightforward scoping and initialization. It was introduced at the same time as const.

let allows you to declare variables that are limited to a scope of a block statement, or expression on which it is used, unlike the var keyword, which defines a variable globally, or locally to an entire function regardless of block scope. The other difference between var and let is that the latter is initialized to a value only when a parser evaluates it (see below).

Just like const the let does not create properties of the window object when declared globally (in the top-most scope).

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/let

aeciorc commented 4 years ago

@aeciorc Thanks for the suggestion; I tried doing that, importing just what I needed and putting some color maps into an object, but it doesn't seem to have changed the bundle size (still about 4.5 Mb)...not sure if I'm missing something

Ok, will investigate this tomorrow

@chrismbryant Check my suggestions. It should decrease the size of inject.js to 1.77 Mb

musicin3d commented 4 years ago

Disclaimer: I'm a loud mouth in code review. I do try to present things open handedly and open mindedly. But I figure it's better to risk a little discomfort than to wish you'd spoken up later on. Feel free to tell me to can it if I overstep. :laughing:

chrismbryant commented 4 years ago

@aeciorc Thanks for the suggestions!

@musicin3d Thanks, I changed all vars to lets. And no worries, please continue giving feedback whenever something can be done better! Regarding the visualizations, I'm using the number of positive and negative ratings to get a probability distribution of the underlying "positive experience rate". This is what @3b1b started to get at in his first video, and what I assume he'll address more in the 2nd and especially 3rd video of his "probabilities of probabilities" series. The blog post by John Cook he mentioned is also a good start for understanding what's going on.

Originally, I was only planning on testing out the visualization capabilities by plotting a number line with a rectangle encompassing the 95% confidence interval, but I realized that I might as well go all the way and show the true distribution where that confidence interval came from (the "95%" there means that 95% of the area inside the distribution is contained within that interval, with 2.5% hanging out to the left, and 2.5% hanging out to the right). The more data we have, the skinnier that distribution is, and the higher the positive experience rate is, the farther to the right its peak is pushed. So the best product has a skinny distribution pushed all the way to the right. My D3 visualizations show the distribution from 0% to 100% with a line at the 50% mark and dots at every 10% increment in between.

musicin3d commented 4 years ago

I like this naming convention for feature branches. I'll start using it on the next PR.

aeciorc commented 4 years ago

I feel like you’re trying to optimize this a little too early. I would do away with the ratingData object and have a method for getting the beta using average and another for dist, both taking only the params they use. Then getaBetaParams would take the proportion itself rather than the method --Edit I replied this via email and didn't realize it would show up here, I meant to rely your latest comment

On Apr 1, 2020, at 11:13 PM, Christopher Bryant notifications@github.com wrote:

 @chrismbryant commented on this pull request.

In src/shared/calculations.js:

  *
    • In this function, we use the rating data to compute the "alpha" and "beta"
    • parameters for the beta distribution described above.
  • *
    • @param {Object.<string, (number|number[])>} ratingData - object containing
    • the average rating under key "avg" and/or the 5-element rating
    • distribution (fraction of ratings at each star value) under key "dist".
    • @param {number} numRatings - total number of ratings for this product.
    • @param {string} method - which method to use to compute proportion of
    • reviews which were positive; must be either "avg" or "dist".
    • @returns {Object.<string, number>} betaParams - object containing "alpha"
    • and "beta" parameters for the beta distribution, as well as the
    • estimated "proportion" and number "numPositive" of ratings which were
    • positve, and the total number of ratings "numRatings".
  • */
  • getBetaParams(ratingData, numRatings, method = "avg") { I get what you're saying; it does feel redundant, but I'm not exactly sure how to fix it. The idea was that ratingData would be a dictionary of all the possible forms of rating data you care about, like:

ratingData = { "dist": [0.1, 0, 0, 0.4, 0.5], "avg": 4.7 } So you can just specify that once and decide later whether you want to use "avg" or "dist" to generate your statistics. Now that I say that though, if that's what I'm going for, it seems like I might also want to include numRatings inside of that dictionary...

Not sure where to go from here. Thoughts?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

musicin3d commented 4 years ago

I replied this via email and didn't realize it would show up here, I meant to rely your latest comment

This one? https://github.com/chrismbryant/amazon-confidence-interval/pull/19#discussion_r402028060

musicin3d commented 4 years ago

Random feedback on the feature in general: I unwittingly had this active while shopping for computer parts. It was very useful. :)

aeciorc commented 4 years ago

I replied this via email and didn't realize it would show up here, I meant to rely your latest comment

This one? #19 (comment)

Yes, that one

chrismbryant commented 4 years ago

@aeciorc and @musicin3d , thanks for the feedback. The suggestion to make the functions take proportion as an argument made things clearer for me. Does the code look better now?

musicin3d commented 4 years ago

Thoughts on whether we should delete branches after merge?

chrismbryant commented 4 years ago

@musicin3d At the moment, I don't have strong feelings either way. I think it can be up to the person who made the branch. If the branch is a self-contained feature that you don't plan on re-visiting, then delete the branch after merge. But if the branch represents a larger feature that is being done in multiple steps, then leave it open.