drphilmarshall / SpaceWarps

Science Team Website Development and Analysis
MIT License

Conditional SWAP agent confusion matrices (P("LENS"|LENSED_QUASAR) etc) #162

Open drphilmarshall opened 9 years ago

drphilmarshall commented 9 years ago

The STRIDES group were discussing how to combine their "expert grades" for ~100 lensed quasar candidates this afternoon. I suggested that we had solved this problem with SWAP (although we did not apply it to our own expert grades!). On the STRIDES dime, then, I could:

In the first instance, I guess offline and unsupervised will work best (if only because we don't have any classifications of training images!) In future though, the team seemed amenable to having training images mixed in with the candidates, which I thought was interesting.

@cpadavis: it occurs to me that the above could potentially make a nice introductory example to the eSWAP analysis. Comments welcome!

anupreeta27 commented 9 years ago

@drphilmarshall
i believe you are referring to existing classifications of the qso candidates by some of the strides members. if you want to use SWAP then you will have to use a training sample for calibrating the strides team's classifications - this would mean asking everyone to redo all classifications on the test + a training sample. and, if you don't plan to use a training sample for everyone's PL-PD then why not simply take the average of their grades? how can swap provide a better solution over a simple average?

drphilmarshall commented 9 years ago

I'm talking about existing and future candidate grading, and we did talk about putting in sims and duds to the grading exercise.

SWAP can now operate without a training set (in "unsupervised" mode, since Taiwan last year) - Chris is testing it for the eSWAP paper. It ends up capturing consensus between agents - which still have independent confusion matrices - providing a sort of (but not literally) weighted average. Plus, you could imagine assigning different initial PD and PL for each agent (Paul Schechter seems likely to be afforded significantly higher values, for example!)


cpadavis commented 9 years ago

I have written a very basic 'expertdb' package that takes in user classifications from a csv file (reading columns 'SubjectID', 'AgentID', and 'Classification') and has methods find and digest that can be run with SWAP.py (i.e. all that needs to be added is some flag in SWAP.py to tell SWAP that it's looking at an ExpertDB instead of a MongoDB). Currently it just takes any classification > 0 to be a LENS.
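Roughly, the shape of such a reader (the column names are from the comment above; everything else here is a hypothetical sketch, not the actual expertdb code):

```python
import csv

class ExpertDB:
    """Minimal stand-in for the 'expertdb' idea: read expert
    classifications from a csv file and serve them up for SWAP.
    (Hypothetical sketch; the digest method that converts rows into
    SWAP's internal format is omitted here.)"""

    def __init__(self, csv_path):
        self.classifications = []
        with open(csv_path, newline='') as f:
            for row in csv.DictReader(f):
                self.classifications.append({
                    'SubjectID': row['SubjectID'],
                    'AgentID': row['AgentID'],
                    # any grade > 0 counts as a LENS classification
                    'Classification': ('LENS'
                                       if float(row['Classification']) > 0
                                       else 'NOT'),
                })

    def find(self, subject_id=None):
        """Return classifications, optionally filtered by subject."""
        if subject_id is None:
            return list(self.classifications)
        return [c for c in self.classifications
                if c['SubjectID'] == subject_id]
```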

As for translating expert grades, I imagine that it becomes one more thing you need to calibrate. You expand the classification types from "LENS" and "NOT" to "0", "1", etc. Now you do SpaceWarps with an expanded, asymmetric confusion matrix. I am working out the updates to the formalism and will probably push an updated, extended latex document with it to the repo tomorrow. I think the updated formalism will be relatively straightforward. An upshot of this is that you should also be able to extend the formalism in the opposite direction: instead of P("1"|LENS)-like terms, we can look at P("LENS"|Lensed Quasar)-like terms.
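A sketch of what the expanded confusion matrix might look like: rows are true classes, columns are the labels an agent can give, and each entry estimates P(label | true class) from training counts. All class names, counts, and the add-one smoothing choice here are illustrative, not the eSWAP implementation:

```python
import numpy as np

# Made-up training counts for one agent. Rows: true classes
# (DUD, LENSED_GALAXY, LENSED_QUASAR); columns: labels "0", "1", "2".
counts = np.array([
    [8, 1, 1],   # true DUDs mostly get labelled "0"
    [1, 6, 3],   # true LENSED_GALAXYs
    [1, 2, 7],   # true LENSED_QUASARs
], dtype=float)

# add-one (Laplace) smoothing keeps unseen (class, label) pairs nonzero
confusion = (counts + 1) / (counts + 1).sum(axis=1, keepdims=True)

# each row is now a probability distribution over the agent's labels,
# e.g. confusion[2, 0] estimates P("0" | LENSED_QUASAR)
assert np.allclose(confusion.sum(axis=1), 1.0)
```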

drphilmarshall commented 9 years ago

I thought this would be fun to code! :-) My plan for the grades was to try to interpret them as fixed fractions, e.g. grade 2 might map to 0.67, grade 3 to 0.95, etc. But extending the way you suggest is perhaps more interesting - especially if there are volunteers out there who find quasars easy but arcs hard, or vice versa...
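That fixed-fraction interpretation could be as simple as a lookup table. The 0.67 and 0.95 values are from the comment above; the grade 0 and 1 values are made-up placeholders:

```python
# Hypothetical grade -> P(LENS) mapping; only the grade 2 and 3 values
# come from the discussion above, the rest are placeholders.
GRADE_TO_PLENS = {0: 0.05, 1: 0.33, 2: 0.67, 3: 0.95}

def grade_to_probability(grade):
    """Interpret an integer expert grade as a fixed P(LENS)."""
    return GRADE_TO_PLENS[grade]
```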

cpadavis commented 9 years ago

okay a basic multinomial model for generic numbers of classifications and label types is written down in sw-extended.tex and pushed. See section 3.4. I didn't mention how to do this online, but I think the steps are straightforward and I could be explicit about that if we wanted.

I realized while writing this down that you can also account for multiple classifications (either of the same type or otherwise) -- in other words, a way to account for multiple markers from users. I don't think that aspect is fully fleshed out, and I'm not even sure it's really worth pursuing, but I thought it was neat to mention.

drphilmarshall commented 9 years ago

Nice! I have a couple of questions/comments:

cpadavis commented 9 years ago

on the multinomial mixture model -- sorry for the jargon! It's very similar to a gaussian mixture model. (more jargon!) Multinomial distributions are a generalization of binomial distributions, where instead of drawing from (0,1) N times, you draw from (0, 1, 2, ... M) N times. You might notice that membership in a gaussian mixture model or a multinomial mixture model is itself drawn from a multinomial distribution (e.g. draw from (LENS, NOT) 1 time for each point). They are particularly common in document classification algorithms, like email classification (spam, work, purchases, etc), where the idea is that each class of documents has a different multinomial distribution for describing the probability of a given word appearing.

I think the easiest way to describe the multinomial mixture model is to say how it generates classifications

For every point, you first generate a group membership. So you say that point 1 is a part of model 2. You can do this by saying that each model i has some probability p_i of being drawn. So in the binary model, you drew membership to group LENS with probability p^0 and to group NOT with probability (1 - p^0). You can imagine that your classifications are now NOT, LENSED QUASAR, and LENSED GALAXY (for simplicity), each with p_N, p_Q, and p_G of being drawn. Incidentally, your membership draws and probabilities can be described by a multinomial distribution.

OK so now you have drawn a model for your point. Now, having drawn that, you draw what kind of response you would receive. That is, you draw a classification from the distribution of P("classification" | NOT) or P("classification" | LENSED QUASAR) or whatever your membership was. So each type of classification has some associated probability of being drawn, e.g. p_0N, p_1N for P("0"|NOT) and P("1"|NOT) respectively. This is also a multinomial distribution. If this were a gaussian mixture model, you would instead draw a point from the gaussian distribution Norm(mu_N, Sigma_N) if your point's membership were NOT. You can also draw more than one point -- so a user could be required to place N markers, in which case you draw N classifications from that same P("classification"|NOT) distribution.
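The two-step generative story above can be sketched directly. All class names and probabilities here are illustrative, not fitted values:

```python
import numpy as np

rng = np.random.default_rng(0)

# step 1: each subject's true class is drawn once, with probability p_class
classes = ['NOT', 'LENSED_QUASAR', 'LENSED_GALAXY']
p_class = [0.8, 0.1, 0.1]

# step 2: one categorical distribution over labels "0", "1", "2" per class,
# i.e. the rows P("label" | class) of an agent's confusion matrix
labels = ['0', '1', '2']
p_label_given_class = {
    'NOT':           [0.85, 0.10, 0.05],
    'LENSED_QUASAR': [0.10, 0.70, 0.20],
    'LENSED_GALAXY': [0.15, 0.25, 0.60],
}

def generate_classification(n_markers=1):
    """Draw a subject's true class, then n_markers labels from it."""
    cls = rng.choice(classes, p=p_class)
    drawn = rng.choice(labels, size=n_markers, p=p_label_given_class[cls])
    return cls, list(drawn)
```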

One final note: apparently (according to wikipedia) "multinomial distributions" and "categorical distributions" are often conflated in machine learning type things. It looks to me that the real difference between the two is the multinomial coefficient (the n!/(k1!k2!...) thing) appears in one and not the other. For our purposes it doesn't really matter which we are talking about.

drphilmarshall commented 9 years ago

OK, good. Thanks!

Does this mean that we are talking about going from a 2x2 agent confusion matrix with 2 independent elements for the LENS/NOT simple case, to an NxM agent confusion matrix with Nx(M-1) independent elements? One practical issue is that it will take longer to train such an agent (because they will need to see M times as many sims to become skillful in each category). It's going to be interesting to see whether this is outweighed by the model allowing more information to be captured in the long run! I guess it will increase the dominance of the contribution by high effort/experience volunteers...
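The parameter counting works because each of the N rows of the confusion matrix is a probability distribution over the M possible labels, so the sum-to-one constraint costs one degree of freedom per row. A tiny sketch with illustrative N and M:

```python
import numpy as np

# N true classes, M possible labels (illustrative values)
N, M = 3, 4
confusion = np.full((N, M), 1.0 / M)   # e.g. a maximally uninformative agent
assert np.allclose(confusion.sum(axis=1), 1.0)
free_parameters = N * (M - 1)          # each row sums to 1: M-1 free per row
```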

@aprajita and I were talking about this extended model yesterday, and wondered how one could implement a different dimension: labels that contain information about the object in question that has been extracted from the image data. A test subject in a targeted search could carry with it an estimate of lens or arc brightness, and/or arc radius, etc. Can you see how the agents could take this information into account, to allow higher probability of being right about "easy" systems than "hard" ones? This could be important if we make the sims more difficult in the next project...


cpadavis commented 9 years ago

online equations added to sw-extended. I also found a typo in my summary of the online system, which is better than finding a typo in the online system's code!

As for the idea of differentiating 'harder' and 'easier' systems, the most naive thing is to assume they are separate groups, e.g. in your confusion matrix LENS and DUD become EASY_LENS, HARD_LENS, EASY_DUD, HARD_DUD, but you still want to find LENS and DUD. You could then say your confusion matrix (derived from training) gives P("LENS" | EASY_LENS), for which you then need some additional matrix P(EASY_LENS | LENS) etc, so that P("LENS" | LENS) = P("LENS" | EASY_LENS) P(EASY_LENS | LENS) + P("LENS" | HARD_LENS) P(HARD_LENS | LENS) (with P(EASY_DUD | LENS) = 0, and so on) translates back to the label of interest. The problem then becomes additionally estimating how many lenses in the wild are 'hard' vs 'easy' (or you can give a reasonable estimate and keep them fixed). ('hard' and 'easy' are placeholder names -- they could be things like 'lens with arc radius < 10' or 'arc within two FWHM of the galaxy', too). Is that what you were thinking?
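Plugging made-up numbers into that marginalization, just to show the arithmetic:

```python
# All values are made up, purely to illustrate
# P("LENS"|LENS) = P("LENS"|EASY_LENS) P(EASY_LENS|LENS)
#                + P("LENS"|HARD_LENS) P(HARD_LENS|LENS).
p_say_lens_given_easy = 0.90   # P("LENS" | EASY_LENS), from training
p_say_lens_given_hard = 0.40   # P("LENS" | HARD_LENS), from training
p_easy_given_lens = 0.30       # assumed fraction of lenses that are easy
p_hard_given_lens = 1.0 - p_easy_given_lens

p_say_lens_given_lens = (p_say_lens_given_easy * p_easy_given_lens
                         + p_say_lens_given_hard * p_hard_given_lens)
# 0.90 * 0.30 + 0.40 * 0.70 = 0.55
```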

cpadavis commented 9 years ago

By the way, this should be testable, since we have lensed quasar etc. flavors.

drphilmarshall commented 9 years ago

OK: @cpadavis has included a section in the eSWAP draft about how to do all this, but we're postponing any tests for now while we finish the earlier parts of the paper (including focusing on a best-effort re-analysis of Stage 1, as mentioned in #155). The most likely part of the above discussion to make it in is the flavor-aware agents - hence the renaming of this issue!