Computational-Content-Analysis-2020 / Readings-Responses

Repository for organising "exemplary" readings and posting responses.

Sampling, Crowd-Sourcing & Reliability - Dawid and Skene 1979 #6

Open jamesallenevans opened 4 years ago

jamesallenevans commented 4 years ago

Dawid, A. P., and Skene, A. M. 1979. “Maximum Likelihood Estimation of Observer Error-rates using the EM Algorithm.” Applied Statistics 28(1): 20-28.
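
For those who want to see the mechanics concretely, below is a minimal numpy sketch of the paper's EM iteration (a sketch under assumptions, not the authors' implementation: the `counts[i, k, l]` array layout and the majority-vote initialisation are choices made here). Each pass alternates an E-step, which computes the posterior over each patient's true response, with an M-step, which re-estimates the class priors and one confusion matrix per observer.

```python
import numpy as np

def dawid_skene_em(counts, init=None, n_iter=100, tol=1e-6):
    """Minimal Dawid-Skene EM sketch (not the authors' code).

    counts[i, k, l]: times observer k assigned category l to patient i.
    init: optional (I, L) starting posterior over true responses;
          defaults to normalised pooled votes (majority vote).
    Returns (class priors p, confusion matrices pi, posteriors T, loglik).
    """
    I, K, L = counts.shape
    T = counts.sum(axis=1) + 1e-9 if init is None else init
    T = T / T.sum(axis=1, keepdims=True)
    prev = -np.inf
    for _ in range(n_iter):
        # M-step: class priors and an L-by-L confusion matrix per observer.
        p = T.mean(axis=0) + 1e-12
        pi = np.einsum('ij,ikl->kjl', T, counts) + 1e-12
        pi /= pi.sum(axis=2, keepdims=True)
        # E-step: posterior over each patient's true response, in log space.
        log_post = np.log(p) + np.einsum('ikl,kjl->ij', counts, np.log(pi))
        m = log_post.max(axis=1, keepdims=True)
        loglik = (m + np.log(np.exp(log_post - m)
                             .sum(axis=1, keepdims=True))).sum()
        T = np.exp(log_post - m)
        T /= T.sum(axis=1, keepdims=True)
        if loglik - prev < tol:   # likelihood stopped improving
            break
        prev = loglik
    return p, pi, T, loglik
```

Reading `T` row-wise gives each item's estimated "true" response; `pi[k]` is observer k's estimated error-rate matrix.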

lkcao commented 4 years ago

This method is interesting and reminds me of Google's PageRank, where parameters are likewise obtained by iterating to convergence. However, I am still a little confused about the main advantage of converging. Is it that iteration moves the various parameters to a balanced position where the likelihood is maximized, or are there other reasons?

laurenjli commented 4 years ago

The authors state "Little is known at present about the accuracy of the estimates". Given this, how can this EM method be evaluated and/or compared to other methods developed in the future to decide which to implement in real-world applications?

di-Tong commented 4 years ago

It sounds very useful to measure individual error rates in addition to the degree of observer agreement in order to obtain reliable data. The video for this week's code mentions several different models of data annotation formulated in Rzhetsky, A., Shatkay, H., and Wilbur, W.J. (2009). "How to get the most from your curation effort", PLoS Computational Biology, 5(5). What are their key differences, and how should we choose among these models in applications?

sunying2018 commented 4 years ago

As stated in this article, the stopping condition for the EM algorithm is that the maximum likelihood estimates and the estimates of the missing data converge, so it can be regarded as a kind of greedy algorithm for finding an optimum. But I have a question: if the amount of data is very large, the algorithm requires a pass over all the data in every iteration, so how can we work around the limits of computational resources? And since the algorithm is not naturally parallelizable, can we use alternatives such as stochastic gradient descent?
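
One line of later work that speaks to this is stepwise or "online" EM (e.g. Cappé and Moulines, 2009), which replaces the full-data E-step with a running average of sufficient statistics computed on mini-batches, much like stochastic gradient descent. A hedged sketch of the general recipe; the `batch_estep` and `mstep` helpers are hypothetical placeholders for the model-specific computations:

```python
def online_em(batches, init_stats, batch_estep, mstep,
              step=lambda t: (t + 2) ** -0.7):
    """Stepwise ("online") EM in the spirit of Cappe and Moulines (2009).

    batch_estep(batch, theta) -> expected sufficient statistics for one
    mini-batch; mstep(stats) -> parameters. Both helpers are hypothetical
    stand-ins for the model-specific computations.
    """
    stats = init_stats
    theta = mstep(stats)
    for t, batch in enumerate(batches):
        s = batch_estep(batch, theta)            # E-step on one mini-batch
        rho = step(t)                            # decaying step size
        stats = (1 - rho) * stats + rho * s      # running average of stats
        theta = mstep(stats)                     # M-step from the average
    return theta
```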

kdaej commented 4 years ago

Although this paper brings an interesting aspect of the EM algorithm, measuring individual observers' errors, I am still not quite sure about the practicality of this method. In the example given in the paper, five observers record the data on each patient. In practice, however, patients usually see one clinician, and receiving opinions from multiple clinicians is time-consuming and expensive. Is the EM algorithm still useful when there are not multiple observations of each patient?

wunicoleshuhui commented 4 years ago

I'm also unsure about the practicality of applying the EM algorithm. It appears to depend on knowing which observer k produced each record, but when data are gathered in empirical settings it is easy to lose track of who recorded what. What are some potential ways to overcome this limitation?

rachel-ker commented 4 years ago

It is interesting to me how the paper manages to quantify individual error rates even when true responses are not available. The authors state two assumptions for their analysis, independent responses and no patient-by-clinician interaction (p. 21), both of which I believe will be violated in real-life scenarios. I was wondering whether there is any way to conceptualise how violating these assumptions would affect the conclusions of the analysis, and whether we could then learn the impacts in real-life scenarios.

iamlaurenbeard commented 4 years ago

I am additionally commenting on the applicability of this algorithm. I can imagine a case where one could assess multiple observations of a single patient over time, as a result of seeing various clinicians throughout a patient's lifetime. Is there a way that one could extend this sort of analysis to make sense of changes in clinicians' (or other experts') training and methods over time?

bjcliang-uchi commented 4 years ago

I am interested in seeing a comparison of different algorithms similar to EM in practice. For example, when should we apply EM rather than K-means, or a general MM algorithm? Also, what are the assumptions that are vital to whether the algorithm works or not?

ccsuehara commented 4 years ago

In the context of content analysis, can we use this method to compare different sources about the same topic to seek a ground truth? For example, different newspaper websites.

cindychu commented 4 years ago

For the EM algorithm, convergence is essential for yielding results for the latent variables, especially when consensus is not obvious in the data; however, it comes with a computational cost. The authors also put forward that one potential improvement is "exploring ahead along the apparent direction of convergence". I was wondering how this procedure might work and how it would influence the EM algorithm's results.
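
One plausible reading of "exploring ahead" is extrapolating along the most recent update direction, as later EM acceleration schemes (e.g. Aitken-type extrapolation, SQUAREM) do. The sketch below is a guess at that idea, not the authors' procedure:

```python
def look_ahead_step(theta_prev, theta_curr, alpha=1.5):
    """Extrapolate along the apparent direction of convergence
    (a hypothetical reading of the authors' suggestion). A safe
    implementation would fall back to a plain EM step whenever the
    extrapolated point lowers the likelihood."""
    return theta_curr + alpha * (theta_curr - theta_prev)
```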

alakira commented 4 years ago

The paper succinctly summarizes the mathematical background of the EM algorithm. I have a question about the initial estimates. The authors state that it is advisable to use several different starting points, but is there any possibility of, or justification for, using prior knowledge to determine the initial state, so that it could navigate the results toward a more preferable/reasonable local maximum?

heathercchen commented 4 years ago

The article presents a clear and concise model to capture response errors and provide estimates for missing data. However, I question whether this methodology can be applied to real-life circumstances, especially in the context of content analysis. For example, what if we cannot exhaust all types of errors and "true answers"? What if we cannot distinguish between erroneous responses and valid responses at the first stage of the experiment?

acmelamed commented 4 years ago

In the Discussion section of the article on page 25, Dawid and Skene remark that a hypothetical alternative method might be developed which could avoid the various problems and limitations of their EM algorithm as delineated in that section, such as the "large number of parameters" and the "large number of iterations [which are] usually required" for its functioning. Beyond what is hypothesized by the authors in this section, how can we imagine such an improved algorithm might operate?

tzkli commented 4 years ago

Given that the initialization of the parameters has consequences for the results we get, how should we go about initializing them? The authors suggest that we "repeat the algorithm for several different sets of starting values" (p. 24), but since the parameters are continuous, the choice set is infinite. Are there any rules of thumb or guiding principles for choosing initial values?
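
A common rule of thumb is not to search the continuous space at all, but to combine one informed start (e.g. majority vote) with a handful of random restarts and keep the run with the highest log-likelihood. A sketch, reusing the hypothetical `dawid_skene_em` function from the sketch near the top of this thread:

```python
import numpy as np

def best_of_restarts(counts, n_restarts=10, seed=0):
    """Rerun EM from several random starting values (p. 24) and keep
    the solution with the highest log-likelihood."""
    rng = np.random.default_rng(seed)
    I, K, L = counts.shape
    best = dawid_skene_em(counts)                  # majority-vote start
    for _ in range(n_restarts):
        init = rng.dirichlet(np.ones(L), size=I)   # random item posteriors
        fit = dawid_skene_em(counts, init=init)
        if fit[-1] > best[-1]:                     # compare log-likelihoods
            best = fit
    return best
```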

deblnia commented 4 years ago

Given the medical observer context described, I think this is a parsimonious and useful algorithm. Outside of relatively uniform spaces, however (e.g. spaces with weird topologies or manifolds), is this model still useful? That is to say, would convergence to a local maximum, as promised on p. 24, still result in a useful maximum likelihood estimate? I think this has been a problem for ERGMs (exponential random graph models), which suggests it may be more particular to MLE than to content analysis.

snwang1225 commented 4 years ago

If a true response must be meaningful, how is meaningfulness measured? How can mathematical formulas be made commensurate with a definition of meaning?

chun-hu commented 4 years ago

I'm unsure about the technical aspects of MLE and the EM algorithm, but I'm wondering how we can apply these tools to content analysis. Any explanations and examples would be helpful!

skanthan95 commented 4 years ago

(1) Like @chun-hu, I am unfamiliar with the mathematical theory underlying this paper and would like to see examples of the model's applications to other subdisciplines in the social sciences. Most undergraduate psychology programs don't cover the technical aspects of MLE or the EM algorithm to this degree; what are some ways we could scaffold these concepts for those encountering them in detail for the first time?

(2) Expanding on @ckoerner648's question: if we transfer maximum likelihood estimation into a purely linguistic setting and measure the probability that words from a certain group occur, then if a person uses very esoteric language, would these rarely used words be treated as error even when they are precisely what the author wanted to express?

sanittawan commented 4 years ago

My question pertains to the choice of distribution. As part of maximum likelihood estimation, the choice of distribution has to be specified in advance. On page 22, Dawid and Skene state that "...if q were the true response, the numbers of responses of each type actually obtained would be distributed according to a multinomial distribution...". The paper does not seem to explain why this is the case. Why a multinomial distribution and not some other?

Dominiquo commented 4 years ago

I have similar questions to @alakira about initial values. I was also curious how this might vary for different types of questions, where the response is not a range of numbers but a set of classes. In the class-label case, is there a way to vary the wording and watch how response errors shift relative to those tweaks, and what insight might that give for inferring how someone interprets certain words or phrasings?

toecn commented 4 years ago

Like @skanthan95, I'm interested in discussing applications in social science. How could we apply these ideas?

Lizfeng commented 4 years ago

My question is: is there any situation where the EM algorithm does not converge to a local maximum? Sometimes in MLE we also need to check that the result we get is in fact a local maximum.

cytwill commented 4 years ago

I agree that the difficulty for this method is capturing the true response; especially in social science, we can hardly know the real responses in advance. Using EM seems to be a potential way to estimate true responses, but I suspect the EM method also relies heavily on the data: if the data themselves are strongly biased, the final EM estimates might not recover the error rates. Also, are there any good cases where this approach has been applied to CSS?

ckoerner648 commented 4 years ago

Dawid and Skene 1979 present a statistical method to estimate the probability of observer error. They argue that their model could help doctors get a better picture of the medical condition of their patients. When patients are transferred from one clinic to another, different doctors may have noted different responses to the same question. Similarly, the patients may have found it difficult to answer "statistically satisfactorily": they might have used different words to describe their symptoms, or changed their replies from time to time. Maximum likelihood estimation provides an estimator for the error rates. But what if one patient has an extremely rare medical condition? Diseases often share many similar symptoms, and only a slight difference can lead a doctor to a fundamentally different diagnosis. I suppose that in most cases maximum likelihood estimation will help doctors gain more certainty about the medical condition of their patients, but could it not also raise the very small but important possibility that extreme cases are overlooked?

jsmono commented 4 years ago

New to data analysis, I have a very basic question: how can we turn patients' responses into numbers that can be plugged into the functions the authors list? The authors focus mainly on applications in medicine, but if we are dealing with a large amount of data such as newspapers or social media posts, how can we effectively define the variables?

luisesanmartin commented 4 years ago

Though this has already been asked by many classmates, I don't quite see how a straightforward application of this algorithm (if possible) would be implemented in content analysis. As I understood from the previous class reading ("Machine Translation: Mining Text for Social Theory"), the measurements in computational content analysis are conducted by machines, so I don't see where measurement error would be introduced.

ziwnchen commented 4 years ago

This paper presents a very useful method for estimating the true response/error rate when dealing with data like medical files, which have multiple facets and multiple observers per patient. My question is how to distinguish between the two potential approaches of measuring "observer agreement" and measuring "individual error rates". It seems that when the "true response" is not provided, the identification of T and the estimation of the error rates still depend to some degree on the consensus of the observers.

lyl010 commented 4 years ago

My question is: can we apply the EM algorithm to word prediction? Is EM more about explanation or prediction? Thank you!

meowtiann commented 4 years ago

I don't get the math part at all, and I don't understand how the given example relates to assessing missing data. But at least I understand that this method deals with inter-rater errors, or even errors produced by the same rater over a span of time.

yaoxishi commented 4 years ago

The paper presents a very useful algorithm for calculating the maximum likelihood estimates of the parameters, but I don't see why it was selected as a fundamental reading for this week. How does it relate to sampling, reliability, and content analysis?

adarshmathew commented 4 years ago

The Expectation Maximization algorithm is prone to finding local optima instead of a global one. The authors recommend randomizing the seed to overcome this issue, which is to be expected. But if we were applying EM to large corpora and looking to identify dependencies, each iteration would be computationally expensive. This would've been a serious issue in 1979, when the paper was written. How have researchers solved this problem in 2020?

rkcatipon commented 4 years ago

From my understanding of the reading, the Expectation Maximisation algorithm is a way to estimate latent variables that were not explicit in the dataset. I struggled to understand the math, but is the EM algorithm the basis for unsupervised clustering? Is it the basis for document clustering, for example?
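
EM itself is an estimation recipe rather than a clustering algorithm, but it is indeed the standard way to fit mixture models, which are one common basis for unsupervised (including document) clustering. A quick illustration using scikit-learn's GaussianMixture, which is fit by EM; the toy 2-D points stand in for document vectors:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two toy clusters standing in for, e.g., TF-IDF document vectors.
X = np.vstack([rng.normal(0, 1, size=(100, 2)),
               rng.normal(5, 1, size=(100, 2))])

# GaussianMixture is fit by EM; n_init=5 follows the multiple-restart advice.
gm = GaussianMixture(n_components=2, n_init=5, random_state=0).fit(X)
print(gm.means_)                # estimated cluster centres
print(gm.predict_proba(X[:3]))  # soft assignments, the E-step output
```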

xpw0222 commented 4 years ago

> The paper presents a very useful algorithm for calculating the maximum likelihood estimates of the parameters, but I don't see why it was selected as a fundamental reading for this week. How does it relate to sampling, reliability, and content analysis?

Yeah, I have the same confusion... I guess it might have something to do with situations where we are trying to extract opinions from online posts? There could be some bias when we fit the post contents into several categories.

arun-131293 commented 4 years ago

Since the paper explicitly talks about slow convergence, it would have been useful to use complexity theory to give an upper bound on the computational complexity of the proposed algorithm, expressed in terms of the relevant parameters (J, K).

luxin-tian commented 4 years ago

I did not quite understand the underlying logic of the MLE method for the case where true responses are not available. How are the initial estimates generated? Besides, the EM algorithm seems similar to the method used by pyanno, which is based on Dawid and Skene (1979) and implements either a Bayesian MAP estimate or an MLE of accuracy. How does the pyanno model generate its initial guess of the parameters? Is it a random draw from some assumed distribution, or something else?

YanjieZhou commented 4 years ago

I notice that the EM algorithm adopts some strict assumptions, including neglecting the patient-by-clinician effect, which, in my understanding, is somewhat impractical when applied to real research. Has this assumption been shown by previous research to be workable when applied to big data, where all the possible interactions might average out?

VivianQian19 commented 4 years ago

The article by Dawid and Skene presents the EM algorithm, in which "each iteration consists of an Expectation (of missing data) step and a Maximization (mle) step" (23). I'm not sure I understand their point that the conditions of the EM algorithm are satisfied when the indicator variables are treated as missing data.