CogComp / perspectrum

Perspectrum: a dataset of claims, perspectives and evidence documents
https://cogcomp.seas.upenn.edu/perspectrum/

Inconsistencies in the dataset #154

Closed kashpop closed 5 years ago

kashpop commented 5 years ago

I see some inconsistencies in the dataset, especially in the way evidence IDs are assigned in perspectrum_with_answers_v0.2.json. For example, take the first JSON object in that file:

{"cId": 499, "text": "Vaccination must be made compulsory", "source": "idebate", "perspectives": [{"pids": [3695, 24076, 24077], "stance_label_3": "SUPPORT", "stance_label_5": "SUPPORT", "voter_counts": [3, 0, 0, 0, 0], "evidence": [2611, 7481, 2934]}, ....

Looking up the perspective IDs 3695, 24076 and 24077 in perspective_pool_v0.2.json, here is what I get:

{"pId": 3695, "text": "It is the state\u2019s duty to protect its community ", "source": "idebate"}
{"pId": 24076, "text": "The state must keep it's community safe.", "source": "paraphrase"}
{"pId": 24077, "text": "The safety of the community is the state's priority.", "source": "paraphrase"}

However, when I look up the evidence text for the evidence IDs [2611, 7481, 2934] in evidence_pool_v0.2.json, here is what I get:

{"eId": 2611, "text": "A custodial sentence has strong symbolic value, for offenders, for victims and for society as a whole. Exclusion from society and confiscation of freedoms that the state would normally protect at any cost is a powerful message, one that can be understood easily by both white collar fraudsters and semi-literate muggers. There are few more effective ways of communicating society\u2019s disapproval and indicating the boundaries of its tolerance. For all side proposition\u2019s talk of long term consequences and proportionality, there remain a significant number of offenders and potential offenders who would perceive the resolution as a weakness to be exploited. We give up the symbol of incarceration at the cost of emboldening criminals. Confidence in the state is founded on the state\u2019s ability to protect its citizens and their property from physical harm. This is something on which all but the most extreme ends of the political spectrum would agree. Even if the state is no longer willing to wield violence against a criminal minority in protection of a law abiding majority, it should still be prepared to project the power that contemporary constitutional settlements have allowed it to retain. If it does not, the state risks being accused of forgetting its core duties in favour of more abstract notions of \u201charm\u201d.", "source": "idebate"}
{"eId": 7481, "text": "In Allison\u2019s model of bureaucratic politics foreign policy decisions result from negotiation between various governmental bureaucracies, to identify the cause of foreign policy the players, coalitions, bargains and compromises must be identified. Hey and Patrick J. Haney eds., Foreign Policy Analysis, Continuity and Change in Its Second Generation, Englewood Cliffs, 1995, pp.85-91, p88]] On the contrary some argue the policies of a country cannot be explained without knowing the personal goals and beliefs of the leaders of a country. These individuals also affect the reactions of other states. This is the Great Men theory of history with an emphasis on the individual level of analysis. Byman and Pollack have four basic hypotheses. It is individuals who set the intentions of the state; they can transform them or magnify already latent intentions. The competence of its individuals counts towards a states influence and military power; they build alliances, perceive threats, and create military strategy. Individuals decide how a state\u2019s resources will be used in pursuit of the goals they have created; it is individuals, on one side at least, who decide whether to go to war or negotiate. And finally individuals affect how an opposing state will react; they can either be charismatic and persuasive or bullying and aggressive, the opposing state\u2019s individuals will respond based upon the attitudes of the first state\u2019s individuals towards them.", "source": "debatewise"}
{"eId": 2934, "text": "We can no longer argue that sovereignty must be considered absolute. Sovereignty was created as the means by which states justified the control of their territory to prevent foreign aggression. Since the creation of the United Nations, sovereignty is no longer as necessary to protect states, as most wars are not about territorial acquisition. Now it is primarily a barrier to the international community intervening when the state is abusing its own population. A better principle is if governments today are unable or unwilling to perform the duty to protect their people from harm (including state-imposed harm), then their claims to sovereignty lose their moral force and intervention becomes justified . For example, Qaddafi of Libya was likening his citizens to cockroaches and rats, threatening to kill them house-by-house whilst speaking of his intent to indiscriminately attack the population of Benghazi . As such, there was significant concern that violence would have devastating impacts on Libyan civilians. The United Nations, in response, authorized NATO action . Through unleashing state military assets to attack his own population, Qaddafi made it clear that he was not a fit leader. The United Nations, as the representative of the international community, has the responsibility to protect those whose leaders have let them down.", "source": "idebate"}

None of these evidence paragraphs is relevant to the claim or to its perspectives.

How are the perspectives and evidence linked with each other? Am I missing something?
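
The lookup described above can be sketched in Python as follows. This is only a minimal sketch, assuming each pool file is a single JSON array of objects keyed by pId and eId as in the excerpts; the helper names index_by, load_pool and resolve_claim are hypothetical and not part of the repository, and the sample evidence texts are cut to their first sentence.

```python
import json


def index_by(records, id_key):
    """Map each record's ID to the record itself."""
    return {record[id_key]: record for record in records}


def load_pool(path, id_key):
    """Read a pool file (assumed to be a JSON array) and index it by id_key."""
    with open(path, encoding="utf-8") as f:
        return index_by(json.load(f), id_key)


def resolve_claim(claim, perspective_pool, evidence_pool):
    """Replace the IDs in each perspective cluster with the pooled texts."""
    resolved = []
    for cluster in claim["perspectives"]:
        resolved.append({
            "perspectives": [perspective_pool[pid]["text"] for pid in cluster["pids"]],
            "evidence": [evidence_pool[eid]["text"] for eid in cluster["evidence"]],
        })
    return resolved


# With the real files one would presumably call, for example:
#   perspective_pool = load_pool("perspective_pool_v0.2.json", "pId")
#   evidence_pool = load_pool("evidence_pool_v0.2.json", "eId")
# Here the records quoted in this issue are used instead (evidence truncated).
claim = {
    "cId": 499,
    "text": "Vaccination must be made compulsory",
    "perspectives": [
        {"pids": [3695, 24076, 24077], "evidence": [2611, 7481, 2934]},
    ],
}
perspective_pool = index_by([
    {"pId": 3695, "text": "It is the state\u2019s duty to protect its community "},
    {"pId": 24076, "text": "The state must keep it's community safe."},
    {"pId": 24077, "text": "The safety of the community is the state's priority."},
], "pId")
evidence_pool = index_by([
    {"eId": 2611, "text": "A custodial sentence has strong symbolic value, for offenders, for victims and for society as a whole."},
    {"eId": 7481, "text": "In Allison\u2019s model of bureaucratic politics foreign policy decisions result from negotiation between various governmental bureaucracies, to identify the cause of foreign policy the players, coalitions, bargains and compromises must be identified."},
    {"eId": 2934, "text": "We can no longer argue that sovereignty must be considered absolute."},
], "eId")

resolved = resolve_claim(claim, perspective_pool, evidence_pool)
for cluster in resolved:
    print(cluster["perspectives"])
    print(cluster["evidence"])
```

Printing the resolved texts side by side makes a mismatch like the one reported here easy to spot by eye.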

schen149 commented 5 years ago

(Sorry for the late response... I don't personally own this repo, so I didn't get a notification for the issue.)

You are right. These evidence paragraphs are irrelevant. I just checked the data and this looks like an annotation error. For some reason, 3 out of 4 crowdworkers thought this was valid evidence.

I will remove this particular evidence in the next release of the dataset (which will be available in a few hours).
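
Until a fixed release is available, a local workaround might look like the sketch below. It is not the maintainers' release process; it assumes the answers file has been loaded as a list of claim objects shaped like the excerpt in this issue, and drop_evidence is a hypothetical helper name. The choice of which evidence IDs to drop is left to the caller.

```python
def drop_evidence(claims, c_id, bad_eids):
    """Return a copy of claims with bad_eids removed from the claim whose cId is c_id."""
    bad = set(bad_eids)
    cleaned = []
    for claim in claims:
        if claim["cId"] != c_id:
            cleaned.append(claim)
            continue
        # Rebuild each perspective cluster without the unwanted evidence IDs.
        perspectives = [
            {**cluster, "evidence": [e for e in cluster["evidence"] if e not in bad]}
            for cluster in claim["perspectives"]
        ]
        cleaned.append({**claim, "perspectives": perspectives})
    return cleaned


# Sample claim taken from the excerpt in this issue (other fields omitted).
claims = [
    {
        "cId": 499,
        "text": "Vaccination must be made compulsory",
        "perspectives": [
            {"pids": [3695, 24076, 24077], "evidence": [2611, 7481, 2934]},
        ],
    },
]
cleaned = drop_evidence(claims, 499, [2611, 7481, 2934])
print(cleaned[0]["perspectives"][0]["evidence"])
```

The function builds new dictionaries instead of editing in place, so the originally loaded data stays available for comparison.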