flairNLP / zelda

A comprehensive benchmark for entity disambiguation

How to get candidate List for each mention #2

Open abcdefsdf opened 9 months ago

abcdefsdf commented 9 months ago

Thanks for this great work. I have a question about the candidate list used in the paper.

For example, for one example here. {"page_id": 25, "section_name": "History", "text": "A few examples of autistic symptoms and treatments were described long before autism was named. The Table Talk of Martin Luther, compiled by his notetaker, Mathesius, contains the story of a 12-year-old boy who may have been severely autistic. Luther reportedly thought the boy was a soulless mass of flesh possessed by the devil, and suggested that he be suffocated, although a later critic has cast doubt on the veracity of this report. The earliest well-documented case of autism is that of Hugh Blair of Borgue, as detailed in a 1747 court case in which his brother successfully petitioned to annul Blair's marriage to gain Blair's inheritance. The Wild Boy of Aveyron, a feral child caught in 1798, showed several signs of autism; the medical student Jean Itard treated him with a behavioral program designed to help him form social attachments and to induce speech via imitation. The New Latin word autismus (English translation autism) was coined by the Swiss psychiatrist Eugen Bleuler in 1910 as he was defining symptoms of schizophrenia. He derived it from the Greek word aut\u00f3s (\u03b1\u1f50\u03c4\u03cc\u03c2, meaning \"self\"), and used it to mean morbid self-admiration, referring to \"autistic withdrawal of the patient to his fantasies, against which any influence from outside becomes an intolerable disturbance\". A Soviet child psychiatrist, Grunya Sukhareva, described a similar syndrome that was published in Russian in 1925, and in German in 1926. The word autism first took its modern sense in 1938 when Hans Asperger of the Vienna University Hospital adopted Bleuler's terminology autistic psychopaths in a lecture in German about child psychology. Asperger was investigating an ASD now known as Asperger syndrome, though for various reasons it was not widely recognized as a separate diagnosis until 1981. 
Leo Kanner of the Johns Hopkins Hospital first used autism in its modern sense in English when he introduced the label early infantile autism in a 1943 report of 11 children with striking behavioral similarities. Almost all the characteristics described in Kanner's first paper on the subject, notably \"autistic aloneness\" and \"insistence on sameness\", are still regarded as typical of the autistic spectrum of disorders. It is not known whether Kanner derived the term independently of Asperger. Donald Triplett was the first person diagnosed with autism. He was diagnosed by Kanner after being first examined in 1938, and was labeled as \"case 1\". Triplett was noted for his savant abilities, particularly being able to name musical notes played on a piano and to mentally multiply numbers. His father, Oliver, described him as socially withdrawn but interested in number patterns, music notes, letters of the alphabet, and U.S. president pictures. By the age of 2, he had the ability to recite the 23rd Psalm and memorized 25 questions and answers from the Presbyterian catechism. He was also interested in creating musical chords. Kanner's reuse of autism led to decades of confused terminology like infantile schizophrenia, and child psychiatry's focus on maternal deprivation led to misconceptions of autism as an infant's response to \"refrigerator mothers\". Starting in the late 1960s autism was established as a separate syndrome. As late as the mid-1970s there was little evidence of a genetic role in autism; while in 2007 it was believed to be one of the most heritable psychiatric conditions. Although the rise of parent organizations and the destigmatization of childhood ASD have affected how ASD is viewed, parents continue to feel social stigma in situations where their child's autistic behavior is perceived negatively, and many primary care physicians and medical specialists express some beliefs consistent with outdated autism research. 
It took until 1980 for the DSM-III to differentiate autism from childhood schizophrenia. In 1987, the DSM-III-R provided a checklist for diagnosing autism. In May 2013, the DSM-5 was released, updating the classification for pervasive developmental disorders. The grouping of disorders, including PDD-NOS, autism, Asperger syndrome, Rett syndrome, and CDD, has been removed and replaced with the general term of Autism Spectrum Disorders. The two categories that exist are impaired social communication and/or interaction, and restricted and/or repetitive behaviors. The Internet has helped autistic individuals bypass nonverbal cues and emotional sharing that they find difficult to deal with, and has given them a way to form online communities and work remotely. Societal and cultural aspects of autism have developed: some in the community seek a cure, while others believe that autism is simply another way of being.", "index": [[100, 110], [114, 127], [653, 672], [676, 687], [756, 766], [890, 899], [961, 966], [980, 993], [1033, 1046], [1331, 1347], [1497, 1510], [1518, 1544], [1625, 1641], [1690, 1707], [1801, 1811], [1819, 1841], [2298, 2313], [3142, 3161], [3547, 3560], [3647, 3669], [3675, 3693], [3785, 3792], [3860, 3869], [3931, 3936], [4055, 4062], [4072, 4089], [4091, 4104], [4110, 4113], [4524, 4563], [4641, 4678]], "wikipedia_ids": [30465262, 7567080, 32777, 19005888, 5753955, 21983, 7351032, 276370, 27790, 51251268, 19389318, 10342296, 9014, 37556, 637087, 218873, 53536238, 1817319, 2649767, 2356893, 3172179, 8498, 8498, 11973479, 694777, 37556, 56476, 898648, 11996900, 1073739], "wikipedia_titles": ["Table Talk (Luther)", "Martin Luther", "Victor of Aveyron", "Feral child", "Jean Marc Gaspard Itard", "New Latin", "Swiss people", "Eugen Bleuler", "Schizophrenia", "Grunya Sukhareva", "Hans Asperger", "Vienna General Hospital", "Developmental psychology", "Asperger syndrome", "Leo Kanner", "Johns Hopkins Hospital", "Donald Triplett", "Refrigerator mother theory", 
"Social stigma", "Primary care physician", "Medical specialty", "Diagnostic and Statistical Manual of Mental Disorders", "Diagnostic and Statistical Manual of Mental Disorders", "DSM-5", "Pervasive developmental disorder not otherwise specified", "Asperger syndrome", "Rett syndrome", "Childhood disintegrative disorder", "Societal and cultural aspects of autism", "Neurodiversity"]}

How do I get the entity candidate list for each mention here, for example the mention at [100, 110]?
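To make the question concrete, here is a minimal sketch of the lookup I have in mind. It only relies on the fields visible in the record above ("text", "index", "wikipedia_ids", "wikipedia_titles"). The candidate dictionary format (mention string mapped to entity titles with counts) and the helper names `mentions_from_record` and `candidates_for` are my own assumptions, not something taken from the repository.

```python
from typing import Dict, List, Tuple


def mentions_from_record(record: dict) -> List[Tuple[str, int, str]]:
    """Return (surface form, gold Wikipedia id, gold title) for every mention.

    Each pair in "index" is treated as a [start, end) character span into
    "text", which matches the example record (text[100:110] == "Table Talk").
    """
    text = record["text"]
    return [
        (text[start:end], wiki_id, title)
        for (start, end), wiki_id, title in zip(
            record["index"], record["wikipedia_ids"], record["wikipedia_titles"]
        )
    ]


def candidates_for(surface: str, candidate_lists: Dict[str, Dict[str, int]]) -> List[str]:
    """Candidate titles for a mention, most frequent first.

    ASSUMPTION: candidate_lists maps a mention string to {entity title: count}.
    Returns an empty list if the mention string has no entry.
    """
    counts = candidate_lists.get(surface, {})
    return sorted(counts, key=lambda title: counts[title], reverse=True)


# Shortened stand-in for the record above (same field names, toy text).
record = {
    "text": "The Table Talk of Martin Luther",
    "index": [[4, 14], [18, 31]],
    "wikipedia_ids": [30465262, 7567080],
    "wikipedia_titles": ["Table Talk (Luther)", "Martin Luther"],
}

# Hypothetical candidate lists, for illustration only.
toy_candidates = {
    "Table Talk": {"Table Talk (Luther)": 3, "Table Talk (hypothetical other page)": 1},
}

for surface, wiki_id, title in mentions_from_record(record):
    print(surface, "->", candidates_for(surface, toy_candidates), "| gold:", title)
```

With the real data, `toy_candidates` would be replaced by whatever candidate lists were used in the paper, which is exactly the part I cannot find.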

Thanks a lot.

abcdefsdf commented 9 months ago

For test aida-b, the recall is very low.

=================test_aida-b.jsonl=================
Number mentions: 4485, Of which are contained in the candidate lists 4411 (74 missing).
Total accuracy of mfs: 0.261 (0.265 on the mentions contained in the lists.)
Total recall of candidates: 0.319 (0.324 on the mentions contained in the lists.)

So for most mentions the ground-truth entity is not in the candidate list. Is this correct? In most cases, shouldn't the ground-truth entity be in the candidate list?
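For reference, this is how I read the four numbers in that output. The sketch below is my own reconstruction, not the repository's script: `candidate_stats` is a hypothetical name, "mfs" is taken to mean the candidate with the highest count for a mention string, and the candidate lists are assumed to map a mention string to {entity title: count}.

```python
from typing import Dict, Iterable, Tuple


def candidate_stats(
    mentions: Iterable[Tuple[str, str]],
    candidate_lists: Dict[str, Dict[str, int]],
) -> Dict[str, float]:
    """Compute coverage, MFS accuracy and candidate recall.

    mentions: (surface form, gold entity title) pairs.
    candidate_lists: ASSUMED format {surface form: {entity title: count}}.
    """
    total = in_lists = mfs_correct = recalled = 0
    for surface, gold in mentions:
        total += 1
        counts = candidate_lists.get(surface)
        if not counts:
            continue  # mention string has no candidate list at all
        in_lists += 1
        if gold in counts:
            recalled += 1  # gold entity appears somewhere in the list
        if max(counts, key=lambda title: counts[title]) == gold:
            mfs_correct += 1  # most frequent candidate is the gold entity

    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else 0.0

    return {
        "mentions": total,
        "in_lists": in_lists,
        "mfs_accuracy": ratio(mfs_correct, total),
        "mfs_accuracy_in_lists": ratio(mfs_correct, in_lists),
        "recall": ratio(recalled, total),
        "recall_in_lists": ratio(recalled, in_lists),
    }


# Toy data, for illustration only.
toy_mentions = [
    ("Paris", "Paris"),
    ("Paris", "Paris, Texas"),
    ("Jordan", "Michael Jordan"),
    ("Xyz", "Xyz"),
]
toy_lists = {
    "Paris": {"Paris": 10, "Paris, Texas": 2},
    "Jordan": {"Jordan": 5},
}
stats = candidate_stats(toy_mentions, toy_lists)
print(stats)
```

Under this reading, the reported recall of 0.319 means that only about a third of the 4485 mentions have their gold entity anywhere in their candidate list, which is why I suspect something is off on my side or in the files.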

abcdefsdf commented 9 months ago

I also took a look at test_aida-b.jsonl.

Most ground-truth entities are not even in the KB (entity_descriptions.jsonl). Only 1457 mentions have a ground-truth entity in the KB that you provide. I wonder whether there is a problem with this file. Thanks a lot.
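The check I ran is roughly the following sketch. The key name "wikipedia_id" for entries in entity_descriptions.jsonl is an assumption on my part (I do not know the documented schema), and `load_kb_ids` and `kb_coverage` are hypothetical helper names. Ids are compared as strings so that a difference between integer and string ids cannot be the reason for a miss.

```python
import json
from typing import Iterable, Set, Tuple


def load_kb_ids(lines: Iterable[str], id_key: str = "wikipedia_id") -> Set[str]:
    """Collect entity ids from JSONL lines.

    ASSUMPTION: every line is a JSON object that stores its id under id_key.
    """
    ids = set()
    for line in lines:
        line = line.strip()
        if line:
            ids.add(str(json.loads(line)[id_key]))
    return ids


def kb_coverage(records: Iterable[dict], kb_ids: Set[str]) -> Tuple[int, int]:
    """Return (mentions whose gold id is in the KB, total mentions)."""
    found = total = 0
    for record in records:
        for wiki_id in record["wikipedia_ids"]:
            total += 1
            if str(wiki_id) in kb_ids:
                found += 1
    return found, total


# Toy stand-ins for entity_descriptions.jsonl and test_aida-b.jsonl.
toy_kb_lines = ['{"wikipedia_id": 7567080, "description": "placeholder"}', ""]
toy_records = [{"wikipedia_ids": [30465262, 7567080]}]

found, total = kb_coverage(toy_records, load_kb_ids(toy_kb_lines))
print(f"{found} of {total} gold entities found in the KB")
```

With the real files, the two toy inputs would be replaced by lines read from entity_descriptions.jsonl and records parsed from test_aida-b.jsonl.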