DevopediaOrg / webapp

A dummy repo that serves as a placeholder for the Devopedia webapp. It will be used mainly to track issues.

Acronym tooltip disambiguation #468

Open arvindpdmn opened 4 years ago

arvindpdmn commented 4 years ago

Suppose an article uses the acronym "CSS". On mouse hover, the expanded form is shown as a tooltip. If there are multiple expansions (such as Cascading Style Sheets and Chirp Spread Spectrum), all of them are shown. This is a limitation of the Devopedia platform. Instead, only the most relevant expansion should be shown. In the article titled CSS Grid Layout, the only sensible expansion of CSS is Cascading Style Sheets.

The job of the NLP model is to use the context of the current article, or even of the current paragraph, to determine which expansion is correct.
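To make the task concrete, here is a minimal baseline sketch, assuming each candidate expansion comes with a few hypothetical domain keywords. It scores candidates by simple word overlap with the surrounding text. This is a swapped-in bag-of-words heuristic for illustration, not the NLP model this issue asks for; `pick_expansion`, `tokenize` and the keyword lists are all made up.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens; deliberately crude."""
    return re.findall(r"[a-z]+", text.lower())


def pick_expansion(candidates: dict[str, str], context: str) -> str:
    """Return the expansion whose words overlap most with the context.

    `candidates` maps each expansion to a few hypothetical domain keywords.
    The score counts how often each candidate word appears in the context;
    on a tie, max() keeps the first candidate in insertion order.
    """
    context_counts = Counter(tokenize(context))

    def score(item: tuple[str, str]) -> int:
        expansion, keywords = item
        words = set(tokenize(expansion + " " + keywords))
        return sum(context_counts[w] for w in words)

    best_expansion, _ = max(candidates.items(), key=score)
    return best_expansion


# Hypothetical keyword lists; real ones might be mined from the defining articles.
CSS_CANDIDATES = {
    "Cascading Style Sheets": "web page styling layout grid html browser",
    "Chirp Spread Spectrum": "radio modulation signal lora wireless",
}

print(pick_expansion(CSS_CANDIDATES, "CSS Grid Layout places HTML elements on a grid in the browser."))
# prints "Cascading Style Sheets"
```

A baseline like this is mainly useful for building test data and as a yardstick: any trained model should beat it on the cases listed in this thread.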

Where do the expansions come from? Devopedia maintains acronyms in two database tables:

In other words, an acronym and its expansion do not enter the Devopedia system until they are defined in some article. All article content plus the two acronym tables will be shared as data for this task.
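Since an acronym only enters the system once some article defines it, one useful building block is spotting definitions written as "Long Form (LF)". The sketch below assumes that convention and a simple initials check; it is illustrative only and is not how Devopedia actually populates its tables. `find_definitions` is a hypothetical name.

```python
import re

# A parenthesised acronym: a capital followed by 1-9 capitals or digits, e.g. "(CSS)".
ACRONYM_IN_PARENS = re.compile(r"\(([A-Z][A-Z0-9]{1,9})\)")
WORD = re.compile(r"[A-Za-z][\w-]*")


def find_definitions(text: str) -> list[tuple[str, str]]:
    """Return (acronym, expansion) pairs written as 'Long Form (LF)'.

    For each parenthesised acronym, take as many preceding words as the
    acronym has letters and accept them only if their initials spell it.
    """
    found = []
    for match in ACRONYM_IN_PARENS.finditer(text):
        acronym = match.group(1)
        letters = [c for c in acronym if c.isalpha()]
        preceding = WORD.findall(text[: match.start()])
        words = preceding[-len(letters):]
        if len(words) == len(letters) and all(
            w[0].lower() == c.lower() for w, c in zip(words, letters)
        ):
            found.append((acronym, " ".join(words)))
    return found


print(find_definitions("Pages are styled with Cascading Style Sheets (CSS) in the browser."))
```

Returning a list rather than a dict keeps both expansions when one text defines the same acronym twice, which is exactly the ambiguous case this issue is about. Definitions whose initials skip words (e.g. "Internet of Things (IoT)") would need a looser match.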

Optional: The scope of this work can be expanded to acronyms that are not in the system. In such cases, the expansions would have to come from external sources (databases, Wikipedia, Google search, etc.). This means that if an author uses a new acronym without defining it, we'll still be able to show the correct expansion.
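The optional extension implies a lookup order: Devopedia's own tables first, external sources only when the acronym is unknown. A minimal sketch of that fallback chain follows, assuming each source can be wrapped as a function returning candidate expansions. The in-memory tables stand in for the real database tables and for an external source such as Wikipedia; all names here are hypothetical.

```python
from typing import Callable, Iterable

# A lookup takes an acronym and returns candidate expansions (possibly empty).
Lookup = Callable[[str], list[str]]


def make_table_lookup(table: dict[str, list[str]]) -> Lookup:
    """Wrap an in-memory table as a lookup (stand-in for a DB or web source)."""
    return lambda acronym: table.get(acronym, [])


def expansions_for(acronym: str, lookups: Iterable[Lookup]) -> list[str]:
    """Return candidates from the first source that knows the acronym."""
    for lookup in lookups:
        candidates = lookup(acronym)
        if candidates:
            return candidates
    return []


internal = make_table_lookup({"CSS": ["Cascading Style Sheets", "Chirp Spread Spectrum"]})
external = make_table_lookup({"LoRa": ["Long Range"]})  # hypothetical external stub

print(expansions_for("LoRa", [internal, external]))
```

Candidates from either source would still go through the same context-based disambiguation; the chain only decides where candidates come from.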

arvindpdmn commented 4 years ago

Examples Extracted by Devopedia

Examples Not Extracted by Devopedia

Other Issues

arvindpdmn commented 4 years ago

Adeft links:

Oracle links:

Other links:

Ideas:

arvindpdmn commented 4 years ago

Our code: https://github.com/teja0508/AcronymLookup

A related issue: update article caches when an acronym changes. E.g., a new acronym is used in article A. Later, article B defines it for the first time. The article cache of A must then be updated.
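One way to find the caches to update is an inverted index from each acronym to the articles that use it. The sketch below assumes usages are recorded whenever an article is saved; the class and method names are hypothetical and say nothing about how the webapp stores caches today.

```python
class AcronymUsageIndex:
    """Hypothetical inverted index: acronym -> ids of articles that use it."""

    def __init__(self) -> None:
        self.used_in: dict[str, set[str]] = {}

    def record_usage(self, article_id: str, acronyms: set[str]) -> None:
        """Call whenever an article is saved, with the acronyms it contains."""
        for acronym in acronyms:
            self.used_in.setdefault(acronym, set()).add(article_id)

    def articles_to_refresh(self, acronym: str, changed_in: str) -> set[str]:
        """Articles whose cached tooltips go stale when `acronym` changes in `changed_in`.

        The changing article itself is excluded: saving it rebuilds its cache anyway.
        """
        return self.used_in.get(acronym, set()) - {changed_in}


# The scenario from this comment: A uses an acronym, later B defines it.
index = AcronymUsageIndex()
index.record_usage("A", {"CSS"})
index.record_usage("B", {"CSS", "HTML"})
print(index.articles_to_refresh("CSS", "B"))  # prints {'A'}
```

A real version would also drop an article from the index when an edit removes the acronym, otherwise it keeps triggering needless refreshes.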

arvindpdmn commented 1 month ago

Noting a couple of issues (which can be added to test data):

arvindpdmn commented 1 month ago

Disambiguation is implemented as a web service with GraphQL. An incomplete design is below:

+ Add to cron (at 20:00 UTC):
exec("cd ../tools/kaggle; $phpExec main.php");
+ Acronyms:
CREATE TABLE `xxxxx_acronyms_disambig` (
  `version_id` int(10) UNSIGNED NOT NULL,
  `position` tinyint UNSIGNED NOT NULL,
  `acronyms_id` int(10) UNSIGNED NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
ALTER TABLE `xxxxx_acronyms_disambig`
  ADD UNIQUE KEY `idx_version_id_position` (`version_id`, `position`);
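To illustrate how the disambiguation table might be read, here is a sketch using SQLite in memory. It assumes `position` is the ordinal of an acronym occurrence within an article version and that `acronyms_id` points at an acronyms table with `id`, `name` and `description` columns (matching the fields of the GraphQL `Acronym` type below, where `description` holds the expansion). Both are assumptions; the real acronym tables are not shown here, and table names are simplified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE acronyms (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT NOT NULL
);
CREATE TABLE acronyms_disambig (
    version_id INTEGER NOT NULL,
    position INTEGER NOT NULL,
    acronyms_id INTEGER NOT NULL REFERENCES acronyms(id),
    UNIQUE (version_id, position)  -- mirrors idx_version_id_position
);
""")
conn.executemany("INSERT INTO acronyms VALUES (?, ?, ?)", [
    (1, "CSS", "Cascading Style Sheets"),
    (2, "CSS", "Chirp Spread Spectrum"),
])
# Hypothetical article version 10: its first two CSS occurrences resolve to id 1.
conn.executemany("INSERT INTO acronyms_disambig VALUES (?, ?, ?)", [(10, 1, 1), (10, 2, 1)])


def expansion_at(conn: sqlite3.Connection, version_id: int, position: int):
    """Return the chosen expansion for one occurrence, or None if not disambiguated."""
    row = conn.execute(
        """SELECT a.description
           FROM acronyms_disambig AS d JOIN acronyms AS a ON a.id = d.acronyms_id
           WHERE d.version_id = ? AND d.position = ?""",
        (version_id, position),
    ).fetchone()
    return row[0] if row else None


print(expansion_at(conn, 10, 1))  # prints Cascading Style Sheets
```

Keying on the article version means a new revision gets fresh rows, so older cached versions keep their own disambiguation.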

OK (queries that worked):
{"query":"{ acronym {name,description} }"}
{"query":"{ acronym(id:200) {name,description} }"}
{"query":"{ createAcronym(input: {name:\"ADC\", description:\"Ab Dase Cls\"}) }"}

{"query":"{ hello(id:200) {name,description} }"}
{"query":"{ createAcr(name:\"xxx\", description:\"yyy\") {id} }"}
{"query":"{ createAcr(name:\"ABC\", description:\"Ab Base Cls\") }"} # returns a plain Int
{"query":"{ createAcr(input: {name:\"ADC\", description:\"Ab Dase Cls\"}) }"}

KO (queries that failed):
{"query":"{ createAcr(input:{id:440}) {id} }"}
{"mutation":"{ createAcr({id: 400, name: 'xxx', description: 'yyy'}) {Acronym {id, name}} }"}
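The last KO request breaks several GraphQL rules: the JSON key must be `"query"` even for mutations (the `mutation` keyword goes inside the document), arguments must be named, and GraphQL strings use double quotes, not single quotes. A small sketch of building a well-formed request body follows; `graphql_payload` is a hypothetical helper, and the field names simply follow the draft schema.

```python
import json
from typing import Any, Optional


def graphql_payload(document: str, variables: Optional[dict[str, Any]] = None) -> str:
    """Build the JSON body for a GraphQL-over-HTTP POST request.

    The operation type ("query" or "mutation") is written inside the document;
    the JSON key is "query" in both cases.
    """
    body: dict[str, Any] = {"query": document}
    if variables:
        body["variables"] = variables
    return json.dumps(body)


# Rewrite of the failing mutation: named argument, double-quoted strings,
# and the "mutation" keyword inside the document.
create = graphql_payload(
    'mutation { createAcr(input: {name: "ADC", description: "Ab Dase Cls"}) { id name } }'
)
print(create)
```

Passing values through `variables` instead of splicing them into the document also avoids the escaped `\"` quoting seen in the logs above.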

type Query {
    hello(id: Int!): [Acronym]
    createAcr(name: String, description: String): Int!
}

type Acronym {
    id: Int!
    name: String!
    description: String!
}

schema {
  query: Query
  mutation: Mutation
}

type Query {
    hello: [Acronym]
}

type Acronym {
    id: Int!
    name: String!
    description: String!
}

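# Arguments must use input types; an object type such as Acronym cannot be an argument.
input AcronymInput {
    name: String!
    description: String!
}

type Mutation {
    createAcr(input: AcronymInput): Acronym
    updateAcr(id: Int!, input: AcronymInput): Acronym
}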