IKANOW / Absolute-Pin

Absolute Pin Project
http://absolutepin.ikanow.com

[INF-1710] Translate documents prior to automated entity extraction #22

Open astrite opened 11 years ago

astrite commented 11 years ago

Many of the documents need to be translated prior to automated entity extraction. Current workarounds only work with text searches, not entity or association searches. This also limits the effectiveness of some widgets.

astrite commented 11 years ago

A combination of switching entity extractors and using the aliasing tool should alleviate the problem without incurring additional cost.

sschneiderman commented 11 years ago

Based on a conversation with Alex P, IKANOW was to evaluate whether translating prior to extracting was effective for certain foreign languages. This was a topic for further discussion.

astrite commented 11 years ago

Testing is on my to-do list (assigning to me for tracking).

sschneiderman commented 11 years ago

Andrew, also, reading through my notes: Alex P indicated a potential solution involving a white list, although I'm still fuzzy on this concept.

sschneiderman commented 11 years ago

Can we move the foreign language translation and processing issues to a separate milestone? I think it's worthy of a separate discussion. In my mind, the source creation process is a separate milestone - we need a roadmap for a process for selecting source packages that is within Candor Finance (sans extensions if possible) and is easy to use for the analytical team.

astrite commented 11 years ago

Agreed - we think the case management tool may offer a reasonable way for an analyst to nominate a document for permanent translation / ingest via an automated extractor. Alternatives still include developing analytics based on a white-list of high-value terms (possibly derived from the case's people and companies).
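
For discussion purposes, here is a minimal sketch of what the white-list alternative might look like. It is not how the platform's extractors work. It assumes the case's people and companies are available as plain lists of names, and that the untranslated documents use a Latin script, so names can be matched after case-folding and stripping diacritics. The function names (`normalize`, `build_whitelist`, `find_whitelisted_terms`) and the sample names are hypothetical.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Case-fold and strip diacritics so that 'Müller' and 'muller' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def build_whitelist(people, companies):
    """Build a set of normalized high-value terms from a case's people and companies."""
    return {normalize(term) for term in list(people) + list(companies) if term.strip()}


def find_whitelisted_terms(document: str, whitelist) -> dict:
    """Return {term: occurrence count} for white-listed terms found in the document.

    Matching is done on whole terms only, so 'acme' does not match inside 'acmeology'.
    """
    text = normalize(document)
    hits = {}
    for term in whitelist:
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        count = len(re.findall(pattern, text))
        if count:
            hits[term] = count
    return hits


# Hypothetical case data and an untranslated (German) document.
whitelist = build_whitelist(["Hans Müller"], ["Acme Holdings"])
doc = "Der Bericht nennt Hans Müller und die ACME Holdings GmbH. Hans Muller bestritt dies."
hits = find_whitelisted_terms(doc, whitelist)
```

A document with one or more hits could then be flagged for an analyst to nominate for translation and ingest. Names that are transliterated or written in a non-Latin script would not be caught by this approach and would still need the aliasing tool or translation.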