e-plus-healthcare-alliance / popular-EHR-data-models-in-china

data model for HIS junweiyihao

3M™ Healthcare Data Dictionary #1

Open wanghaisheng opened 8 years ago


wanghaisheng commented 8 years ago

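Building the data dictionary: how to compile the concept list

1. Collect table structures and data from the integration platform, hospital-wide systems, blb, and imaging databases.
2. Collect the built-in lexicons of input methods, focusing on medical lexicons.
3. Collect medical dictionaries and books.
4. Collect the Chinese editions of SNOMED and LOINC, their English editions, and MeSH.
5. Collect names of people, places, organizations, and healthcare institutions.
6. Collect the National Health and Family Planning Commission's data elements, dataset standards, and value-domain codes.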


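  1. Refer to the concepts that already exist in the HDD.
  2. Refer to how OHDSI manages its vocabularies.
  3. Extract concepts from the data elements and datasets, segment concepts out of the Chinese LOINC, and import concepts directly from the Chinese edition of SNOMED.
  4. HDD database table structure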

The CONCEPT_HA table defines each concept in the HDD. Each row in the concept table represents one unique concept, with eight columns describing the properties of the concept.

| Column name | Type | Constraints | Description |
|---|---|---|---|
| ncid | Number(20) | Not null, PK | Unique integer that identifies the concept. The other columns in this table define the concept that this column represents. |
| cid | Varchar2(250) | Not null, UK(1) | A short text description of the concept. The CID is sometimes adequate to define the meaning of the concept. Every CID must be unique within a dictionary schema. (In this context, "schema" refers to a dictionary content set, not to the schemas that make up the database. See the description of the schema_ncid column below for more information.) Other data columns such as concept_definition and comments are used in conjunction with the CID column to define a concept. |
| status_ncid | Number(20) | Not null | Status of this concept: active, inactive, obsolete, etc. The status is indicated by an NCID (such as the NCID for active). An NCID is deactivated if it is found to be a duplicate NCID for a previously identified concept, since every concept must be identified by a unique NCID. In the table row for an inactive NCID, the superceded_by_ncid column (see below) indicates the NCID that replaces the inactive NCID. |
| superceded_by_ncid | Number(20) | | If this concept was judged to be a duplicate or was replaced by a different concept for some other reason, this column holds the NCID for the concept that took the place of this (now inactive) concept. See status_ncid above for more information. |
| enterprise_ncid | Number(20) | Not null | If a concept is used exclusively by a specific organization, this column carries the NCID representing the organization. Otherwise, it contains the NCID 1 (3M Health Information Systems), indicating that it is applicable to everyone. |
| concept_definition | Varchar2(725) | | Additional information clarifying the meaning of the concept as an informal definition. |
| comments | Varchar2(725) | | Additional comments, if needed to clarify the concept or provide other information. |
| schema_ncid | Number(20) | Not null, UK(2) | An NCID specifying whether the HDD contents are global or are for a specific purpose, such as testing. Currently only one schema is used in all 3M installations of the product, "3M Default Schema." This column would contain a different value only if the system were based on an entirely different data model maintained by another entity, such as an HL7 model. Currently, there are no separate HDD systems based on other models like these. (Notice that this use of the word schema has a different meaning from the schema divisions used to organize the tables in a relational database.) |
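To make the column descriptions concrete, here is a minimal sketch of the table in SQLite, plus a helper that follows superceded_by_ncid from an inactive concept to its replacement. Several things here are assumptions rather than 3M's actual DDL: the SQLite types stand in for the Oracle types, UK(1) and UK(2) are read as one composite unique key on (cid, schema_ncid), and the status NCIDs, the schema NCID, and the duplicate concept 900001 are hypothetical values chosen for illustration.

```python
import sqlite3

# Assumption: INTEGER/TEXT stand in for Number(20)/Varchar2(n).
# Assumption: UK(1) + UK(2) form one composite key, i.e. a CID is unique
# within a dictionary schema.
DDL = """
CREATE TABLE concept_ha (
    ncid               INTEGER NOT NULL PRIMARY KEY,
    cid                TEXT    NOT NULL,
    status_ncid        INTEGER NOT NULL,
    superceded_by_ncid INTEGER,
    enterprise_ncid    INTEGER NOT NULL,
    concept_definition TEXT,
    comments           TEXT,
    schema_ncid        INTEGER NOT NULL,
    UNIQUE (cid, schema_ncid)
);
"""

ACTIVE, INACTIVE = 10, 11  # hypothetical status NCIDs
DEFAULT_SCHEMA = 1000      # hypothetical NCID for "3M Default Schema"
THREE_M = 1                # enterprise NCID 1, as described in the table


def build_demo():
    """Create an in-memory table with one active and one superseded concept."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    rows = [
        (82725, "Hypertension", ACTIVE, None, THREE_M, None, None, DEFAULT_SCHEMA),
        # Hypothetical duplicate that was deactivated in favour of 82725.
        (900001, "HypertensionDuplicate", INACTIVE, 82725, THREE_M, None, None,
         DEFAULT_SCHEMA),
    ]
    conn.executemany("INSERT INTO concept_ha VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


def resolve_current(conn, ncid):
    """Follow superceded_by_ncid links until a concept with no replacement."""
    seen = set()
    while ncid not in seen:
        seen.add(ncid)
        row = conn.execute(
            "SELECT superceded_by_ncid FROM concept_ha WHERE ncid = ?", (ncid,)
        ).fetchone()
        if row is None:
            raise KeyError(ncid)
        if row[0] is None:
            return ncid
        ncid = row[0]
    raise ValueError("cycle in superceded_by_ncid chain")


if __name__ == "__main__":
    print(resolve_current(build_demo(), 900001))
```

An application reading old data would call resolve_current on every stored NCID before looking up names, so that records encoded with a retired duplicate still resolve to the surviving concept.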

Status of the 3M data dictionary

The 3M Healthcare Data Dictionary is 3M's proprietary terminology server, published since 1994. It forms the foundation from which HDD Access was derived. The 3M HDD contains five times as much content as HDD Access.

3M HDD content includes:

1.5 million concepts

20 million descriptions

preferred names
proprietary codes
standard terminologies
local terminologies

17 million relationships

knowledge base to link and organize concepts
mappings between standard terminologies
mappings between local and standard terminologies

3M HDD includes all the content in HDD Access, and includes additional standard terminologies that are not included in HDD Access, such as:

Current Procedural Terminology (CPT)
Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT), US Edition
First Databank (FDB)
and many others.

The 3M HDD software also has extra features for improved browsing and authoring of terminology content:

advanced search features (fuzzy search and faceted search)
graphical terminology viewer
automated terminology mapper
multiple versions of standard and local terminologies

It is also coupled with tools that support automated mapping of local content. The 3M HDD can connect to an enterprise service bus, interface engine or a data warehouse and can be accessed through standard (HL7 CTS) and proprietary (3M) web service APIs.

The 3M HDD has been continuously expanded and maintained for over 20 years. The 3M HDD enables you to manage medical terminology, integrate content and standardize healthcare data. The technology assists organizations in the bi-directional translation of data across systems and applications, regardless of where data originates.

Specific Use Cases include:

Real-time terminology translation in interfaces and clinical data repositories
Enabling research from a data warehouse
Supporting health information exchange
Standardization for Meaningful Use

  HDD Access Content version 37

HDD Access terminology content Version 37 contains 650,020 unique concepts. We reviewed and added to previously released domains, most notably the Clinical Care Classification system™. The list of domains is viewable in the left sidebar (Search Domains) of the HDD Access browser. Clicking the triangle to the left of each domain will open sub-domains.

Standard terminologies included are:

·ICD-9-CM Diagnoses

·ICD-9-CM Procedures

·ICD-10-CM (hierarchies)

·ICD-10-PCS (codes but not attributes/hierarchies)


·HCPCS Modifiers






HDD Access content is incremented monthly. The following table shows the total row counts of the four core tables in the current and previous content releases:

| HDD Access Table | Number of Rows January 2016 (Content v36) | Number of Rows May 2016 (Content v37) |
|---|---|---|
| Concept_ha | 644,146 | 650,020 |
| Concept_Relation_ha | 3,574,377 | 3,590,692 |
| Rsform_ha | 3,853,880 | 3,895,029 |
| Rsform_Context_ha | 4,567,038 | 4,589,011 |
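The growth between the two releases follows directly from the row counts. The short sketch below only does that arithmetic; the function name `growth` is made up for this note, and the numbers are copied from the counts above.

```python
# Row counts per table: (Content v36, Content v37), copied from the text above.
ROW_COUNTS = {
    "Concept_ha": (644_146, 650_020),
    "Concept_Relation_ha": (3_574_377, 3_590_692),
    "Rsform_ha": (3_853_880, 3_895_029),
    "Rsform_Context_ha": (4_567_038, 4_589_011),
}


def growth(counts):
    """Return {table: (rows added, percent growth)} between the two releases."""
    return {
        table: (new - old, 100.0 * (new - old) / old)
        for table, (old, new) in counts.items()
    }


if __name__ == "__main__":
    for table, (added, pct) in growth(ROW_COUNTS).items():
        print(f"{table:<20} +{added:>7,} rows ({pct:.2f}%)")
```

For example, Concept_ha grew by 5,874 rows, which matches the 650,020 unique concepts reported for Content v37.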


  1. NCID: a numeric unique identifier, unique across the whole system.
  2. CID: the concept identifier. A CID is a short, language-based value and often is sufficient to tell the meaning of the concept.
  3. Concept definition: an explanation of the concept that describes its meaning.

| NCID | CID | Concept Definition |
|---|---|---|
| 1145 | Arm | Anatomy: Arm is the upper vertebrate limb extending from the shoulder to the wrist. |
| 15150845 | SulfamethoxazolePlusTrimethoprimCommaTabletCommaOral | Clinical drug: Sulfamethoxazole + Trimethoprim, tablet, oral. |
| 16192677 | CommonColdSct | Disease: The common cold is a contagious viral infection of the upper respiratory tract. |
| 1218 | Dr-John-Doe | For testing, domains: Clinician (1213) and Dictating Clinician (1214). (Note: The concept definition for a user in a real healthcare organization is more meaningful than this example.) |

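Representation

A single concept can be expressed in many ways, for both humans and machines. A representation can be a single word or character, a phrase, a string of digits, or a mixture of digits, words, and phrases. It can also be an image or a sound file. For a computer, a representation is an identifier, also called a code or value. For a human, it is a human-readable text string, called a designation, name, or meaning. For example, the same drug has a different drug code in each coding system or drug dictionary.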

Relationship between concepts and representations

One NCID stands for one concept. Each representation is called a related surface form: the "skin" of the concept, or the form of the concept that a machine or a person actually sees. Every representation has a globally unique identifier, the rsform_id. The rsform_id is needed because a concept can have multiple representations and a single representation can express multiple concepts.


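Context

In different settings, scenarios, or products, the same concept can be expressed with different representations, including different codes and different names. A context states what a representation is or where it comes from (for example, an RxNorm code value) or how it is used (for example, as an abbreviation or as the default display value).

Preferred representations

A concept has multiple representations, and some representations can be used in more than one context. A concept may also have several representations within the same context (e.g., “Hypertension” and “High Blood Pressure” are two representations in the "3M Default Text Context" for NCID 82725).

Once a system has chosen a context, it still has to pick one representation when a concept has several in that context. The preferred score makes this choice: a score of 0 marks the preferred representation.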

For the concept Hypertension in the figure above, the first representation, “Hypertension”, has rsform_id 1057920 and is also the concept's CID, so it belongs to the CID Name context (NCID 367). It is also used in the 3M Default Text Context (NCID 2000), where its preferred score is 0. The 3M Default Text Context (NCID 2000) also contains “High Blood Pressure” (rsform_id 64087595) and “HYPERTENSION” (rsform_id 1055132; note that the HDD is case-sensitive with regard to representations), with preferred scores of 1. Although these two representations are not the preferred one, their presence makes it easier to find the concept when searching the HDD. To switch the preferred representation within a context, only the preferred scores need to be adjusted.

How applications could use representations, contexts, and preferred scores

Representations, contexts, and preferred scores are useful both when converting data between systems and when displaying data to users.
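The Hypertension example can be sketched in a few lines of Python. This is only an illustration of the selection rule (lowest preferred score wins within a context), not the HDD's actual API: the class and function names are made up, and the preferred score of the CID Name entry is an assumption because the text does not state it. The NCIDs and rsform_ids are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextUse:
    """One use of a representation for a concept within a context."""
    ncid: int
    rsform_id: int
    context_ncid: int
    preferred_score: int


CID_NAME = 367        # CID Name context
DEFAULT_TEXT = 2000   # 3M Default Text Context
HYPERTENSION = 82725

# rsform_id -> representation text (case is significant).
REPRESENTATIONS = {
    1057920: "Hypertension",
    64087595: "High Blood Pressure",
    1055132: "HYPERTENSION",
}

USES = [
    ContextUse(HYPERTENSION, 1057920, CID_NAME, 0),  # score assumed, not in the text
    ContextUse(HYPERTENSION, 1057920, DEFAULT_TEXT, 0),
    ContextUse(HYPERTENSION, 64087595, DEFAULT_TEXT, 1),
    ContextUse(HYPERTENSION, 1055132, DEFAULT_TEXT, 1),
]


def preferred_text(ncid, context_ncid, uses=USES):
    """Return the representation with the lowest preferred score, or None."""
    candidates = [
        u for u in uses if u.ncid == ncid and u.context_ncid == context_ncid
    ]
    if not candidates:
        return None
    best = min(candidates, key=lambda u: u.preferred_score)
    return REPRESENTATIONS[best.rsform_id]


def search(text, uses=USES):
    """Case-sensitive search: NCIDs of concepts that have this representation."""
    ids = {rid for rid, value in REPRESENTATIONS.items() if value == text}
    return {u.ncid for u in uses if u.rsform_id in ids}


if __name__ == "__main__":
    print(preferred_text(HYPERTENSION, DEFAULT_TEXT))
    print(search("High Blood Pressure"))
```

Keeping the non-preferred rows is what lets a search for "High Blood Pressure" still reach NCID 82725, while display code only ever shows the score-0 text.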


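When each system encodes a concept with its own code values, contexts help a healthcare organization understand the data it receives from other information systems. For example, an inpatient workstation uses HPT to record a patient's disease Hypertension, while an outpatient workstation uses 123. In the HDD, first define one concept, NCID 82725, and then define HPT and 123 as two representations of that concept, each associated with its own context: hospital A context and clinic B context. Whatever code is used to record the diagnosis, the HDD can then find the patients who have hypertension.

For external data exchange, all SNOMED CT codes in the HDD belong to the SNOMED CT Concept ID context (NCID 76781). In that context, the representation (rsform_id 20873345) of Hypertension (concept NCID 82725) is 38341003. Because Hypertension has only one SNOMED CT identifier, the preferred score for that rsform_id is set to 0.

Different dictionaries can also be translated into one another. For example, SNOMED CT uses 248152002 for female, while HL7 uses F. In the HDD, for the concept of female gender, the two representations 248152002 and F are each associated with a different context.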
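The translation described above always goes through the concept: source code to NCID, then NCID to the code in the target context. A minimal sketch follows. The codes, NCID 82725, and context NCID 76781 come from the text; the other context NCIDs and the NCID for the female concept are hypothetical placeholders, and the sketch assumes one code per concept per context (the preferred-score-0 entry).

```python
SNOMED_CT_CONCEPT_ID = 76781  # context NCID given in the text
HOSPITAL_A = 900001           # hypothetical context NCID
CLINIC_B = 900002             # hypothetical context NCID
HL7_GENDER = 900003           # hypothetical context NCID

HYPERTENSION = 82725          # concept NCID given in the text
FEMALE = 900100               # hypothetical concept NCID

# (concept NCID, context NCID, code used in that context)
CODES = [
    (HYPERTENSION, HOSPITAL_A, "HPT"),
    (HYPERTENSION, CLINIC_B, "123"),
    (HYPERTENSION, SNOMED_CT_CONCEPT_ID, "38341003"),
    (FEMALE, SNOMED_CT_CONCEPT_ID, "248152002"),
    (FEMALE, HL7_GENDER, "F"),
]

TO_NCID = {(ctx, code): ncid for ncid, ctx, code in CODES}
FROM_NCID = {(ncid, ctx): code for ncid, ctx, code in CODES}


def translate(code, source_ctx, target_ctx):
    """Translate a code between contexts via its NCID; None if unmapped."""
    ncid = TO_NCID.get((source_ctx, code))
    if ncid is None:
        return None
    return FROM_NCID.get((ncid, target_ctx))


def patients_with(ncid, records):
    """Select patient ids whose (context, code) diagnosis maps to the concept."""
    return [pid for pid, ctx, code in records if TO_NCID.get((ctx, code)) == ncid]


if __name__ == "__main__":
    records = [("p1", HOSPITAL_A, "HPT"), ("p2", CLINIC_B, "123")]
    print(patients_with(HYPERTENSION, records))
    print(translate("248152002", SNOMED_CT_CONCEPT_ID, HL7_GENDER))
```

Because every lookup pivots on the NCID, adding a new source system only means adding its codes under a new context, not writing pairwise mappings to every other system.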

Data display


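  1. Displaying concepts in a system's drop-down lists or data fields
  2. Printing a particular type of report
  3. Displaying concepts within a healthcare organization or one of its departments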

Every concept has at least one representation in the "3M Default Text Context" (NCID 2000). For other data, such as lab test values, a healthcare organization may want its own internally unified representations, for example a short abbreviation in reports and a longer string in more detailed displays. There may also be names taken from other standard terminologies such as SNOMED CT:

• SNOMED CT Fully Specified Name (NCID 76783)
• SNOMED CT Preferred Term (NCID 76782)

When an application retrieves information from a patient database, if the data is encoded and stored as NCIDs, then the application would display the retrieved NCID using its representation with preferred score 0 in the pre-selected context. Alternatively, the application might allow the user to select from a list of contexts, containing choices such as SNOMED CT Fully Specified Name or SNOMED CT Preferred Term. The representation with preferred score 0 in the user-selected context for the retrieved NCID would then be displayed to the user. If the patient data is encoded and stored with system-specific or standard codes, then the HDD is used to translate from the retrieved code to an NCID, then to the required representation for display.

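  Knowledge base: Useful relationships

Linguistic research shows that a vocabulary is more than a collection of thousands of concepts and the words that represent them. It also holds another kind of knowledge: how each concept relates to, or is associated with, the other concepts in the vocabulary.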




Relationships in the dictionary


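Every relationship type defined in the HDD has an NCID and is itself a concept, so the meaning of each relationship type is unambiguous. For example, has ingredient denotes a drug ingredient, and its NCID is 1273.

Relationships can be used to group concepts by shared characteristics. For example, the various types of names can be grouped with the has child relationship (NCID 364):

| Concept | Relationship | Concept |
|---|---|---|
| Name Type | Has child | Legal name |
| | | Patient's name |
| | | Maiden name |
| | | Alias name |
| | | Preferred name |
| | | Mother's maiden name |
| | | Nickname |
| | | Professional name |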

Relationships in the HDD serve many purposes:

• Indicating that some items are components of another item, e.g. the has component relationship (NCID 1247) between a Complete Blood Count lab order and its various resulting tests.
• Indicating that chemical substances are the ingredients of a drug, e.g. the has ingredient relationship (NCID 1273) between Septra and its ingredients, Trimethoprim and Sulfamethoxazole.
• Indicating cause and effect, e.g. that a particular antigen will cause the production of a particular antibody.
• Indicating other relationships, e.g. that certain observations might be signs of a particular condition, or that specific conditions often occur together.
• Grouping certain concepts together for programming purposes, for example, grouping certain concepts for drop-down lists or to support a particular use case (such as mandatory reporting of certain positive lab tests).
• Representing real-world knowledge via groups or hierarchies of concepts, such as diseases, bacteria and viruses, medical procedures, lab tests, medications, etc.

"Has child" and "has member" relationships

The has child relationship (NCID 364) is used to group items together based on similar characteristics. The "grouping" concept is often referred to as a domain. Domains can group other domains (sometimes referred to as subdomains). Thus, domains have different levels of specificity. For example, the domain of drugs can relate Antibiotics, Analgesics, Vitamins, etc. Each of these can in turn relate more specific drug classes, such as Penicillins, Aminoglycosides and others for Antibiotics, which then group the drug ingredients, such as Amoxicillin and Ampicillin under Penicillins.

Hierarchy example

The has child relationship allows similar concepts to be arranged in a multi-level hierarchy. In a hierarchy, the has child relationship implies that the "child" concept is at the immediate next level to the grouping concept it has been linked to. Using a series of has child relationships allows the HDD to define a multiple-level hierarchy. For example, the figure below shows several levels in a multi-level hierarchy for allergens. (At each level, the grouping concept is related to multiple child concepts, but for simplicity, the illustration shows just two.) As you can see, each level in the hierarchy becomes more specific. At the end of the levels, we reach a concept that is a specific type of allergen.

The has member relationship (NCID 363) links the domain concept with all the concepts beneath it, even if the structure beneath it includes a multiple-level hierarchy. Unlike the has child relationship (NCID 364), a has member relationship from concept A to concept B does not imply that concept B is at the immediate next level below concept A; it may or may not be. The has member relationship therefore simply indicates membership in a domain. For instance, all of the following relationship statements are true:

• Non-food animal allergen - has member cat
• Environment related allergen - has member cat
• Allergen - has member cat

Difference between "has component" and "has child" or "has member"

The difference between has component and has child is that the items linked with has component are combined together to create the item that they link to. Has child and has member, on the other hand, merely group items together based on similar characteristics (such as various name types being grouped together in a domain called Name Type).

Difference between "has ingredient" and "has child" or "has member"

The has ingredient relationship (NCID 1273) is used for pharmacy items. Many drugs have more than one ingredient, and the has ingredient relationship links these chemical ingredients to the drug that they combine to create. For example, Septra is a drug with the ingredients Sulfamethoxazole and Trimethoprim. Similar to has component, the items linked with has ingredient are combined together to create the item that they link to. While has child and has member group items together based on similar characteristics, the items linked by has ingredient to the medications may belong to different groups with different characteristics.
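The difference between has child and has member described above can be sketched by treating has member as everything reachable through one or more has child links. That is one reading of the description, not a statement of how the HDD stores these relationships, and the exact level ordering of the allergen hierarchy below is assumed from the three has member statements about cat.

```python
from collections import deque

# Direct has child links (assumed ordering of the allergen hierarchy).
HAS_CHILD = {
    "Allergen": ["Environment related allergen"],
    "Environment related allergen": ["Non-food animal allergen"],
    "Non-food animal allergen": ["cat"],
}


def members(domain, has_child=HAS_CHILD):
    """All concepts beneath a domain, at any depth (breadth-first walk)."""
    found = set()
    queue = deque([domain])
    while queue:
        for child in has_child.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


if __name__ == "__main__":
    # cat is a member of Allergen but not a direct child of it.
    print("cat" in members("Allergen"))
    print("cat" in HAS_CHILD["Allergen"])
```

A membership test like this answers "is this concept anywhere in the domain?", while the has child links are what a browser would use to draw the hierarchy one level at a time.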
Has search category and has search subcategory relationships

The has search category relationship (NCID 16608026) and has search subcategory relationship (NCID 16608027) are structural relationships that organize content within the HDD for browsing and searching. For example, if you want to find HDD concepts that relate to patient demographics, you would want to limit your search to the Patient Demographics Search Category (NCID 16611564). Multiple data elements such as Gender (NCID 1209), Marital Status (NCID 248) or Race (NCID 246) are grouped together within this search category. The has search category and has search subcategory relationships are intended to create the search categories that are graphically represented within the online browser and should not be confused with the has member, has child, or has component relationships.