Description: Research article document classification dataset based on aspects of disease research. The dataset currently consists of three subsets: (A) classifies paper titles/abstracts into the most popular subtypes of clinical, basic, and translational papers (~20k papers); (B) identifies whether a paper's title/abstract describes substantive research into Quality of Life (~10k papers); (C) identifies whether a paper is a natural history study (~10k papers). These classifications are particularly relevant to rare disease research, a field that is generally understudied.
Task: Document Classification for types of research experiments
Motivation: (1) These are medium-to-large human-curated corpora (>10K); (2) they address an understudied, high-value subfield (rare disease); (3) this forms the basis of a new collaboration between NCATS and CZI and is likely to be an expanding set as more work is done.
Adding a Dataset