RWD2E / cdc_als4m

This repository includes extraction and analysis code for the ALS4M project funded by CDC (R01TS000336).
https://reporter.nih.gov/search/cMUGhJhRq0-5CwKGt8cNcQ/project-details/10610610
Apache License 2.0

Towards Better Understanding of ALS using a Multi-Marker Discovery Approach from a Multi-Modal Database (ALS4M)

Funding agency: CDC/ATSDR
Funding period: 10/2022 - 10/2025
PI: Xing Song (MU); Jeffery Statland (KUMC)
CDC Site: https://www.cdc.gov/als/ALSExternalResearchfundedbyRegistry.html
NIH rePORT: https://reporter.nih.gov/search/cMUGhJhRq0-5CwKGt8cNcQ/project-details/10610610
Project Number: R01TS000336
DROC request: #111

Study Overview

The overarching goal of this study is to use new large multi-modal data resources and machine-learning-based data mining algorithms to better understand risk factors and improve diagnosis for people with amyotrophic lateral sclerosis (ALS). ALS is a rare, fatal neurodegenerative disorder; 90% of cases are sporadic, with no genetic cause, and their contributing risk factors are largely unknown. Most of what is known about ALS risk factors comes from epidemiological studies using registry data, which has historically been the main standardized big data source for describing the natural history, epidemiology, and burden of the disease; however, the strength of evidence resulting from these studies varies greatly. One potential major limitation of registry data is that the fields collected are based on known potential risk factors, which has restricted its usability for exploring novel associations and causal relationships. Moreover, because ALS is a rare disease with low prevalence, studying its etiology with traditional observational study designs is infeasible due to statistical power constraints. The digitization of healthcare records and the capacity to link them to other relevant data sources now enable a more representative, enriched, and statistically powerful study population, one that is ideal for leveraging machine-learning-driven, hypothesis-generating models to identify new risk factors and patterns important for understanding, diagnosing, or treating people with ALS.

For the proposed study, we will interrogate this integrated multi-modal database using a set of multi-marker selection and extraction algorithms based on our established work to achieve the following specific aims: