Updated database connection parameters to use environment variables
Added physionet.org data directory to .gitignore
PR Type
[Feature, Fix, Documentation]
Short Description
PR Summary:
Added a new Python script, load_ndjson_to_postgres.py, that loads MIMIC-IV FHIR dataset files in NDJSON format directly into a PostgreSQL database.
Updated collect.py to read the database connection parameters from environment variables, so credentials no longer live in the source code.
Added the physionet.org data directory to .gitignore so that downloaded dataset files stay out of the repository.
Tests Added
No unit tests were added for this script. It was tested manually with sample NDJSON files, which were imported into PostgreSQL successfully.
Issue Reference
Closes #84 – resolves the need for loading MIMIC-IV FHIR NDJSON files into the PostgreSQL database for use with collect.py.
Detailed Description
1. Added Script:
The new script load_ndjson_to_postgres.py reads each NDJSON file in a specified directory, flattens nested JSON data where necessary, and loads the data into a specified PostgreSQL database. This streamlines the process of loading large FHIR datasets for analysis.
2. Updates to collect.py:
collect.py now reads the database credentials from environment variables.
3. .gitignore Update:
Excluded the physionet.org directory in case users download the dataset inside the main repository.
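For reviewers who want a feel for the read-and-flatten step described in item 1, here is a minimal sketch. It is not the code in load_ndjson_to_postgres.py: the function names `flatten` and `iter_ndjson`, the underscore separator, and the choice to store lists as JSON strings are all assumptions made for illustration. The database insert is left out on purpose.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def flatten(obj: Dict[str, Any], parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dicts into one level (hypothetical helper).

    Nested keys are joined with `sep`; lists are serialized to JSON strings
    so that each value fits in a single column.
    """
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        elif isinstance(value, list):
            flat[full_key] = json.dumps(value)
        else:
            flat[full_key] = value
    return flat


def iter_ndjson(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one flattened record per non-empty line of an NDJSON file."""
    with path.open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines between records
                yield flatten(json.loads(line))
```

Reading line by line keeps memory use flat even for large files, which matters for a dataset of this size. Each yielded dict could then be passed to whatever insert routine the script uses.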
Environment Variable Setup:
To use the new scripts, please set up environment variables:
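The exact variable names are not reproduced in this description, so the sketch below uses placeholder names (`DB_NAME`, `DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`) purely as an assumption; check collect.py for the real ones. It shows one way a script could collect connection parameters from the environment and fail early when a required value is missing. The helper name `db_params_from_env` is hypothetical as well.

```python
import os
from typing import Dict, Mapping


def db_params_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build connection parameters from environment variables.

    All variable names here are placeholders, not the names used by the PR.
    """
    required = ["DB_NAME", "DB_USER", "DB_PASSWORD"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "dbname": env["DB_NAME"],
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
        # Host and port fall back to common local PostgreSQL defaults.
        "host": env.get("DB_HOST", "localhost"),
        "port": env.get("DB_PORT", "5432"),
    }
```

The returned keys follow the libpq keyword names, so the dict could be unpacked into a PostgreSQL driver's connect call. Passing the mapping as an argument also makes the function easy to exercise without touching the real environment.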