MyDigiTwinNL / CDF2Medmij-Mapping-tool

Tool for transforming Cohort-study Data (CDF) into FHIR/MedMij compliant resource bundles
Apache License 2.0

LifelinesDataAccessDocumentation example code not handling missing values #11

Closed: baukearends closed this issue 1 month ago

baukearends commented 1 year ago

The Python example code at https://github.com/MyDigiTwinNL/LifelinesDataAccessDocumentation throws an error when the query returns missing values. I edited it slightly to account for this, mapping missing birth years to NaN and using NumPy's NaN-aware functions:

import sqlite3
import numpy as np
from datetime import datetime

# Connect to the SQLite database
conn = sqlite3.connect('/groups/umcg-lifelines/tmp01/projects/ov22_0581/pheno_lifelines_sqlite/db-lifelines.db')  

# Define the SQL query to retrieve birth years of patients with T2D
sql_query = "SELECT BIRTHDATE FROM PATIENTS WHERE T2D_STATUS = 'Active';"

# Execute the query and fetch the birth years into a list
cursor = conn.execute(sql_query)
birth_years = [row[0] for row in cursor.fetchall()]

# Close the database connection
conn.close()

# Calculate ages from birth years and current year
current_year = datetime.now().year
ages = [current_year - int(birth_year) if birth_year is not None else np.nan for birth_year in birth_years]

# Calculate median and standard deviation using NumPy
median_age = np.nanmedian(ages)
std_dev_age = np.nanstd(ages)

# Display the results
print(f"Median Age of Patients with T2 Diabetes: {median_age} years")
print(f"Standard Deviation of Age: {std_dev_age:.2f} years")

hcadavid commented 7 months ago

Hi @baukearends ,

I only recently went back to working on the tools and realized that you posted this issue months ago (somehow I missed the notification). My apologies for that. I ran your script and it does show the median and standard deviation results. Would you mind running your script again and telling me if it also works for you?
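
For checking the missing-value handling without access to the Lifelines database, the following is a minimal sketch that runs the same logic against an in-memory SQLite database. The table layout, the sample rows, the helper name `ages_from_birth_years`, and the fixed reference year are all assumptions made for illustration; they do not reflect the real Lifelines schema or data.

```python
import sqlite3

import numpy as np


def ages_from_birth_years(birth_years, current_year):
    """Convert birth years to ages, mapping missing values (None) to NaN.

    Hypothetical helper, introduced here only for illustration.
    """
    return [
        current_year - int(year) if year is not None else np.nan
        for year in birth_years
    ]


# In-memory database with an assumed PATIENTS table (not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PATIENTS (BIRTHDATE INTEGER, T2D_STATUS TEXT)")
conn.executemany(
    "INSERT INTO PATIENTS VALUES (?, ?)",
    [
        (1950, "Active"),
        (None, "Active"),  # missing birth year, stored as SQL NULL
        (1970, "Active"),
        (1990, "Inactive"),  # excluded by the WHERE clause below
    ],
)

# Same query as in the script above.
cursor = conn.execute("SELECT BIRTHDATE FROM PATIENTS WHERE T2D_STATUS = 'Active';")
birth_years = [row[0] for row in cursor.fetchall()]
conn.close()

# A fixed reference year keeps the result reproducible (assumption).
ages = ages_from_birth_years(birth_years, current_year=2024)

# The NaN-aware functions skip the missing entry instead of returning NaN.
median_age = np.nanmedian(ages)
std_dev_age = np.nanstd(ages)

print(f"Median Age of Patients with T2 Diabetes: {median_age} years")
print(f"Standard Deviation of Age: {std_dev_age:.2f} years")
```

SQLite returns NULL values as `None` in Python, which is why the `is not None` check is needed before calling `int()`. If the real column holds full dates or empty strings rather than integer years, the conversion step would need to be adapted accordingly.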