aphp / eds-scikit

eds-scikit is a Python library providing tools to process and analyse OMOP data
https://aphp.github.io/eds-scikit
BSD 3-Clause "New" or "Revised" License

Errors when running `introduction.ipynb` #56

Open paul-bssr opened 7 months ago

When running the code from the "A gentle demo" section of the documentation with version 0.1.6, some commands raise errors or give unexpected results (probably because of small syntax changes in the library).

Description

  1. In the section "Extracting diabetes status", the following command does not output the same result as in the documentation:
    diabetes.concept.value_counts()

In my case, the discrepancy was solved by using the value column instead of the concept column.

  2. In the section "Extracting covid status", the code cell below raises a KeyError: 'code_list', arising from line 81 of the event_from_code function:
codes = dict(
    COVID=dict(
        code_list=r"U071[0145]", 
        code_type="regex",
    )
)

covid = conditions_from_icd10(
    condition_occurrence=data.condition_occurrence,
    visit_occurrence=data.visit_occurrence,
    codes=codes,
    date_min=DATE_MIN,
    date_max=DATE_MAX,
)

Changing the dictionary as follows solved the issue in my case:

codes = dict(
    COVID=dict(
        regex=r"U071[0145]", 
    )
)
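
For anyone with several dictionaries written in the older style, the change above can be sketched as a small conversion step. This is only an illustration: `to_new_style` is a hypothetical helper, not part of eds-scikit, and it assumes that the accepted layout is `{code_type: code_list}`, which this issue only confirms for `regex`. The final check exercises the pattern with the standard `re` module and says nothing about how the library itself applies it.

```python
import re


def to_new_style(codes):
    """Hypothetical helper (not part of eds-scikit).

    Rewrites entries of the form {'code_list': ..., 'code_type': ...}
    as {code_type: code_list}; other entries are copied unchanged.
    """
    converted = {}
    for concept, spec in codes.items():
        if "code_list" in spec and "code_type" in spec:
            converted[concept] = {spec["code_type"]: spec["code_list"]}
        else:
            converted[concept] = dict(spec)
    return converted


old_codes = dict(COVID=dict(code_list=r"U071[0145]", code_type="regex"))
new_codes = to_new_style(old_codes)
print(new_codes)  # {'COVID': {'regex': 'U071[0145]'}}

# Sanity check of the pattern itself, independent of eds-scikit.
pattern = re.compile(new_codes["COVID"]["regex"])
print(bool(pattern.fullmatch("U0710")))  # True
print(bool(pattern.fullmatch("U0712")))  # False
```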

  3. In the section "Adding patient age", the following error is raised when trying to compute the patient age:
    TypeError: One of the provided Serie isn't a datetime Serie

A solution in my case was to convert birth_datetime to datetime format using the following command:

import pandas as pd
visit_detail_covid["birth_datetime"] = visit_detail_covid["birth_datetime"].apply(lambda x: pd.to_datetime(x))

I guess the issue might be coming from the i2b2 connector
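
To make the workaround for the third error concrete, here is a minimal sketch on a toy table using plain pandas. The data is made up for illustration, and the age is computed with ordinary pandas arithmetic rather than the library's helper. Data loaded through `HiveData` may not be a plain pandas DataFrame, so the vectorised `pd.to_datetime` call is an assumption that may need adapting there.

```python
import pandas as pd

# Toy stand-in for the merged table; the values are made up for illustration.
visits = pd.DataFrame(
    {
        "visit_detail_start_datetime": pd.to_datetime(["2019-01-01", "2018-06-15"]),
        "birth_datetime": ["1979-01-01", "2000-06-15"],  # strings, not datetimes
    }
)

# Vectorised conversion; the result has to be assigned back to the column.
visits["birth_datetime"] = pd.to_datetime(visits["birth_datetime"])

# Age in years, following the hours / (24 * 365.25) convention of the demo.
elapsed = visits["visit_detail_start_datetime"] - visits["birth_datetime"]
visits["age"] = elapsed.dt.total_seconds() / 3600 / (24 * 365.25)

print(visits.dtypes)
print(visits["age"].round(2).tolist())
```

Checking `visits.dtypes` before computing the age is a quick way to see whether a column arrived as strings instead of datetimes.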

How to reproduce the bug

Code to load an i2b2 database (common to the 3 bugs):

import eds_scikit
import datetime
from eds_scikit.io import HiveData

database_name = "cse_**" 

data = HiveData(
    database_name=database_name,
    database_type="I2B2"
)

DATE_MIN = datetime.datetime(2018, 1, 1)
DATE_MAX = datetime.datetime(2019, 6, 1)

Minimal code for bug 1:

from eds_scikit.event.diabetes import diabetes_from_icd10

diabetes = diabetes_from_icd10(
    condition_occurrence=data.condition_occurrence,
    visit_occurrence=data.visit_occurrence,
    date_min=DATE_MIN,
    date_max=DATE_MAX,
)

diabetes.concept.value_counts()

Minimal code for bug 2:

from eds_scikit.event import conditions_from_icd10

codes = dict(
    COVID=dict(
        code_list=r"U071[0145]", 
        code_type="regex",
    )
)

covid = conditions_from_icd10(
    condition_occurrence=data.condition_occurrence,
    visit_occurrence=data.visit_occurrence,
    codes=codes,
    date_min=DATE_MIN,
    date_max=DATE_MAX,
)

Minimal code for bug 3:

from eds_scikit.event import conditions_from_icd10
from eds_scikit.utils import datetime_helpers

codes = dict(
    COVID=dict(
        regex=r"U071[0145]", 
    )
)

covid = conditions_from_icd10(
    condition_occurrence=data.condition_occurrence,
    visit_occurrence=data.visit_occurrence,
    codes=codes,
    date_min=DATE_MIN,
    date_max=DATE_MAX,
)

visit_detail_covid = data.visit_detail.merge(
    covid[["visit_occurrence_id"]],
    on="visit_occurrence_id",
    how="inner",
)

visit_detail_covid = visit_detail_covid.merge(data.person[['person_id','birth_datetime']], 
                                              on='person_id', 
                                              how='inner')

visit_detail_covid["age"] = (
    datetime_helpers.substract_datetime(
        visit_detail_covid["visit_detail_start_datetime"],
        visit_detail_covid["birth_datetime"],
        out="hours",
    )
    / (24 * 365.25)
)