eth-mds / ricu

🏥 ICU data with R 🏥
https://eth-mds.github.io/ricu/
GNU General Public License v3.0

Warning about missing rows upon importing MIMIC-IV dataset using ricu #70

Closed partizanos closed 4 months ago

partizanos commented 5 months ago

Hello, I noticed the following warnings after downloading and importing the datasets.

The download log reports that the checksums were verified and are fine.

However, upon executing import_src("miiv"), the following warnings are displayed:

Successfully imported 31 tables

... 
1: expected 26850359 rows but got 13294903 rows for table `emar` 
2: Encountered parsing problems for file poe.csv.gz:
  • [33719551, NA]: got '1 columns' instead of 12 columns 
3: expected 39366291 rows but got 33719550 rows for table `poe` 

dplecko commented 5 months ago

Hi,

I think there may be an issue on your side here (not sure exactly what it is). For me, miiv$emar does have 26850359 rows, e.g.,

> miiv$emar
# <src_tbl>:  [26,850,359 ✖ 12]
# ID options: subject_id (patient) < hadm_id (hadm) < stay_id (icustay)
# Defaults:   `charttime` (index)
# Time vars:  `charttime`, `scheduletime`, `storetime`
           subject_id  hadm_id emar_id emar_seq poe_id pharmacy_id enter_provider_id charttime
                <int>    <int> <chr>      <int> <chr>        <int> <chr>             <dttm>
1            10000032 22595853 100000…       10 10000…    48770010 NA                2180-05-07 00:44:00
2            10000032 22595853 100000…       11 10000…    14779570 NA                2180-05-07 00:44:00
3            10000032 22595853 100000…       12 10000…    93463122 NA                2180-05-07 06:10:00
4            10000032 22595853 100000…       13 10000…    42497745 NA                2180-05-07 05:00:00
5            10000032 22595853 100000…       14 10000…    69131933 NA                2180-05-07 07:51:00
…
26,850,355   19999828       NA 199998…        4 19999…          NA NA                2147-07-17 18:39:00
26,850,356   19999828       NA 199998…        5 19999…          NA NA                2147-07-17 18:39:00
26,850,357   19999828       NA 199998…        6 19999…          NA NA                2147-07-17 18:39:00
26,850,358   19999828       NA 199998…        7 19999…          NA NA                2147-07-17 20:50:00
26,850,359   19999828       NA 199998…        8 19999…          NA NA                2147-07-17 21:36:00
# ℹ 26,850,354 more rows
# ℹ 4 more variables: medication <chr>, event_txt <chr>, scheduletime <dttm>, storetime <dttm>
# ℹ Use `print(n = ...)` to see more rows

The same holds for the poe table. Perhaps something went wrong during download or conversion on your side?
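One way to narrow down where rows go missing is to count the records in the raw compressed file and compare against the number the import expects. The following is only a rough base-R sketch, not part of ricu: `count_csv_rows()` and `check_rows()` are hypothetical helper names, the file path in the usage comment is a placeholder, and the count assumes one record per line (quoted fields containing newlines would inflate it).

```r
# Count data rows in a gzipped CSV without loading it all into memory.
# Assumption: a single header line and one record per line.
count_csv_rows <- function(path, chunk_size = 1e6) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  n <- 0
  repeat {
    lines <- readLines(con, n = chunk_size, warn = FALSE)
    if (length(lines) == 0L) break
    n <- n + length(lines)
  }
  max(n - 1, 0)  # subtract the header line
}

# Compare the counted rows against an expected number and report a mismatch.
# Returns TRUE (invisibly) when the counts agree, FALSE otherwise.
check_rows <- function(path, expected) {
  got <- count_csv_rows(path)
  if (got != expected) {
    message(sprintf("%s: expected %.0f rows but got %.0f",
                    basename(path), expected, got))
  }
  invisible(got == expected)
}

# Usage with a placeholder path; the expected count is taken from the
# warning in this thread:
# check_rows("path/to/miiv/poe.csv.gz", expected = 39366291)
```

If the raw file itself comes up short, the download was probably truncated and fetching that file again is the likely fix; if the raw file has the full row count, the problem is more likely in the conversion step.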

partizanos commented 4 months ago

I managed to fix the issue. Thank you for verifying.