DHLab-TSENG / dxpr


error when call icdDxToxxx #36

Open ningxuca opened 2 years ago

ningxuca commented 2 years ago

Error in vecseq(f, len, if (allow.cartesian || notjoin || !anyDuplicated(f__, :
  Join results in 5441218 rows; more than 5423694 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.

My code is:

css <- icdDxToCCS(dxDataFile = diag2, idColName = PTID, icdColName = DIAGNOSIS_CD, icdVerColName = type, dateColName = DIAG_DATE)
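For readers hitting the same data.table message: it fires when a join returns more rows than nrow(x) + nrow(i), which typically means the lookup side has duplicated keys. A minimal base-R sketch, with toy data and hypothetical column names `ICD` and `group` (not dxpr's internal tables), that lists the duplicated keys and shows them multiplying rows in a join:

```r
# Toy diagnosis records and a toy ICD-to-group lookup (hypothetical values)
dx <- data.frame(ID = c("P1", "P2", "P3"),
                 ICD = c("7295", "42833", "7295"),
                 stringsAsFactors = FALSE)
lookup <- data.frame(ICD = c("7295", "7295", "42833"),
                     group = c("A", "B", "C"),
                     stringsAsFactors = FALSE)

# Keys that appear more than once in the lookup table
dup_keys <- unique(lookup$ICD[duplicated(lookup$ICD)])
print(dup_keys)

# Every diagnosis row with a duplicated key matches each copy of that key,
# so the join returns more rows than the diagnosis table has
joined <- merge(dx, lookup, by = "ICD")
print(nrow(joined))
```

The remedies suggested in the message itself (by = .EACHI, or allow.cartesian = TRUE) are only appropriate once you have confirmed that the duplication is expected.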

Dataset diag2 is:

          PTID             ENCID  DIAG_DATE DIAGNOSIS_CD type
1  PT608273746 E0000025104028467 2018-12-29        Z0000   10
2  PT229616319 E0000005436124507 2011-01-10         7295    9
3  PT608345956 E0000025163571021 2018-02-28        Z0000   10
4  PT608361660 E0000025142225399 2018-10-22        Z0000   10
5  PT235121286 E0000005387281824 2011-11-28         7295    9
6  PT240024801 E0000005371647449 2011-12-05         7295    9
7  PT240663058 E0000026942225637 2011-06-22         7295    9
8  PT154968782 E0000002253904854 2011-08-04        42833    9
9  PT154968782 E0000002253904855 2011-08-06        42833    9
10 PT154968782 E0000002253904856 2011-08-07        42833    9

yijutseng commented 2 years ago

Hello,

I copied diag2 into an Excel file and tested the code, but I cannot reproduce the error.

library(dxpr)
library(readxl)
diag2 <- read_excel("dxpr_example.xlsx",
                    col_types = c("text", "text", "date", "text", "numeric"))
css <- icdDxToCCS(dxDataFile = diag2,
                  idColName = PTID,
                  icdColName = DIAGNOSIS_CD,
                  icdVerColName = type,
                  dateColName = DIAG_DATE)

dxpr_example.xlsx (I duplicated some rows to test the function.)

Could you share the sessionInfo() output from the environment where you got the error message?

My test environment:

R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] zh_TW.UTF-8/zh_TW.UTF-8/zh_TW.UTF-8/C/zh_TW.UTF-8/zh_TW.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] readxl_1.3.1 dxpr_0.9.0  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8.2      rstudioapi_0.13   magrittr_2.0.3    tidyselect_1.1.2 
 [5] munsell_0.5.0     colorspace_2.0-3  R6_2.5.1          rlang_1.0.2      
 [9] fansi_1.0.3       dplyr_1.0.8       tools_4.1.3       grid_4.1.3       
[13] data.table_1.14.2 gtable_0.3.0      utf8_1.2.2        cli_3.3.0        
[17] DBI_1.1.2         ellipsis_0.3.2    assertthat_0.2.1  tibble_3.1.7     
[21] lifecycle_1.0.1   crayon_1.5.1      purrr_0.3.4       ggplot2_3.3.6    
[25] vctrs_0.4.1       glue_1.6.2        cellranger_1.1.0  compiler_4.1.3   
[29] pillar_1.7.0      generics_0.1.2    scales_1.2.0      pkgconfig_2.0.3  
ningxuca commented 2 years ago

Hi Yi-Ju, here is the output of sessionInfo():

R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] dxpr_0.9.0          icd.data_1.0        icd_4.0.9           stringi_1.7.6       huxtable_5.5.0      labelled_2.9.1      gtsummary_1.6.0
 [8] gt_0.6.0            tinytex_0.39        knitr_1.39          openxlsx_4.2.5      scales_1.2.0        ggThemeAssist_0.1.5 sjlabelled_1.2.0
[15] devtools_2.4.3      usethis_2.1.6       sjmisc_2.8.9        cli_3.3.0           janitor_2.1.0       haven_2.5.0         readxl_1.4.0
[22] stringr_1.4.0       dplyr_1.0.9         purrr_0.3.4         readr_2.1.2         tidyr_1.2.0         tibble_3.1.7        ggplot2_3.3.6
[29] tidyverse_1.3.1     lubridate_1.8.0     vtable_1.3.3        kableExtra_1.3.4    data.table_1.14.2   exact2x2_1.6.6      exactci_1.4-2
[36] testthat_3.1.4      ssanv_1.1           Rcpp_1.0.8.3        forcats_0.5.1

loaded via a namespace (and not attached):
 [1] colorspace_2.0-3    ellipsis_0.3.2      rprojroot_2.0.3     snakecase_0.11.0    fs_1.5.2            rstudioapi_0.13     remotes_2.4.2
 [8] fansi_1.0.3         xml2_1.3.3          cachem_1.0.6        pkgload_1.2.4       jsonlite_1.8.0      broom_0.8.0         dbplyr_2.2.0
[15] shiny_1.7.1         compiler_4.2.0      httr_1.4.3          backports_1.4.1     assertthat_0.2.1    fastmap_1.1.0       later_1.3.0
[22] formatR_1.12        htmltools_0.5.2     prettyunits_1.1.1   tools_4.2.0         gtable_0.3.0        glue_1.6.2          cellranger_1.1.0
[29] vctrs_0.4.1         svglite_2.1.0       broom.helpers_1.7.0 insight_0.17.1      xfun_0.31           ps_1.7.0            brio_1.1.3
[36] rvest_1.0.2         mime_0.12           miniUI_0.1.1.1      lifecycle_1.0.1     hms_1.1.1           promises_1.2.0.1    memoise_2.0.1
[43] highr_0.9           desc_1.4.1          pkgbuild_1.3.1      zip_2.2.0           rlang_1.0.2         pkgconfig_2.0.3     systemfonts_1.0.4
[50] evaluate_0.15       processx_3.6.0      tidyselect_1.1.2    magrittr_2.0.3      R6_2.5.1            generics_0.1.2      DBI_1.1.2
[57] pillar_1.7.0        withr_2.5.0         modelr_0.1.8        crayon_1.5.1        utf8_1.2.2          tzdb_0.3.0          rmarkdown_2.14
[64] grid_4.2.0          callr_3.7.0         reprex_2.0.1        digest_0.6.29       webshot_0.5.3       xtable_1.8-4        httpuv_1.6.5
[71] munsell_0.5.0       viridisLite_0.4.0   sessioninfo_1.2.2

My input file is very large because it is real EMR data. I suspect the malformed ICD codes are causing the issue. For example, the ICD-9 codes in my input file have no leading 0s, so I merged my input file with a standard ICD-9 code Excel file to recover the codes with their leading 0s. That worked for the ICD-9 portion. For the ICD-10 portion, I merged my input file with Section111ValidICD10-Jan2022.xlsx, which I downloaded online and which is the most recent ICD-10 code file. I got the error message below for this ICD-10 portion, and no output was produced. I suspect your package uses a different ICD-10 version. Which version did you use?
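The leading-zero repair described above can be sketched as a lookup on a zero-stripped key. This is a minimal base-R illustration, assuming a hypothetical reference vector `ref` of valid ICD-9 short codes; it is not how dxpr handles the codes. Because stripping zeros can make two reference codes collide, ambiguous keys are left as NA rather than guessed:

```r
# Hypothetical reference list of valid ICD-9 short codes (with leading zeros)
ref <- c("0019", "0020", "7295", "42833")

# Codes as they appear after the leading zeros were lost (e.g. read as numbers)
raw <- c("19", "20", "7295", "42833", "99999")

# Matching key: each reference code with its leading zeros removed
ref_key <- sub("^0+", "", ref)

# Keys shared by several reference codes cannot be restored unambiguously
ambiguous <- unique(ref_key[duplicated(ref_key)])

# Look up each raw code; NA means no match or an ambiguous match
restored <- ref[match(raw, ref_key)]
restored[raw %in% ambiguous] <- NA
print(restored)
```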

Wrong ICD format: total 178 ICD codes (the number of occurrences is in brackets) c("E780 (5553)", "E784 (5476)", "I272 (5021)", "M791 (3356)", "M4806 (2907)", "R938 (2332)", "R972 (1235)", "R8299 (1169)", "A047 (987)", "H578 (960)")

Error in [.data.table(dxDataFile, Version == 9, ) :
  Column 6 ['Short'] is a data.frame or data.table; malformed data.table.
In addition: Warning messages:
1: The ICD mentioned above matches to "NA" due to the format or other issues.
2: "Wrong ICD format" means the ICD has wrong format
3: "Wrong ICD version" means the ICD classify to wrong ICD version (cause the "icd10usingDate" or other issues)

Thank you for getting back to me,

Emily

(Replying by email to Yi-Ju Tseng's comment of Fri, Jul 15, 2022, 6:39 PM.)

yijutseng commented 2 years ago

It seems that the versions of data.table and dxpr are the same in both of our environments; the only difference is the R version. I tested the code on Windows with R 4.2.0 but still could not reproduce your error.

The versions of the code sets are listed in the documentation: https://dhlab-tseng.github.io/dxpr/articles/Eng_Diagnosis.html

For ICD-10, CCS can only be used with code versions from before 2019. AHRQ replaced the whole CCS coding system with a new system called CCSR. If you want to use CCS because your data also contain ICD-9 codes, be sure to check the newly added ICD-10 codes, especially those for COVID-19.

The main issue with "E780" is that it is not a billable code (https://www.icd10data.com/ICD10CM/Codes/E00-E89/E70-E88/E78-), so you get the warning message. Please check your EHR data to see whether replacing "E780" with "E7800" is reasonable. The other codes listed in the warning message can be handled the same way.
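One way to apply such a replacement across millions of rows is a named lookup vector. A minimal sketch: the only mapping here, E780 to E7800, comes from the suggestion above, and `recode_map` is a hypothetical table that each analyst would need to build and review:

```r
# Hypothetical recoding table: non-billable code -> billable code chosen after review
recode_map <- c(E780 = "E7800")

codes <- c("E780", "I639", "E780", "R413")

# Replace only the codes that appear in the recoding table; leave the rest untouched
hit <- codes %in% names(recode_map)
codes[hit] <- unname(recode_map[codes[hit]])
print(codes)
```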

However, based on your code, I think the error (as opposed to the warning) may be caused by something else. I added "E780" to my sample file and still got output, with only a warning message. After searching for your error message (Column 6 ['Short'] is a data.frame or data.table; malformed data.table.), I found that it can be caused by multiple columns having the same name, among other reasons.

In your input data, does it have any other column with the name "Short"?
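Both suspected causes can be checked on the input before calling the dxpr function. A base-R sketch with a hypothetical helper `check_columns()` that reports duplicated column names and columns that are themselves data frames (the condition named in the error); the toy data frame deliberately has both problems:

```r
# Report structural problems that data.table treats as a malformed table
check_columns <- function(df) {
  nm <- names(df)
  list(
    # Column names that occur more than once
    duplicated_names = unique(nm[duplicated(nm)]),
    # Columns that are themselves data frames (nested columns)
    data_frame_columns = nm[vapply(df, is.data.frame, logical(1))]
  )
}

# Toy input: a nested data frame column, then renamed to duplicate "Short"
df <- data.frame(ID = 1:2, Short = c("E7800", "I639"), stringsAsFactors = FALSE)
df$nested <- data.frame(x = 1:2)
names(df)[3] <- "Short"

result <- check_columns(df)
print(result)
```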

ningxuca commented 2 years ago

Hi Yi-Ju,

Thank you for getting back to me so quickly, and thanks for the tips. I am going to separate the ICD-10 portion and use CCSR to map it. The problem is that my dataset contains many errors, and with millions of rows it's not possible for me to check the validity of each code. The strange thing is that there is no [Short] column in my dataset at all; I assumed it was an intermediate table produced by your package. I will try the new mapping for ICD-10 and let you know. The ICD-9 portion works now.

Thanks again,

Emily

(Replying by email to Yi-Ju Tseng's comment of Fri, Jul 15, 2022, 8:15 PM.)

yijutseng commented 2 years ago

Will you combine the CCS or CCSR groupings from ICD-9 and ICD-10 codes in the final analysis? If so, note that CCS and CCSR are not the same and cannot be analyzed together. When I need to pool ICD-9 and ICD-10 data, I usually use CCS directly, check how many "new ICD-10 codes" are in the dataset, and then code them manually (the new codes are not commonly used).

For all the non-billable codes, you can use icdDecimalToShort() (https://dhlab-tseng.github.io/dxpr/articles/Eng_Diagnosis.html#a-2--uniform-short-format) to check whether there are any suggested edits. Most of the time we can just append one digit, 0 or 9, to the original code. The output includes suggestions, and you can then edit your non-billable codes based on them.
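The "append 0 or 9" idea can be illustrated with a small sketch. `valid_icd9` and `suggest_0_or_9()` are hypothetical, and this is not dxpr's implementation; it only reproduces two suggestions from the table that follows (001 to 0019, 7552 to 75529) against a toy list of valid codes, returning an empty string when neither candidate is valid:

```r
# Hypothetical set of valid (billable) ICD-9 short codes
valid_icd9 <- c("0019", "75529", "42833", "7295")

# Try appending "0" and then "9"; keep the first candidate that is a valid code
suggest_0_or_9 <- function(code, valid) {
  candidates <- paste0(code, c("0", "9"))
  hits <- candidates[candidates %in% valid]
  if (length(hits) == 0) "" else hits[1]
}

suggestions <- vapply(c("001", "7552", "7295"), suggest_0_or_9,
                      character(1), valid = valid_icd9)
print(suggestions)
```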

decimal$Error
#>        ICD count IcdVersionInFile     WrongType Suggestion
#>  1:  A0.11    20           ICD 10  Wrong format           
#>  2:  V27.0    18           ICD 10 Wrong version           
#>  3:   E114     8           ICD 10  Wrong format           
#>  4: A01.05     8            ICD 9 Wrong version           
#>  5:  42761     7           ICD 10 Wrong version           
#>  6:  Z9.90     6           ICD 10  Wrong format           
#>  7:    F42     6           ICD 10  Wrong format           
#>  8:  V24.1     6           ICD 10 Wrong version           
#>  9:  A0105     5            ICD 9 Wrong version           
#> 10:    001     5            ICD 9  Wrong format       0019
#> 11:  75.52     4            ICD 9  Wrong format           
#> 12:  E03.0     4            ICD 9 Wrong version           
#> 13:    650     4           ICD 10 Wrong version           
#> 14: 123.45     3           ICD 10  Wrong format           
#> 15:  755.2     3            ICD 9  Wrong format     755.29
#> 16:   7552     2            ICD 9  Wrong format      75529

For the error message you got, it would be great if you could share a slice of data that reproduces the error. We will investigate the cause on our side, too.

Thank you,

ningxuca commented 2 years ago

I see. The codes in my file don't have a dot; for example, E030 instead of E03.0. Would icdDecimalToShort() work on them?

(Replying by email to Yi-Ju Tseng's comment of Fri, Jul 15, 2022, 8:49 PM.)

ningxuca commented 2 years ago

Hi Yi-Ju,

I tried icdDecimalToShort(), but it didn't help because no suggestions were given in my case. I also tried the CCSR mapping and got the same error message:

Error in [.data.table(dxDataFile, Version == 10, ) :
  Column 6 ['Short'] is a data.frame or data.table; malformed data.table.

I extracted all the unique ICD-10 codes in my project; the other columns in the file are dummy values. Please see the attachment. I ran it with this call:

css10 <- icdDxToCCSR(dxDataFile = u1,idColName = PTID,icdColName = code, icdVerColName = type, dateColName = DIAG_DATE)

The strange thing is that I tried splitting my file into sections to pinpoint the issue. With u1 = mycode[200:300,] I got the error message, but when I ran the function twice, once with u1 = mycode[200:250,] and again with u1 = mycode[250:300,], there was no error message.

So it seems to me that the issue may occur when there are too many records, but that is just my guess.
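Splitting the data to isolate a failure can be made systematic. A minimal sketch with a hypothetical helper `find_failing_chunks()`; `check()` is a toy stand-in for the dxpr call. As the result described above shows, every chunk can pass while the combined rows still fail, so a clean run on chunks does not prove the full data will work:

```r
# Run `fun` on consecutive row chunks and record which chunks raise an error
find_failing_chunks <- function(data, fun, chunk_size) {
  starts <- seq(1, nrow(data), by = chunk_size)
  failed <- logical(length(starts))
  for (k in seq_along(starts)) {
    rows <- starts[k]:min(starts[k] + chunk_size - 1, nrow(data))
    # TRUE if the call errors on this chunk, FALSE otherwise
    failed[k] <- tryCatch({
      fun(data[rows, , drop = FALSE])
      FALSE
    }, error = function(e) TRUE)
  }
  data.frame(start = starts, failed = failed)
}

# Toy stand-in that fails whenever a chunk contains the code "BAD"
toy <- data.frame(code = c("A047", "E780", "BAD", "I639", "R413"),
                  stringsAsFactors = FALSE)
check <- function(d) {
  if (any(d$code == "BAD")) stop("malformed input")
  invisible(NULL)
}

report <- find_failing_chunks(toy, check, chunk_size = 2)
print(report)
```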

Thank you,

Emily

(Replying by email, quoting my previous message of Fri, Jul 15, 2022, 9:40 PM.)

yijutseng commented 2 years ago

Thanks for testing! Would it be possible to share the rows in mycode[200:300,]? You can replace all the patient IDs with integer sequences.

We have tested the dxpr package with 953,294 unique patients and 7,948,418 distinct diagnosis records (real-world data), so the number of records alone may not be what causes your error.

ningxuca commented 2 years ago

Here you go

Also in my last email, I have attached all the ICD10 codes in mycode.xlsx

(Replying by email to Yi-Ju Tseng's comment of Sat, Jul 16, 2022, 6:40 PM.)

yijutseng commented 2 years ago

I found that I cannot see the shared attachment.

[Screenshot 2022-07-17, 9:57:26 AM]

Maybe sharing a Google Drive link would work?

ningxuca commented 2 years ago

I have attached the two files here: mycode.xlsx has all the codes, and mycode200_300.xlsx has rows 200 to 300.

yijutseng commented 2 years ago

Thank you for sharing the data. We've committed a new version of the dxpr package: https://github.com/DHLab-TSENG/dxpr/commit/8fcf78e0069dab14c74d09e713f22b73e382e09e

I tested this version of the package on your data, and it works fine. The only issue is that your data have ~200 non-billable codes, such as A047. Our suggestion function only works for ICD-9, because the ICD-10 coding system doesn't follow the "0 or 9" logic. For example, A047 should be modified to one of the following codes: A04.71 or A04.72.

If you are grouping them into CCS or CCSR, A04.7, A04.71, and A04.72 are all assigned to the same CCS or CCSR group. You could try imputing the codes that are reported as having the wrong format, or appending 1 to them.
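Before imputing, it can help to list the valid codes that extend each non-billable code and confirm that they share a group. A sketch assuming a hypothetical excerpt `valid_icd10` of the CMS code list; `child_codes()` and `impute_1()` are illustrative helpers, not dxpr functions:

```r
# Hypothetical excerpt of a valid ICD-10-CM short-code list
valid_icd10 <- c("A0471", "A0472", "E7800", "I639")

# Valid codes that extend a non-billable code (its more specific children)
child_codes <- function(code, valid) valid[startsWith(valid, code)]

# Append "1" and keep the result only if it is itself a valid code
impute_1 <- function(code, valid) {
  candidate <- paste0(code, "1")
  if (candidate %in% valid) candidate else NA_character_
}

print(child_codes("A047", valid_icd10))
print(impute_1("A047", valid_icd10))
```

Checking the children first matters because appending a digit can change the clinical meaning for other codes; the imputation is only safe when all children fall in the same group.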

Please let me know if you have any questions.

YiJu

ningxuca commented 2 years ago

Thank you, YiJu! When do you plan to release the new version? For our project, only about 20 diseases are of interest. We compiled a list of ICD-9/ICD-10 codes for each disease, but that is time-consuming and not necessarily better than the CCS categorization, so I started searching for an R package that converts ICD codes to CCS. Thanks for sharing the package and debugging with me. Emily

yijutseng commented 2 years ago

It was just released as version 0.9.1. Feel free to reinstall it from GitHub.

# install.packages("remotes")
remotes::install_github("DHLab-TSENG/dxpr")

Here is the sessionInfo() output after I reinstalled the package from GitHub.

R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] zh_TW.UTF-8/zh_TW.UTF-8/zh_TW.UTF-8/C/zh_TW.UTF-8/zh_TW.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] dxpr_0.9.1

loaded via a namespace (and not attached):
 [1] rstudioapi_0.13   magrittr_2.0.3    tidyselect_1.1.2 
 [4] munsell_0.5.0     colorspace_2.0-3  R6_2.5.1         
 [7] rlang_1.0.4       fansi_1.0.3       dplyr_1.0.8      
[10] tools_4.1.3       grid_4.1.3        data.table_1.14.2
[13] gtable_0.3.0      utf8_1.2.2        cli_3.3.0        
[16] DBI_1.1.2         ellipsis_0.3.2    assertthat_0.2.1 
[19] tibble_3.1.7      lifecycle_1.0.1   crayon_1.5.1     
[22] purrr_0.3.4       ggplot2_3.3.6     vctrs_0.4.1      
[25] glue_1.6.2        compiler_4.1.3    pillar_1.7.0     
[28] generics_0.1.2    scales_1.2.0      pkgconfig_2.0.3  

Thank you for reporting the issues.

YiJu

ningxuca commented 2 years ago

Hi YiJu,

The following codes are reported as 'wrong code'. Is that because they are not billable?

[Screenshot 2022-07-17 150726]

ningxuca commented 2 years ago

Hi YiJu, it seems that the CCSR function didn't work, although the CCS function ran without error on the same input file. I just wanted to let you know; no rush.

[Screenshot 2022-07-17 164132]

yijutseng commented 2 years ago

Hello,

Thank you for reporting the issue. I've tested the code with the file "mycode.xlsx" you shared earlier.

library(dxpr)
library(readxl)
mycode <- read_excel("mycode.xlsx", 
                     col_types = c("text", "numeric", "date", "text"))
icdDxToCCSR(dxDataFile = mycode,idColName = PTID,
            icdColName = code,dateColName = DIAG_DATE,
            icdVerColName = type)

I get output without errors:

Wrong ICD format: total 178 ICD codes (the number of occurrences is in brackets)
c("A047 (1)", "C4312 (1)", "C44102 (1)", "C44112 (1)", "C44119 (1)", "C44122 (1)", "C44129 (1)", "C962 (1)", "D0312 (1)", "D0411 (1)")

$groupedDT
         Short     ID     ICD       Date Version                                                CCSR_CATEGORY_DESCRIPTION
    1:    D899 R00001    D899 2016-01-01      10                                                       Immunity disorders
    2:    R413 R00001    R413 2016-01-01      10                                        Nervous system signs and symptoms
    3:    I639 R00001    I639 2016-01-01      10                                                      Cerebral infarction
    4:   R9431 R00001   R9431 2016-01-01      10                                      Abnormal findings without diagnosis
    5:   R0602 R00001   R0602 2016-01-01      10                                           Respiratory signs and symptoms
   ---                                                                                                                   
18838: W171XXA R00001 W171XXA 2016-01-01      10         External cause codes: intent of injury, accidental/unintentional
18839: T84611A R00001 T84611A 2016-01-01      10 Complication of internal orthopedic device or implant, initial encounter
18840: T84629A R00001 T84629A 2016-01-01      10 Complication of internal orthopedic device or implant, initial encounter
18841:   K5229 R00001   K5229 2016-01-01      10                                            Noninfectious gastroenteritis
18842:   K5229 R00001   K5229 2016-01-01      10                                                       Allergic reactions

$summarised_groupedDT
         ID                CCSR_CATEGORY_DESCRIPTION firstCaseDate endCaseDate count period
  1: R00001                       Immunity disorders    2016-01-01  2016-01-01    45 0 days
  2: R00001        Nervous system signs and symptoms    2016-01-01  2016-01-01   122 0 days
  3: R00001                      Cerebral infarction    2016-01-01  2016-01-01   114 0 days
  4: R00001      Abnormal findings without diagnosis    2016-01-01  2016-01-01   143 0 days
  5: R00001           Respiratory signs and symptoms    2016-01-01  2016-01-01    32 0 days
 ---                                                                                       
493: R00001              Neonatal cerebral disorders    2016-01-01  2016-01-01     2 0 days
494: R00001 Neonatal digestive and feeding disorders    2016-01-01  2016-01-01     3 0 days
495: R00001            Neonatal acidemia and hypoxia    2016-01-01  2016-01-01     1 0 days
496: R00001          Maternal intrauterine infection    2016-01-01  2016-01-01     2 0 days
497: R00001               Autoinflammatory syndromes    2016-01-01  2016-01-01     1 0 days

$Error
         ICD count IcdVersionInFile    WrongType Suggestion
  1:    A047     1           ICD 10 Wrong format           
  2:   C4312     1           ICD 10 Wrong format           
  3:  C44102     1           ICD 10 Wrong format           
  4:  C44112     1           ICD 10 Wrong format           
  5:  C44119     1           ICD 10 Wrong format           
 ---                                                       
174: T8585XS     1           ICD 10 Wrong format           
175: T8586XA     1           ICD 10 Wrong format           
176: T8589XA     1           ICD 10 Wrong format           
177: V4752XA     1           ICD 10 Wrong format           
178: W452XXA     1           ICD 10 Wrong format           

Warning messages:
1: The ICD mentioned above matches to "NA" due to the format or other issues. 
2: "Wrong ICD format" means the ICD has wrong format 
3: "Wrong ICD version" means the ICD classify to wrong ICD version (cause the "icd10usingDate" or other issues) 

Maybe you can try to update the dxpr package, reload it and try again?

# install.packages("remotes")
remotes::install_github("DHLab-TSENG/dxpr")

You get the warning message for "T8586XA" because we use the ICD-10-CM codes released by CMS. Codes that are not on that list are reported in a list along with a warning message.
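To audit your own codes against the CMS list before running dxpr, here is a base-R sketch assuming a hypothetical excerpt `cms_codes` of that list. It tabulates input codes that are missing from it, sorted by frequency, similar in spirit to the $Error table above:

```r
# Hypothetical excerpt of the CMS ICD-10-CM short-code list
cms_codes <- c("A0471", "A0472", "I639", "R413")

# Codes as they appear in the input data
input <- c("I639", "T8586XA", "A047", "I639", "T8586XA")

# Count the codes that are not on the list, most frequent first
not_listed <- input[!input %in% cms_codes]
counts <- sort(table(not_listed), decreasing = TRUE)
unlisted <- data.frame(ICD = names(counts), count = as.integer(counts),
                       stringsAsFactors = FALSE)
print(unlisted)
```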

Given the ICD-10 coding structure, I think the dxpr package could provide some rules to handle differences in the 6th or 7th character, because they usually do not affect the grouping result. We need some time to develop the rules and test them on our sample file.
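One possible shape for such a rule, as a hypothetical sketch only and not the planned dxpr implementation: when a code is not listed, look at listed codes that share its first five characters and accept their group only if they all agree. The group labels are placeholders, not real CCSR categories. The agreement check matters because, as the $groupedDT output above shows, some CCSR descriptions name the encounter ("initial encounter"), so the 7th character can change the group:

```r
# Hypothetical code -> group map; labels are placeholders, not CCSR categories
group_map <- c(T8585XA = "GRP_A", T8585XD = "GRP_A",
               W171XXA = "GRP_B", W171XXD = "GRP_C")

group_by_prefix <- function(code, map, n = 5) {
  # Exact match first
  if (code %in% names(map)) return(map[[code]])
  # Otherwise collect listed codes sharing the first n characters
  siblings <- map[startsWith(names(map), substr(code, 1, n))]
  groups <- unique(unname(siblings))
  # Accept only an unambiguous group; no siblings or disagreement gives NA
  if (length(groups) == 1) groups else NA_character_
}

assigned <- vapply(c("T8585XS", "W171XXS", "T8585XA"), group_by_prefix,
                   character(1), map = group_map)
print(assigned)
```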

ningxuca commented 2 years ago

I see, you used ICD-10-CM; I used ICD-10-DX. The mycode file I sent is the ICD-9 portion of my data. Please see the attached ICD10.xlsx for the ICD-10 portion, which raised the error in the CCSR function call.