Regression modeling of sub-distribution functions in competing risks.
A Python wrapper around the cmprsk R package.
Description: Estimation, testing and regression modeling of subdistribution functions in competing risks, as described in Gray (1988), A class of K-sample tests for comparing the cumulative incidence of a competing risk, Ann. Stat. 16:1141-1154, and Fine JP and Gray RJ (1999), A proportional hazards model for the subdistribution of a competing risk, JASA, 94:496-509.
Original Package documentation
This package was originally developed with rpy2 version 2.9.4. Since then, rpy2 has had many breaking changes. Therefore cmprsk version 0.X.Y only works with rpy2 version 2.9.X. The cmprsk package v 1.X.Y is up to date and uses rpy2 3.4.5.
R
cmprsk R package: open an R terminal and run install.packages("cmprsk")
rpy2: if you use conda to create the virtual environment on macOS M1 (Apple silicon), install rpy2 using pip (tested on version 3.5.9)
pandas (tested on version 1.5.3)
scipy (tested on version 1.10.1)
pytest and pytest-cov, for running unit tests (dev only)
This package uses rpy2 to import the cmprsk R package, so the requirements for rpy2 must be met.
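The version pairing stated in this README can be checked before importing the wrapper. The sketch below uses only the standard library. The helper names `installed_version` and `rpy2_is_compatible` are hypothetical, and treating any rpy2 3.x release as acceptable for cmprsk 1.X.Y is an assumption: the text only names 3.4.5 and a test on 3.5.9.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def rpy2_is_compatible(cmprsk_version, rpy2_version):
    """Hypothetical check of the cmprsk / rpy2 pairing stated in this README.

    cmprsk 0.X.Y -> rpy2 2.9.X; cmprsk 1.X.Y -> rpy2 3.x (assumed range).
    """
    cmprsk_major = int(cmprsk_version.split(".")[0])
    rpy2_parts = rpy2_version.split(".")
    rpy2_major, rpy2_minor = int(rpy2_parts[0]), int(rpy2_parts[1])
    if cmprsk_major == 0:
        return (rpy2_major, rpy2_minor) == (2, 9)
    if cmprsk_major == 1:
        return rpy2_major == 3
    return False  # unknown cmprsk series: make no claim


if __name__ == "__main__":
    # Report what is installed in the current environment.
    for name in ("cmprsk", "rpy2", "pandas", "scipy"):
        print(name, installed_version(name) or "not installed")
```

This only inspects Python distributions; it does not verify that R itself or the cmprsk R package is installed.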
TL;DR
rpy2: for how to install on macOS, see also the following issue
cmprsk R library (open the R console and run install.packages('cmprsk'))
For example usage, consult the tutorial notebook in this repo: package_usage.ipynb
import pandas as pd
import cmprsk.cmprsk as cmprsk
from cmprsk import utils
data = pd.read_csv('my_data_file.csv')
# assuming that x1, x2, x3, x4 are covariates.
# x1 and x4 are categorical, with baseline 'd' for x1 and 5 for x4
static_covariates = utils.as_indicators(data[['x1', 'x2', 'x3', 'x4']], ['x1', 'x4'], bases=['d', 5])
crr_result = cmprsk.crr(data['ftime'], data['fstatus'], static_covariates)
report = crr_result.summary
print(report)
ftime and fstatus can be numpy arrays or pandas Series, and static_covariates is a pandas DataFrame. The report is a pandas DataFrame as well.
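To make the role of the baselines concrete, here is a standard-library-only sketch of indicator (dummy) coding with a reference level. It is not the package's `utils.as_indicators` implementation: the function name `indicator_columns`, the `column_level` naming scheme, and the sample values are assumptions made for illustration.

```python
def indicator_columns(rows, column, base):
    """Expand one categorical column into 0/1 columns, dropping the base level.

    rows   : list of dicts, one per observation
    column : name of the categorical column to expand
    base   : reference level; it gets no indicator column
    """
    # Every non-baseline level gets its own indicator column.
    levels = sorted({row[column] for row in rows if row[column] != base}, key=str)
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded


# Illustrative data: x1 is categorical with baseline 'd', x2 is numeric.
rows = [
    {"x1": "a", "x2": 0.5},
    {"x1": "d", "x2": 1.2},
    {"x1": "b", "x2": 0.1},
]
encoded = indicator_columns(rows, "x1", base="d")
for row in encoded:
    print(row)
```

A row at the baseline level has zeros in every indicator column, so each fitted coefficient is read relative to that level.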
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from cmprsk import cmprsk
data = pd.read_csv('cmprsk/cmprsk/tests/example_dataset.csv')
cuminc_res = cmprsk.cuminc(data.ss, data.cc, group=data.gg, strata=data.strt)
# print
cuminc_res.print
# plot using matplotlib
_, ax = plt.subplots()
for name, group in cuminc_res.groups.items():
    ax.plot(group.time, group.est, label=name)
    ax.fill_between(group.time, group.low_ci, group.high_ci, alpha=0.4)
ax.set_ylim([0, 1])
ax.legend()
ax.set_title('foo bar')
plt.show()
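For orientation, the curves plotted above are cumulative incidence estimates. The following is a minimal pure-Python sketch of the standard nonparametric estimator for a single cause. It is not the R routine that the wrapper calls, and it omits confidence intervals, groups, strata, and tests. The function name `cumulative_incidence`, the sample data, and the convention that 0 codes censoring are assumptions made for illustration.

```python
def cumulative_incidence(times, statuses, cause):
    """Nonparametric cumulative incidence estimate for one cause.

    times    : follow-up times
    statuses : 0 for censored (assumed), otherwise the integer event-type code
    cause    : event code whose cumulative incidence is estimated
    Returns a list of (time, estimate) pairs, one per distinct event time.
    """
    data = sorted(zip(times, statuses))
    at_risk = len(data)
    surv = 1.0   # all-cause Kaplan-Meier survival just before the current time
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        # Collect all observations tied at time t.
        j = i
        while j < len(data) and data[j][0] == t:
            j += 1
        tied = [s for _, s in data[i:j]]
        n_events = sum(1 for s in tied if s != 0)
        n_cause = sum(1 for s in tied if s == cause)
        if n_events:
            # Increment: P(event-free until t) * P(fail from `cause` at t | at risk)
            cif += surv * n_cause / at_risk
            surv *= 1.0 - n_events / at_risk
            curve.append((t, cif))
        at_risk -= len(tied)
        i = j
    return curve


# Illustrative data: two competing event types (1 and 2), one censored subject.
times = [1, 2, 3, 4, 5]
statuses = [1, 2, 0, 1, 1]
curve_1 = cumulative_incidence(times, statuses, cause=1)
curve_2 = cumulative_incidence(times, statuses, cause=2)
for t, est in curve_1:
    print(t, round(est, 3))
```

In this toy data set the last subject fails, so the final estimates for the two causes add up to 1.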
To run the unit tests, run
pytest --cov=cmprsk cmprsk/tests/
from the project root. Note: you'll need to install pytest-cov.
Current coverage
---------- coverage: platform darwin, python 3.9.7-final-0 -----------
Name                             Stmts   Miss  Cover
----------------------------------------------------
cmprsk/__init__.py                   0      0   100%
cmprsk/cmprsk.py                   128     22    83%
cmprsk/rpy_utils.py                 44     10    77%
cmprsk/tests/__init__.py             0      0   100%
cmprsk/tests/test_cmprsk.py         30      0   100%
cmprsk/tests/test_rpy_utils.py      27      1    96%
cmprsk/tests/test_utils.py          37      0   100%
cmprsk/utils.py                     23      1    96%
----------------------------------------------------
TOTAL                              289     34    88%