yunwezhang closed 3 years ago
Hi @yunwezhang, after walking through the example on random survival forest in sksurv, I think the biggest problem with using deep forest in survival analysis tasks is how to design good augmented features. In survival analysis, our main concern is the survival prediction function that takes time steps t as the input, right? For now, I cannot figure out how to incorporate this into the cascade structure of deep forest.
Since we are not very familiar with survival analysis, your suggestions would be very welcome ;-)
EDIT: We are happy to work on this feature request if this is achievable.
Hi Yixuan,
Yes, you are right about the time steps. The input to a survival model requires a two-dimensional outcome (time plus a binary status, where the binary value indicates whether the observation is censored), but the output is usually a one-dimensional vector, either a "risk" or a "probability" (as in binary classification).
As for the augmented feature steps, I assume you are talking about this part in the model structure? Does this part correspond to this part in the paper? If I understand correctly, in the cascade forest part, the augmented features (in-model feature transformation) obtained from each forest are the predicted vectors, which can be obtained from a survival forest (the output survival probability). However, I am not clear about the attached picture part. (I think the 2019 paper has it because it is better for image data....)
Thank you for looking into it and I am not sure how hard it is to add the random survival model. I am happy to chat with you to see how it goes. In summary, the change for the input data needs to be X (n by p), y (both time and status) and the output is probability vector (could be survival risk, 1 year survival probability, 2 year survival prob, etc.) 😊
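As a rough illustration of the data shapes described above, here is a minimal pure-Python sketch. It is neither deep forest nor sksurv code: the function names `kaplan_meier` and `survival_at` and the toy data are hypothetical, and a plain Kaplan-Meier estimate over the whole cohort stands in for a survival forest, which would instead produce one such vector per sample.

```python
from typing import List, Tuple

Outcome = Tuple[float, bool]  # (time, status); status True = event observed, False = censored


def kaplan_meier(y: List[Outcome]) -> List[Tuple[float, float]]:
    """Return [(event_time, survival_probability)] sorted by event time."""
    event_times = sorted({t for t, observed in y if observed})
    curve = []
    surv = 1.0
    for t in event_times:
        # Samples still under observation just before time t.
        at_risk = sum(1 for time, _ in y if time >= t)
        # Events that happen exactly at time t.
        events = sum(1 for time, observed in y if observed and time == t)
        surv *= 1.0 - events / at_risk
        curve.append((t, surv))
    return curve


def survival_at(curve: List[Tuple[float, float]], horizon: float) -> float:
    """Evaluate the step function S(t) at the given horizon."""
    prob = 1.0
    for t, s in curve:
        if t > horizon:
            break
        prob = s
    return prob


# Outcome y carries both time (in years) and status, as described in the comment above.
y = [(0.5, True), (1.2, False), (1.5, True), (2.5, True), (3.0, False)]
curve = kaplan_meier(y)

# Output: a probability vector, here the 1-year and 2-year survival probabilities.
output = [survival_at(curve, horizon) for horizon in (1.0, 2.0)]
print(output)
```

The point is only the interface: the outcome is a pair per sample, while the prediction is a fixed-length vector of probabilities at chosen horizons.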
Thanks for your kind explanations @yunwezhang.
> As for the augmented feature steps, I assume you are talking about this part in the model structure? Does this part correspond to this part in the paper?
No, the second figure you posted shows the multi-grained scanning part, which is not included in this package, since tree ensembles are typically not the best choice for structured data such as images or audio. Augmented features refer to part of the input for hidden cascade layers. For classification, they are predicted class probabilities; for regression, they are predicted target values.
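To make the idea of augmented features concrete, here is a minimal sketch in plain Python. It assumes the usual cascade arrangement in which each layer's predicted vectors are appended to the original features before being passed to the next layer. The `augment` helper and the two stand-in predictors are hypothetical and are not part of the deep forest package.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Predictor = Callable[[Vector], Vector]


def augment(x: Vector, predictors: Sequence[Predictor]) -> Vector:
    """Append each predictor's output vector to the original features of one sample."""
    out = list(x)  # copy so the raw features are left untouched
    for predict in predictors:
        out.extend(predict(x))
    return out


# Hypothetical stand-ins for two fitted forests in one cascade layer.
# For classification these would return class probabilities; for a survival
# variant they might return survival probabilities at fixed horizons.
def forest_a(x: Vector) -> Vector:
    return [0.9, 0.7]


def forest_b(x: Vector) -> Vector:
    return [0.8, 0.6]


augmented = augment([1.0, 2.0, 3.0], [forest_a, forest_b])
print(augmented)  # [1.0, 2.0, 3.0, 0.9, 0.7, 0.8, 0.6]
```

Under this reading, a survival forest only needs to emit a fixed-length vector per sample to fit into the cascade.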
Here are three questions that I would like to ask further.
X, we also need to enroll an indicator array on time and status.
Hi Yixuan,
Thanks for the fast reply. I am aware that multi-grained scanning is not included, which is why I asked why you have that part (first figure) in your model structure instead of starting from the cascade forest.
Answer for the further questions:
> Thanks for the fast reply. I am aware that multi-grained scanning is not included, which is why I asked why you have that part (first figure) in your model structure instead of starting from the cascade forest.
The binner in that figure is used to reduce the number of splitting candidates for the sake of acceleration (not used in the original deep forest model). The entire architecture does correspond to the cascade forest structure.
Besides, I have opened up a feature request in sksurv (link), deep forest could benefit from using a mixture of RandomSurvivalForest and ExtraSurvivalTrees in cascade layers. Let's wait for the response from maintainers of sksurv before formally working on this feature request ;-)
Got it! Yes, let's wait for the reply. For that extra injection of randomness, it would be better to have ExtraSurvivalTrees.
Realizing that we can implement ExtraSurvivalTrees by importing sksurv as a soft dependency, I think we could work on this feature request without extra help from that community.
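For readers unfamiliar with the term, a soft dependency is usually imported lazily, so that the package still works when the optional library is absent. A minimal sketch of that pattern follows. The helper name `import_optional` and the error message are hypothetical, and this is not how deep forest actually wires it up.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, feature: str) -> ModuleType:
    """Import an optional module, raising a descriptive error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the optional package '{module_name}', "
            "which is not installed."
        ) from exc


# Hypothetical usage inside a survival-specific code path:
#     sksurv = import_optional("sksurv", "Survival cascade layers")
# The import only runs when the survival feature is requested, so users who
# need classification or regression alone never have to install sksurv.
```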
> Thank you for looking into it and I am not sure how hard it is to add the random survival model. I am happy to chat with you to see how it goes. In summary, the change for the input data needs to be X (n by p), y (both time and status) and the output is probability vector (could be survival risk, 1 year survival probability, 2 year survival prob, etc.) 😊
If you are interested in extending deep forest to the field of survival analysis, could you contact me by e-mail (Address), so that we can discuss further before opening a draft PR on this feature ;-)
Closed via #14.
Hi maintainer,
I am wondering whether it is possible to cascade random survival forests (maybe an sksurv model) instead of RF in your deep forest model? As in #48, it seems that the supported model types are classification and regression. (Or did I miss some parts of the tutorial docs?)
Thanks.