CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License

robust=True estimation doesn't work on Ubuntu? #1253

Open konradsemsch opened 3 years ago

konradsemsch commented 3 years ago

Hi!

I've been using the LogLogisticAFTFitter with robust=True estimation on my Mac, and during the procedure I could see the following logger message, which I presume is attributable to this setting (at least I hadn't seen it before without it):

2021-04-12 09:13:35,766 | INFO | utils.py | _init_num_threads | NumExpr defaulting to 8 threads.

Also, the output of print_summary() clearly shows that the SEs now have different values, as expected.

However, after packaging this code into a Docker image (Ubuntu) and running the container, I couldn't see the same logger output, and the SEs looked as if robust estimation had not taken place.

Question: could it be that this setting silently fails on a different OS? Could you provide a bit more insight into this?
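For reference, roughly the kind of setup in question (the dataset and column names below are placeholders taken from the lifelines docs, not the actual data from this report):

```python
from lifelines import LogLogisticAFTFitter
from lifelines.datasets import load_rossi

# placeholder data; the real data has the same shape: durations, an event flag,
# and a handful of covariates
df = load_rossi()

aft = LogLogisticAFTFitter()
# robust=True is what should switch the reported standard errors to the robust ones
aft.fit(df, duration_col="week", event_col="arrest", robust=True)
aft.print_summary()
```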

CamDavidsonPilon commented 3 years ago

Hi @konradsemsch - hm, I don't know what's going on here. My guess is that the difference between Ubuntu and the Mac is whether numexpr is installed (lots of libraries, like pandas, have an optional dependency on numexpr and will use it if it's available). Can you try pip installing numexpr in the Docker image and rerunning? That would give me a hint as to what might be going on.
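A quick diagnostic that could be run inside the container to see whether numexpr is importable and whether pandas will route through it (just a sketch):

```python
import pandas as pd

try:
    import numexpr
    print("numexpr version:", numexpr.__version__)
except ImportError:
    print("numexpr is not installed")

# pandas only routes eligible operations through numexpr when the library is
# installed and this option is True (the option itself defaults to True)
print("compute.use_numexpr:", pd.get_option("compute.use_numexpr"))
```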

konradsemsch commented 3 years ago

Hi @CamDavidsonPilon! Ok, I can come back with some further info. Adding numexpr to the requirements did resolve the issue, but only partially.

  1. Now, during training in the container, I see the following lines:

[screenshot: training log output]

  2. When I run the model.print_summary() function locally on my Mac, I also get a similar note, and the SEs seem reasonable:

[screenshot: print_summary() output]

  3. On the other hand, the same print_summary() call executed in the container (in order to render the online documentation) doesn't produce the same results, as if the numexpr package weren't used, even though the same container with the same dependencies is applied.

Please note that all comparisons were done on exactly the same trained model object; only the OS was different.
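One way to take the screenshots out of the comparison would be to dump the fitted standard errors numerically in both environments and diff them (a sketch, assuming the fitter's usual standard_errors_ attribute; `model` is the fitted LogLogisticAFTFitter):

```python
import json

# standard_errors_ holds the SEs that print_summary() reports; comparing these
# directly across the two environments avoids relying on the rendered output
se = model.standard_errors_
print(json.dumps({str(k): float(v) for k, v in se.items()}, indent=2))
```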

CamDavidsonPilon commented 3 years ago

Are you able to share the two print_summary results here?

konradsemsch commented 3 years ago

Hi @CamDavidsonPilon! Yes, let me report back on this:

So from the logs I can confirm that numexpr is activated in the backend when the model is trained in the container. When I download the model and inspect the results with print_summary() locally, I see the following:

[screenshot 2021-04-16 12:34:42: local print_summary() output]

So from here you can already see that numexpr kicked in when the function was called.

And here's part of the summary rendered by sphinx in the container, on the very same model object in the same Jupyter notebook. Notice that the robust variance doesn't even show up in the top-level summary.

[screenshot 2021-04-16 12:31:17: container print_summary() output]

I would also expect to see the same numexpr message when calling this function, but it doesn't show up. What's surprising, however, is that I can see it being used when invoking some other function on the model object:

[screenshot: log output from another function call on the model]

Any clue what could be causing this? Would there be any way to enforce consistent behaviour?
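For what it's worth, the missing INFO line may just be a logging-configuration difference in the sphinx run rather than numexpr not being used. A sketch of how both environments could be pinned to the same behaviour while debugging (whether lifelines' robust-variance path actually goes through numexpr is exactly what's unclear here, so treat this as a diagnostic, not a fix):

```python
import logging
import os

# fix numexpr's thread count before anything imports it, so the
# "defaulting to N threads" INFO line is identical in both environments
os.environ["NUMEXPR_MAX_THREADS"] = "8"

# make sure INFO-level messages (like numexpr's _init_num_threads line)
# are actually emitted in the sphinx/container run
logging.basicConfig(level=logging.INFO)

import pandas as pd

# temporarily force pandas to skip numexpr; if the robust SEs then agree
# between the Mac and the container, numexpr involvement explains the gap
pd.set_option("compute.use_numexpr", False)
```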