giotto-ai / giotto-tda

A high-performance topological machine learning toolbox in Python
https://giotto-ai.github.io/gtda-docs

Caching in CI #271

Closed rth closed 2 years ago

rth commented 4 years ago

One way to make CI faster could be to cache installation artifacts or compilation objects with ccache in CI.

Cf https://docs.microsoft.com/en-us/azure/devops/pipelines/caching/?view=azure-devops for more details.

I will look into it.
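As a rough sketch of what this could look like (the cache key, path, and `CCACHE_DIR` variable here are assumptions to adapt, not the actual pipeline configuration), the Azure Pipelines `Cache@2` task can restore a ccache directory between runs:

```yaml
# Illustrative azure-pipelines.yml fragment; key and path are assumptions.
variables:
  CCACHE_DIR: $(Pipeline.Workspace)/ccache

steps:
- task: Cache@2
  inputs:
    key: 'ccache | "$(Agent.OS)" | setup.py'
    restoreKeys: |
      ccache | "$(Agent.OS)"
    path: $(CCACHE_DIR)
  displayName: Restore ccache

- script: |
    sudo apt-get install -y ccache
    ccache --zero-stats
  displayName: Set up ccache

# ... build steps, with gcc/g++ invoked through ccache ...

- script: ccache --show-stats
  displayName: Show ccache statistics
  condition: always()
```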

Another way mentioned earlier is to pre-build the containers or VMs used in CI (https://docs.microsoft.com/en-us/azure/devops/pipelines/process/container-phases?view=azure-devops). This could be a second option if caching does not help enough. It would require a bit more work IMO, and would make later changes slower to roll out.
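For comparison, the pre-built container route would look roughly like the fragment below (the image name `giottoai/manylinux2010-boost` is purely hypothetical, used only to illustrate a container job):

```yaml
# Illustrative fragment; the image name is hypothetical.
jobs:
- job: manylinux_wheels
  pool:
    vmImage: ubuntu-latest
  container: giottoai/manylinux2010-boost:latest
  steps:
  - script: |
      python -m pip install -e .
      python -m pytest
```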

rth commented 4 years ago

A question: when one builds Boost, I see that it needs pyconfig.h. Does that mean that Boost has to be compiled for each Python version (or, alternatively, rebuilt if the Python version changes)? Or would building it for a single version be enough? cc @ulupo @gtauzin

ulupo commented 4 years ago

@rth I'm not sure, unfortunately; this is the first time I've seen this. Are you seeing it in the CI, or simply from looking at the Boost documentation?

rth commented 4 years ago

After looking into the manylinux builds, which are the slowest, I saw that the Boost installation indeed takes quite a long time, as discussed. One way around this, as mentioned by @gtauzin, is to create a specific Docker image with Boost installed. ccache is not ideal here, as it is more focused on speeding up the project build itself rather than its dependencies, and it would not help with download and configure time.

However, creating a specific Docker image only makes sense if we can reuse it for all Python versions, and if I locally build Boost in the manylinux2010 image without specifying the Python version I get errors similar to:

gcc.compile.c++ bin.v2/libs/python/build/gcc-8.3.1/release/python-2.6/threading-multi/visibility-hidden/converter/arg_to_python_base.o 
In file included from ./boost/python/detail/prefix.hpp:13,                                                                             
                 from ./boost/python/handle.hpp:8,                                                                                     
                 from ./boost/python/converter/arg_to_python_base.hpp:7,                                                               
                 from libs/python/src/converter/arg_to_python_base.cpp:6:                                                              
./boost/python/detail/wrap_python.hpp:50:11: fatal error: pyconfig.h: No such file or directory                                        
 # include <pyconfig.h>                                                                                                                
           ^~~~~~~~~~~~
compilation terminated.

Anyway I'll try to build it for a specific Python version and see how it goes when used in CI.

Edit: In the end, it looks like Boost.Python is Python-version dependent, and so Boost would need to be rebuilt for each specific Python version.
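This version dependence is easy to confirm from Python itself: pyconfig.h lives in a per-interpreter include directory, so a Boost.Python built against one version's headers cannot simply be reused for another. A minimal check:

```python
# Locate the pyconfig.h used by the current interpreter.
# Its parent directory (e.g. .../include/python3.8) is version-specific,
# which is why Boost.Python must be rebuilt per Python version.
import sysconfig

config_h = sysconfig.get_config_h_filename()
print(config_h)                        # a path ending in pyconfig.h
print(sysconfig.get_python_version())  # e.g. '3.8'
```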

ulupo commented 4 years ago

@rth, we should be able to get @MonkeyBreaker's and @reds-heig's insight into this later this week.

MonkeyBreaker commented 4 years ago

Hi @rth,

Nice work on the caching! I'm not a CI guru myself, so I really appreciate the work done on the matter. As you found, pyconfig.h is Python-version dependent. However, the features depending on Boost do not change often (at least at the moment). I can have a look at whether we could reuse the parts of a previous build that did not change. I'm not sure it's even possible, but if it allows us to gain time on CI, it could be interesting.

Have a nice day, Julián

rth commented 4 years ago

Thanks for the feedback @MonkeyBreaker !

For the record, I did try to reuse the Boost installation with a pre-built Docker container in https://github.com/giotto-ai/giotto-tda/compare/master...rth:manylinux2010-container?expand=1 before running into the issue of pyconfig.h being version dependent. Anyway, with ccache enabled now, I don't think this would improve runtime significantly (since testing the examples is currently one of the slowest parts).