ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

🐛 Bug: Pandas slows down build time for ersiliaos/base image #1054

Closed DhanshreeA closed 3 months ago

DhanshreeA commented 6 months ago

Describe the bug.

The action that builds and pushes ersilia's base image (ersiliaos/base) takes disproportionately long to run. This could be due to any number of things, for example the runner having modest resources, but the primary culprit seems to be the installation of pandas during the ersilia install. It appears that, instead of downloading a wheel, pip downloads the source for this version of pandas and then takes a very long time to build a wheel from it. As a consequence, building this image, which used to take <20 minutes, now takes ~1h26.

This is a low-priority issue, since our base image is updated infrequently and building it is not time-critical, but maybe we should look into why this is happening. It could involve one or more of the following:

Describe the steps to reproduce the behavior

No response

Expected behavior.

No response

Screenshots.

No response

Operating environment

NA

Additional context

No response

miquelduranfrigola commented 6 months ago

Thanks @DhanshreeA - I would definitely upgrade pip as a first option.

GemmaTuron commented 6 months ago

I agree pandas slows things down. I'd remove the dependency, as stated in #928 :)

miquelduranfrigola commented 6 months ago

https://github.com/ersilia-os/ersilia/blob/376f886da37e3e45c2a0a0a71f700bc94b0deea1/ersilia/core/tracking.py#L3

Pandas is only used here for reading CSV files. We should definitely use the csv library instead and remove the pandas dependency.
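
For illustration, a minimal sketch of what reading a CSV file with the standard-library csv module could look like. The function name `read_csv_rows` is hypothetical and not taken from tracking.py, and the sketch assumes the file has a header row and is UTF-8 encoded.

```python
import csv


def read_csv_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row.

    Hypothetical helper for illustration only; not the actual code
    in ersilia/core/tracking.py.
    """
    # newline="" is what the csv module documentation recommends when
    # opening files, so quoted fields with embedded newlines parse correctly.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

One difference from pandas to keep in mind: csv.DictReader returns every value as a string, so any numeric columns would need to be converted explicitly (for example with `float(row["value"])`, where `value` is a made-up column name).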

DhanshreeA commented 3 months ago

#1131 should hopefully resolve this.