ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0
220 stars 147 forks source link

🐈 Task: Ersilia still trying to use Airtable while deciding how to fetch a model #1259

Closed DhanshreeA closed 1 month ago

DhanshreeA commented 1 month ago

Summary

We need to get rid of PyAirtable as a hard dependency, however a major workflow within how ersilia fetches models still ends up using the AirtableInterface within ersilia. This happens in this line when ersilia goes through the flow of deciding how to fetch the model . This is problematic because majority of our models end up being fetched from DockerHub, however ersilia still goes through this entire workflow, slowing things down, and bloating the ersilia environment by installing PyAirtable (+ its own dependencies). For reference, for models that are not hosted, we still end up checking for a URL first in the JSON maintained in S3, and then on Airtable.

The simplest workaround is to change the sequence of steps from:

        do_hosted = self._decide_if_use_hosted(model_id=model_id)
        if do_hosted:
            self.logger.debug("Fetching from hosted")
            self._fetch_from_hosted(model_id=model_id)
            return
        do_dockerhub = self._decide_if_use_dockerhub(model_id=model_id)
        if do_dockerhub:
            self.logger.debug("Decided to fetch from DockerHub")
            self._fetch_from_dockerhub(model_id=model_id)
            return

to

        do_dockerhub = self._decide_if_use_dockerhub(model_id=model_id)
        if do_dockerhub:
            self.logger.debug("Decided to fetch from DockerHub")
            self._fetch_from_dockerhub(model_id=model_id)
            return
        do_hosted = self._decide_if_use_hosted(model_id=model_id)
        if do_hosted:
            self.logger.debug("Fetching from hosted")
            self._fetch_from_hosted(model_id=model_id)
            return

However since we do not have many "hosted" models, I feel we should revisit if we need this functionality entirely, and if it should be more opt-in than the default.

To tackle this issue and make sure you have achieved the desired functionality, ie, Airtable is not required by Ersilia, do the following:

Step 1. Install Ersilia in a new environment and check that this env doesn't have the pyairtable library in it.

Step 2. Fetch a model simply as ersilia -v fetch eos3b5e Step 3. You will notice that Ersilia tries to install pyairtable Step 4. After completing objective 1, ie, making do_docker evaluate before do_hosted, create a new environment and install ersilia in it using this code. Step 5. Delete the previously fetched model, otherwise ersilia will refuse to fetch it. Achieve this as ersilia delete eos3b5e. Step 6. Fetch this model again, and check if you have the pyairtable library getting installed in the environment.

Objective(s)

Documentation

No response

DhanshreeA commented 1 month ago

What do you think @miquelduranfrigola?

miquelduranfrigola commented 1 month ago

Good points. I am answering real quick, apologies 🙏 ... The from_hosted functionality is important since we were using it in the apps (this has changed now, though, but I feel we need to allow users to use hosted models from the Ersilia CLI). I am a bit unsure about changing the order, but we can discuss! This said, why don't we try to remove the AirtableInterface dependency from the inner works of fetch_from_hosted?

DhanshreeA commented 1 month ago

It is not fetch_from_hosted that first calls the AirtableInterface within ersilia, it is in fact _decide_if_use_hosted that calls it because it tries to look for a URL. I know changing the order sounds like a very naive solution however due to the fact that most of our models are in fact dockerized, and that is generally how ersilia ends up fetching them anyway, it makes sense to have ersilia go through the _decide_if_use_dockerhub check first. It saves us a bunch of function calls across most models, as well as does not force airtable to be installed every time despite our best efforts to get rid of it.

GemmaTuron commented 1 month ago

@DhanshreeA is this a good-first-issue? If so provide more details of what we want exactly so contributors can tackle it

daud99 commented 1 month ago

@DhanshreeA

DhanshreeA commented 1 month ago

@daud99 Go ahead!

DhanshreeA commented 1 month ago

Hi @daud99 could you mention this issue in your PR as "Closes # 1291" (Remove the space)

daud99 commented 1 month ago

okay @DhanshreeA