Clean UP & Dockerization eos8d8a

pittmanriley commented 1 year ago

Hi @GemmaTuron,

I'm having issues testing this model on CLI. It says that I'm getting a FileNotFoundError:

ersilia run -i "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]"

Traceback (most recent call last):
  File "/Users/rileypittman/miniconda3/envs/ersilia/bin/ersilia", line 33, in <module>
    sys.exit(load_entry_point('ersilia', 'console_scripts', 'ersilia')())
  File "/Users/rileypittman/miniconda3/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/rileypittman/miniconda3/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/rileypittman/miniconda3/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/rileypittman/miniconda3/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/rileypittman/miniconda3/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/rileypittman/miniconda3/envs/ersilia/lib/python3.10/site-packages/bentoml/cli/click_utils.py", line 138, in wrapper
    return func(*args, **kwargs)
  File "/Users/rileypittman/miniconda3/envs/ersilia/lib/python3.10/site-packages/bentoml/cli/click_utils.py", line 115, in wrapper
    return_value = func(*args, **kwargs)
  File "/Users/rileypittman/miniconda3/envs/ersilia/lib/python3.10/site-packages/bentoml/cli/click_utils.py", line 99, in wrapper
    return func(*args, **kwargs)
  File "/Users/rileypittman/ersilia/ersilia/cli/commands/run.py", line 34, in run
    result = mdl.run(input=input, output=output, batch_size=batch_size)
  File "/Users/rileypittman/ersilia/ersilia/core/model.py", line 410, in run
    self._run_logger.log(result=result, meta=self._model_info)
  File "/Users/rileypittman/ersilia/ersilia/core/model.py", line 460, in _model_info
    return self.info()
  File "/Users/rileypittman/ersilia/ersilia/core/model.py", line 455, in info
    with open(information_file, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/rileypittman/eos/dest/eos8d8a/information.json

However, I'm not sure why this is trying to use/open a file called information.json. Is it normal for a model to have an information.json file, and this model just doesn't have one? I apologize if this post is vague, but I'm really not sure what direction to take from here. Thank you.

miquelduranfrigola commented 1 year ago

Hello @pittmanriley , thanks.

The information.json file is created automatically at fetch time, so you should not add it.

I have tried the model like this: ersilia -v fetch eos8d8a --from_github and it worked perfectly for me. To go step by step, can you maybe first try this?

pittmanriley commented 1 year ago

@miquelduranfrigola

This worked. Thanks so much!

miquelduranfrigola commented 1 year ago

OK - so what is happening is that, without the --from_github option, the model is fetched either from S3 or DockerHub, and this may be fetching old versions of the model.

Let's see if you can now proceed...

GemmaTuron commented 1 year ago

@pittmanriley

the first thing to do when you get assigned a new model is fork the repo, clone the fork in your local system and then try out the fetch using the --repo_path option. Do not fetch it directly as anyway you need the repository cloned to make changes.

a reminder again, the models get fetched first from docker, then from s3 and then from github. The docker versions won't be correct in most occasions if it is a non refactored model, so when you get assigned a new model please follow the steps above.

pittmanriley commented 1 year ago

Thank you for your help @GemmaTuron!

I tested the model in CLI, Google Colab, and Docker. This model predicts the potential to permeate the Mycobacterium tuberculosis cell membrane based on physicochemical properties, so its output is a probability. Here are the results in the CLI using compound_list.csv:

 {
    "input": {
        "key": "LUHMMHZLDLBAKX-UHFFFAOYSA-N",
        "input": "CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O",
        "text": "CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O"
    },
    "output": {
        "probability": 0.097
    }
}
{
    "input": {
        "key": "QRXWMOHMRWLFEY-UHFFFAOYSA-N",
        "input": "C1=CN=CC=C1C(=O)NN",
        "text": "C1=CN=CC=C1C(=O)NN"
    },
    "output": {
        "probability": 0.191
    }
}
{
    "input": {
        "key": "SGOIRFVFHAKUTI-UHFFFAOYSA-N",
        "input": "CC(CN1C=NC2=C(N=CN=C21)N)OCP(=O)(O)O",
        "text": "CC(CN1C=NC2=C(N=CN=C21)N)OCP(=O)(O)O"
    },
    "output": {
        "probability": 0.044
    }
}

And here are the results using Google Colab on the eml_canonical.csv file: eos8d8a_output.csv

The model also worked in Docker:

# ersilia run -i "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]"
{
    "input": {
        "key": "NQQBNZBOOHHVQP-UHFFFAOYSA-N",
        "input": "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]",
        "text": "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]"
    },
    "output": {
        "probability": 0.325
    }
}

GemmaTuron commented 1 year ago

Is there any change needed in the model or as it is is fine?

pittmanriley commented 1 year ago

@GemmaTuron I don't think any changes are needed, unless you think that it's an issue I had difficulty running it by fetching normally? Once I fetched using ersilia -v fetch eos8d8a --from_github, everything worked as expected.

GemmaTuron commented 1 year ago

Hello @pittmanriley

This model was incorporated two years ago and it does not have the required current eos-template structure, including a main.py, a run.sh etc. Hence it needs to be refactored completely, creating the necessary files, allocating the model checkpoints where they correspond and so on. If it is not clear what we mean by "refactoring the model" please write to me in Slack.

pittmanriley commented 1 year ago

This makes sense @GemmaTuron. I'll get the refactoring done soon.

Also, I wanted to write about a potential issue with this model that I discussed with @miquelduranfrigola since it is only available in AMD64. To fetch the model, I used the command ersilia -v fetch eos8d8a --from_github, which worked. Since this command will not show the image on docker, I had to run an additional command to do so: docker pull ersiliaos/eos8d8a:latest --platform linux/amd64

These steps work, but I think it is important for a user to know if these are the steps required, since they may also have an M1 or M2 chip.

pittmanriley commented 1 year ago

I also wanted to leave an update regarding the refactoring of this model. I deleted the .dvcignore file, adjusted the .gitattributes and service.py files, created a folder called code where I plan to add a main.py file, and added a run.sh file to the framework folder. To create the main.py file, I know I want to combine the two files in the framework folder: reorder_predictions.py and calculate_descriptors.py. However, I tried doing this, and ended up testing the changes and got an index out of range error related to a call in main.py where I define outfile = sys.argv[3]. I pushed the changes onto my forked version for now, and will continue to work on this. I still need to make adjustments to the service.py file, but this may be difficult.

GemmaTuron commented 1 year ago

Hi @pittmanriley

Thanks for the update, a summary of some things you can look into:

The checkpoints of the model are not in the checkpoints folder, but on the framework folder
.gitattributes is tracking a .pkl.gz file, which one, are the model checkpoints in that format?

The service.py file is calling three scripts:

            "bash {0}/calculate_descriptors.py {1} {2}".format(
                self.framework_dir,
                data_file,
                feat_file
            ),
            "bash {0}/mycpermcheck1.1 -i {1} > {2}".format(
                self.framework_dir,
                feat_file,
                pred_unsorted_file
            ),
            "bash {0}/reorder_predictions.py {1} {2} {3}".format(
                self.framework_dir,
                feat_file,
                pred_unsorted_file,
                pred_file
            )
        ]

But you have eliminated two of them? And the way of calling scripts: python - for .py files bash - for .sh files perl - for perl files

I'd adivse you to rebase your changes to the original version of the repository and think again on the logic behind the original commands on the service file:

            lines = [
                "python {0}/calculate_descriptors.py {1} {2}".format(
                    self.framework_dir,
                    data_file,
                    feat_file
                ),
                "perl {0}/mycpermcheck1.1 -i {1} > {2}".format(
                    self.framework_dir,
                    feat_file,
                    pred_unsorted_file
                ),
                "python {0}/reorder_predictions.py {1} {2} {3}".format(
                    self.framework_dir,
                    feat_file,
                    pred_unsorted_file,
                    pred_file
                )
            ]

I hope this pointer is helpful. See that we first calculate the descriptors, then run the actual model which is in a perl file and finally reorder the predictions to make sure we get the right output. Try to first understand these steps clearly before moving onto refactoring it

pittmanriley commented 1 year ago

Hi @GemmaTuron, I was advised by Miquel to only make some basic changes to this model for now. Specifically, only change the predict api in the service.py file to run and delete the unnecessary .dvc files, and revisit the model later on in the summer to make the rest of the changes. The reason is because the changes would be fairly complicated and the model currently works well. Please let me know if this is okay. If so, I'm ready to submit the PR.

GemmaTuron commented 1 year ago

Hi @pittmanriley

Yes, w ehave been dicussing with Miquel about this. Open a PR for the small changes, make sure to revise the .gitattributes file accordingly!

DhanshreeA commented 5 months ago

Good to close this issue @GemmaTuron?

ersilia-os / eos8d8a

Clean UP & Dockerization eos8d8a #6