ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

[Internship Project]: Hellen Namulinda #714

Closed GemmaTuron closed 1 year ago

GemmaTuron commented 1 year ago

Summary

Hello,

This is a public issue for a virtual daily stand-up. We will use this to briefly share the tasks of the day and the challenges and advances made, so that we can ensure smooth support from the Ersilia mentors and alignment between daily tasks and overall internship goals.

Scope

Initiative 🐋

Objective(s)

Internship goals:

Team

Role & Responsibility Username(s)
Intern @HellenNamulinda
Mentor @GemmaTuron
Coordinator @GemmaTuron

Timeline

Before starting your work, line up a few tasks with a short description of each. This should not take long. For example, it could be something like: Wednesday 21st June

Documentation

No response

GemmaTuron commented 1 year ago

In this example, following the tasks of Wednesday 21st June:

I have created all the templates for the interns and spent a couple of hours revising the GitHub Project and updating the tasks. I have been working on identifying the bug in the GitHub Actions, solved in this issue. I have set a meeting with @miquelduranfrigola to discuss in detail his comments on the Model testing discussion, but I haven't been able to start writing yet; I aim to do so by the end of the week. Looking forward to the next WCAIR lesson!

Pro tip: adding the links to the issues and discussions you mention will be very helpful!

HellenNamulinda commented 1 year ago

Wednesday 21st June: Tasks

Model eos31ve, whose changes were merged, was tested, and the feedback shows it is working. Its arm64 build failed, but the amd64 build was successful. The Docker images are quite big (4 GB+), and my limited internet connection hindered the testing.

HellenNamulinda commented 1 year ago

Thursday 22nd June: Tasks

I had slow internet issues and could not fetch the models pending testing. But I'm working around my network and will have these tested by the end of the weekend.

HellenNamulinda commented 1 year ago

Friday 23rd June: Tasks

I'm working to resolve the pending tasks by the end of the weekend. I tested models eos9tyg and eos2r5a, which initially did not allow me to pass an output file. Both now work with output files. Models eos526j and eos7pw8 work well using Google Colab, but they gave null outputs when using the CLI and Docker.

For model eos65rt, I was able to resolve the package dependency conflicts and created a PR.

HellenNamulinda commented 1 year ago

Monday 26th June: Tasks

Testing model eos2re5 using Colab failed because it required shells to be activated. The sudo commands also required root access when using the CLI, and even after providing the password, fetching the model took too long.

The first PR I made for refactoring model eos65rt didn't capture all the commits for updating api, and it was failing at testing using ersilia run. I pushed the changes for updating api to run.

Currently, ersilia crashes when given invalid SMILES (wrong inputs). Instead of continuing to predict for the valid ones, the execution halts because the exception isn't handled correctly. Since most models don't have a way of running predictions for only the valid SMILES, this has to be handled by ersilia, so that only valid SMILES are passed when making inferences. For the wrong inputs, the prediction value should be a message such as "Invalid smile".
I set up a VM to test the proposed changes without affecting my normal ersilia environment.
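
The proposal above can be illustrated with a minimal sketch. It is not ersilia's actual implementation: the names `toy_is_valid`, `predict_all` and `predict_batch` are hypothetical, and the validator is a deliberately crude stand-in (a real check would parse each string with a cheminformatics toolkit). The idea is only to show invalid inputs being filtered out before inference and then reported in their original positions.

```python
from typing import Callable, List, Sequence

INVALID_MESSAGE = "Invalid smile"  # placeholder text taken from the proposal


def toy_is_valid(smiles: str) -> bool:
    """Hypothetical stand-in validator, NOT a real SMILES parser.

    It only rejects empty strings, strings with spaces, and strings with
    unbalanced parentheses or square brackets.
    """
    if not smiles or " " in smiles:
        return False
    return (smiles.count("(") == smiles.count(")")
            and smiles.count("[") == smiles.count("]"))


def predict_all(inputs: Sequence[str],
                predict_batch: Callable[[List[str]], list],
                is_valid: Callable[[str], bool] = toy_is_valid) -> list:
    """Run the model on valid inputs only and keep the original ordering."""
    valid_positions = [i for i, s in enumerate(inputs) if is_valid(s)]
    results = [INVALID_MESSAGE] * len(inputs)
    if valid_positions:
        predictions = predict_batch([inputs[i] for i in valid_positions])
        for position, prediction in zip(valid_positions, predictions):
            results[position] = prediction
    return results


# Dummy "model" used only for the demonstration: returns the string length.
def dummy_model(batch: List[str]) -> list:
    return [len(s) for s in batch]


demo = predict_all(["CCO", "not a smiles", "C(C"], dummy_model)
print(demo)
```

Because the model only ever receives the valid subset, one bad row no longer halts the whole run, and the caller can still match each output to its input by position.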

HellenNamulinda commented 1 year ago

@GemmaTuron, the pending issue I'm working on is the presentation and solution to

HellenNamulinda commented 1 year ago

Tuesday 27th June: Tasks

I updated model eos2gth and pushed the changes. There were some errors when running workflows, but these were corrected by Miquel because they were originating from ersilia. On checking model eos24jm, it was up to date because it was incorporated recently. Its images were also available on dockerhub. So, I just tested it and it was functional. I tested model eos97yu using Colab, CLI and Docker. It was working well. For model eos7pw8, changes were made but they were not yet reflected on dockerhub. I will test it again.

The proposal for dealing with wrong inputs was accepted. And this feature will be incorporated in ersilia with Miquel leading the coding session on Wednesday, June 28 at 6pm CET.

HellenNamulinda commented 1 year ago

Wednesday 28th June: Tasks

GemmaTuron commented 1 year ago

@HellenNamulinda I've added a second model in case you finish all the tasks!

HellenNamulinda commented 1 year ago

Thursday 29th June: Tasks

GemmaTuron commented 1 year ago

Hi @HellenNamulinda !

See my comment on model es3sa2. I am afraid it is a work-in-progress model that was never finished. I am sorry for this!

HellenNamulinda commented 1 year ago

Hello @GemmaTuron, Oh yes, I have read the comment. When we are done with cleaning the other models, I will be glad to investigate the model and if possible re-incorporate it.

I will first continue to the next model eos4avb.

Also, when I convert the suggested task to an issue, it unassigns me, for example model eos6hy3. I hope you will be able to handle that.

GemmaTuron commented 1 year ago

Hi @HellenNamulinda I've fixed the task to issue thing !

HellenNamulinda commented 1 year ago

Hello Gemma, Thank you! Due to power issues today, I'm handling the tasks late. But I will go through all my pending tasks before next week.

HellenNamulinda commented 1 year ago

Hi @GemmaTuron, Monday 3rd July: Tasks

For model eos4avb, the change was to install rdkit using pip instead of conda so that the arm64 build can succeed. Model eos7asg requires installing java-jre using conda, but this needs a very low version of conda for the installation to work. So, commands to first downgrade conda, and to update it again after installing the packages, were added.

HellenNamulinda commented 1 year ago

Hi @GemmaTuron, Tuesday 4th July: Tasks

GemmaTuron commented 1 year ago

Hi @HellenNamulinda !

Good, I've assigned you a new model in case you finish with debugging the eos7asg.

HellenNamulinda commented 1 year ago

Sure, I will work on it.

GemmaTuron commented 1 year ago

Hi @HellenNamulinda

I have assigned you a new model in case you are done with the eos2thm, but don't worry if you don't get to it today.

HellenNamulinda commented 1 year ago

Wednesday 5th July: Tasks

I spent time comparing rdkit versions for model eos2thm. I set up an environment with the packages specified in the original repo. This forced me to first downgrade conda in order to install rdkit 2019 with 200 descriptors (which messes up other packages). I identified the 8 descriptors that are present in the versions with 208 descriptors but missing in 2019.03.

Comparing the model results, it turned out that all rdkit versions with 200+ descriptors gave the same output. Yet to find out why :open_mouth:

GemmaTuron commented 1 year ago

@HellenNamulinda

How are you doing on the tasks today? I think you still have a couple of models for refactoring?

HellenNamulinda commented 1 year ago

Hi @GemmaTuron, Yes, I have to finish these two models for refactoring, plus test the one pending testing after the changes by Emma. I'm done with eos8a4x and am finalizing local testing. For model eos2thm, I just added a comment and only need your go-ahead.

HellenNamulinda commented 1 year ago

Hello @GemmaTuron

Thursday 6th July: Tasks

The original model code for eos2thm had a file molfeaturizer.py where the 200 descriptors were explicitly specified. That's why using rdkit versions with 208 descriptors doesn't change the model output.
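
A small sketch may help show why an explicit descriptor list makes the output independent of the library version. Everything here is hypothetical: the two dictionaries are toy stand-ins for the descriptor sets of an older and a newer rdkit release, and the function names are invented for illustration. It is not the code from molfeaturizer.py.

```python
from typing import Callable, Dict, List

Registry = Dict[str, Callable[[str], float]]

# Toy stand-in for the descriptors available in an older library version.
OLD_VERSION: Registry = {
    "n_chars": lambda s: float(len(s)),
    "n_carbons": lambda s: float(s.count("C")),
}

# Toy stand-in for a newer version that adds one more descriptor.
NEW_VERSION: Registry = dict(OLD_VERSION)
NEW_VERSION["n_oxygens"] = lambda s: float(s.count("O"))

# Explicit list of names, playing the role of the 200 names in the model code.
FIXED_NAMES: List[str] = ["n_chars", "n_carbons"]


def added_descriptors(old: Registry, new: Registry) -> List[str]:
    """Names present in the newer registry but missing from the older one."""
    return sorted(set(new) - set(old))


def featurize_fixed(smiles: str, registry: Registry) -> List[float]:
    """Compute only the explicitly listed descriptors, in a fixed order."""
    return [registry[name](smiles) for name in FIXED_NAMES]


def featurize_all(smiles: str, registry: Registry) -> List[float]:
    """Compute every descriptor the installed version happens to offer."""
    return [fn(smiles) for fn in registry.values()]


print(added_descriptors(OLD_VERSION, NEW_VERSION))
print(featurize_fixed("CCO", OLD_VERSION) == featurize_fixed("CCO", NEW_VERSION))
print(len(featurize_all("CCO", OLD_VERSION)), len(featurize_all("CCO", NEW_VERSION)))
```

With the fixed list, the feature vector has the same length and order under both registries, so a model trained on it sees identical inputs. Only a featurizer that iterates over everything the library exposes would change when descriptors are added.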

GemmaTuron commented 1 year ago

Hi @HellenNamulinda !

I think we can safely merge the PR on eos2thm. I've assigned you two new models!

HellenNamulinda commented 1 year ago

Hi @GemmaTuron

Friday 7th July: Tasks

HellenNamulinda commented 1 year ago

Hi @GemmaTuron

Monday 10th July Tasks

Both models tested (eos3ae6 and eos1amr) work well using Colab and Docker. Model eos1amr works well with string inputs on the CLI, but it raised a TypeError: object of type 'float' has no len() for file outputs. Model eos3ae6 works well on all three.
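
The thread does not say where this TypeError was raised, so the following is only a sketch of how this message can arise in general: calling `len()` on a scalar float, for instance a single prediction or a missing value represented as NaN, instead of on a list. The helper `safe_len` is hypothetical and is not part of ersilia.

```python
import math

# Reproduce the message: len() is not defined for a scalar float.
try:
    len(float("nan"))
except TypeError as exc:
    message = str(exc)
print(message)


def safe_len(value) -> int:
    """Hypothetical guard: count the values in an output without assuming a list.

    None and NaN count as zero values, any other scalar number counts as one,
    and sized containers report their own length.
    """
    if value is None:
        return 0
    if isinstance(value, float):
        return 0 if math.isnan(value) else 1
    if isinstance(value, int):
        return 1
    return len(value)


print(safe_len(0.73), safe_len([0.1, 0.2]), safe_len(float("nan")))
```

A guard like this only hides the symptom; if the output schema expects a list, wrapping scalar predictions in a list at the source is usually the cleaner fix.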

Done with refactoring model eos4u6p and created a PR. For model eos7a04, I'm still fixing package version conflicts.

HellenNamulinda commented 1 year ago

Hi @GemmaTuron

Tuesday 11th July Tasks

The two models tested (eos6m4j and eos24ci) work using Colab, CLI and Docker. However, model eos6m4j returns null for some SMILES on the CLI (both for a single string and as part of a file), although it works well for them using Colab and Docker.

GemmaTuron commented 1 year ago

Thanks for the update @HellenNamulinda

Is there still the issue in eos6m4j where some mols cannot be predicted? Also, please confirm that eos7w6n works now. I'll assign you new models meanwhile.

HellenNamulinda commented 1 year ago

Hi @GemmaTuron

Wednesday 12th July Tasks

Model eos7w6n works well using Colab, CLI and Docker. However, for model eos2re5, the first value in the output (its column name: smiles) is null, probably because it's a string while the output type is Float.
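
The hypothesis above can be sketched as follows, assuming (this is not confirmed in the thread) that every output value is coerced to the declared Float type and that values which cannot be converted are replaced by null. The function `to_float_or_none` and the sample row are hypothetical.

```python
from typing import List, Optional


def to_float_or_none(value) -> Optional[float]:
    """Coerce a value to float; return None (null) when that is impossible."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


# Hypothetical output row: a string column followed by numeric predictions.
row = ["smiles", "0.12", "0.87"]
coerced: List[Optional[float]] = [to_float_or_none(v) for v in row]
print(coerced)
```

If this is what happens, the fix would be either to declare the first column as a string or to drop it from the numeric output, rather than to change the model itself.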

HellenNamulinda commented 1 year ago

Hi @GemmaTuron

Thursday 13th July Tasks

GemmaTuron commented 1 year ago

@HellenNamulinda

Great, I will not be assigning new models since you have a few open already, good work.

HellenNamulinda commented 1 year ago

Hello @GemmaTuron, Sorry for the late report. I thought I had added the Friday tasks already.

Friday 14th July Tasks

HellenNamulinda commented 1 year ago

Hi @GemmaTuron

Monday 17th July Tasks

The model eos2lm8 makes predictions well using Colab, CLI and Docker. The only concern is that, output values for the same smiles are not consistent on different runs(same platform) and across the three platforms.

For model eos9f6t, the major error was an incompatibility between tensorboardX and protobuf. BentoML depends on protobuf<3.19,>=3.8.0, but installing chemprop==1.3.0 pulled in tensorboardX 2.6.1, since no compatible version of it was specified. This caused the dependency error: tensorboardx 2.6.1 requires protobuf>=4.22.3, but you have protobuf 3.18.3 which is incompatible. So, I had to pin tensorboardX==2.0 because it is compatible with chemprop 1.3 and protobuf 3.18.3. The only remaining issue with the model is that, while run.sh returns consistent values, its output values when served in ersilia are not consistent across runs for the same smiles.
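
The conflict described above can be checked by hand with a small sketch. The two protobuf constraints are the ones quoted in the comment; the helper functions `parse` and `satisfies` are hypothetical and only handle plain dotted numeric versions with the operators `>=`, `<=`, `==`, `<` and `>`. Real resolvers such as pip implement far more complete rules.

```python
import operator
from typing import Tuple

# Two-character operators are listed first so that ">=" is not read as ">".
_OPS = [(">=", operator.ge), ("<=", operator.le), ("==", operator.eq),
        ("<", operator.lt), (">", operator.gt)]


def parse(version: str, width: int = 4) -> Tuple[int, ...]:
    """Turn '3.18.3' into (3, 18, 3, 0) so versions compare as tuples."""
    parts = [int(p) for p in version.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version: str, spec: str) -> bool:
    """Check a version against a comma-separated spec such as '<3.19,>=3.8.0'."""
    v = parse(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for prefix, op in _OPS:
            if clause.startswith(prefix):
                if not op(v, parse(clause[len(prefix):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


BENTOML_PROTOBUF = "<3.19,>=3.8.0"      # constraint quoted in the comment
TENSORBOARDX_261_PROTOBUF = ">=4.22.3"  # constraint quoted in the error message

installed = "3.18.3"
print(satisfies(installed, BENTOML_PROTOBUF))           # True
print(satisfies(installed, TENSORBOARDX_261_PROTOBUF))  # False
```

Since one constraint requires a version below 3.19 and the other a version of at least 4.22.3, no protobuf release can satisfy both, which is why the resolution had to come from pinning an older tensorboardX instead of changing protobuf.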

With model eos69p9, it works well locally and output values are consistent.

Model eos43at uses rdkit 2019.3.3, which has 200 descriptors, and rdkit 2020+ versions don't have the same number. I'm still working on the best way to install it (it is outdated for new conda versions and not available on PyPI) while maintaining the model output. Instead of downgrading conda, I'm testing downloading the files using wget.

HellenNamulinda commented 1 year ago

Tuesday 18th July: Tasks

HellenNamulinda commented 1 year ago

Thursday 20th July: Tasks

When testing eos7pwe, the error persisted when fetching from S3/github and repo_path(locally).

ERROR: The certificate of ‘anaconda.org’ has expired.
#8 ERROR: process "/bin/sh -c wget https://anaconda.org/LICH/syba/1.0.2.alpha/download/noarch/syba-1.0.2.alpha-py_0.tar.bz2" did not complete successfully: exit code: 5

I will try to explore more to come up with a solution.

HellenNamulinda commented 1 year ago

Hi @GemmaTuron,

Friday 21st July: Tasks

I spent some time resolving ERROR: The certificate of ‘anaconda.org’ has expired. with model eos7pw8 when using the CLI to fetch from S3, GitHub and repo_path (locally). The model was using Mode: docker when installing packages, so I configured the default to conda. All details are explained in the comment. Another issue with the model was TypeError: object of type 'float' has no len() for csv file outputs. More explanation on how it was resolved here

Also, to extract assays from the ChEMBL database that will be used to test the EnsembleTabPFN package, I successfully set up the chembl_ml_tools package, including a postgres database server containing the ChEMBL database. I installed the latest ChEMBL database.

I will now explore model eos43at more by testing it on Codespaces, and also refactor model eos6fza.

GemmaTuron commented 1 year ago

@HellenNamulinda

Thanks, I won't assign new tasks so you can focus on the current ones!

HellenNamulinda commented 1 year ago

25th July: Tasks

HellenNamulinda commented 1 year ago

26th July: Tasks

HellenNamulinda commented 1 year ago

27th July: Tasks

GemmaTuron commented 1 year ago

@HellenNamulinda

I think you still have quite a few models pending refactoring, which is probably my fault for assigning too many at once. Since you are also busy with the ChEMBL data, would you tell me which models you have not started to work on, so I might re-assign them and free up some of your tasks?

HellenNamulinda commented 1 year ago

Hi @GemmaTuron, It's my fault for not updating the issues for some days now. Apologies. I will make sure to complete all the pending tasks by the week's end.

HellenNamulinda commented 1 year ago

Tuesday 1st August: Tasks

HellenNamulinda commented 1 year ago

Wednesday 2nd August: Tasks

Model Testing:

Model Refactoring

HellenNamulinda commented 1 year ago

Thursday 3rd August: Tasks

GemmaTuron commented 1 year ago

@HellenNamulinda

Are you working on any other model aside from the ChEMBL data?

HellenNamulinda commented 1 year ago

@GemmaTuron, There is no other model.

GemmaTuron commented 1 year ago

Perfect, let's use today's meeting to focus on the ChEMBL data then

HellenNamulinda commented 1 year ago

Tuesday 8th August: Tasks

GemmaTuron commented 1 year ago

Hi @HellenNamulinda

In addition to the model testing and working on model eos96ia to help Riley, please: