ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0
[Internship Project]: Zakia Yahya #713

Closed GemmaTuron closed 1 year ago

GemmaTuron commented 1 year ago

Summary

Hello,

This is a public issue for a virtual daily stand-up. We will use this to briefly share the tasks of the day and the challenges and advances made, so that we can ensure smooth support from the Ersilia mentors and alignment between daily tasks and overall internship goals.

Scope

Initiative 🐋

Objective(s)

Internship goals:

Team

Role & Responsibility Username(s)
Intern @ZakiaYahya
Mentor @DhanshreeA
Coordinator @GemmaTuron

Timeline

Before starting your work, line up a few tasks and short description. This should not take long. For example, it could be something like: Wednesday 21st June

Documentation

No response

GemmaTuron commented 1 year ago

In this example, following the tasks of Wednesday 21st June:

I have created all the templates for the Interns, spent a couple of hours revising the GitHub Project and updating the tasks. I have been working on identifying the bug in the GitHub Actions, solved in this issue. I have set a meeting with @miquelduranfrigola to discuss in detail his comments on the Model testing discussion but I haven't been able to start writing yet; I aim to do so by the end of the week. Looking forward to the next WCAIR lesson!

Pro tip: adding the links to the issues and discussions you mention will be very helpful!

ZakiaYahya commented 1 year ago

@GemmaTuron @DhanshreeA

Today Tasks List: Wednesday, June 21, 2023

Refactoring of model eos2re5 is still in progress; I will try to complete it as soon as possible. Done testing model eos31ve on CLI and COLAB, testing on DockerHub right now. Thanks.

GemmaTuron commented 1 year ago

Great, thanks Zakia. I am giving you an extra model just because I won't be able to review until my morning tomorrow and you probably start working earlier than I do in your timezone. Just in case you finish your previous tasks.

GemmaTuron commented 1 year ago

Hi @ZakiaYahya

Please look at this issue. There is so much more information you can add when a workflow is failing. I have written an example of the level of detail you should be aiming for; I hope this is helpful.

ZakiaYahya commented 1 year ago

Hello @GemmaTuron @DhanshreeA

Today Tasks List: Thursday, June 22, 2023

Due to a slower network today, my work has been affected. I have refactored model eos2re5; it is working both locally and inside ersilia with --repo-path, but I have been unable to push the changes so far due to the internet. I'll open a PR as soon as I successfully push the changes to the repo. Now working on refactoring model eos2b6f: I tried running it locally but am encountering errors. Once I understand why the error is happening I'll let you know in detail. Trying to do model testing of eos5505 simultaneously. I was able to run it on COLAB, but due to the unstable internet it is taking way too long to fetch from ersilia. Once it is done I'll post the results in the relevant issue. I hope my internet gets stable by tomorrow. Thanks.

ZakiaYahya commented 1 year ago

Hello @GemmaTuron @DhanshreeA

Tasks List: Friday, June 23, 2023

Will try to work on the error "ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device" over the weekend.

GemmaTuron commented 1 year ago

Hi @ZakiaYahya !

Thanks, make sure to eliminate all the other Ersilia models you have been working with by running the ersilia delete command, and review your conda envs as well after doing this. It will free up space in your system.

ZakiaYahya commented 1 year ago

Hello @GemmaTuron Right, I'll delete them all today. Thanks.

DhanshreeA commented 1 year ago

Hi @ZakiaYahya I went through your update on eos44zp. Your PR has been simply closed (and not merged) since the model was fairly recent and only required the missing workflow files. Gemma has added them in a separate commit.

Regarding eos2re5, I have left comments on the issue in that repository. The issues with docker build seem to be coming from using conda command through Docker's RUN directives. I will also have to look more into that - but for the time being I have linked a resource that should be useful in understanding what is going on. Also, the model isn't fully cleaned and there are a few more files that should be removed.

ZakiaYahya commented 1 year ago

Hello @GemmaTuron @DhanshreeA

Tasks List: Monday, June 26, 2023

I have done model testing and re-refactored model eos2b6f with a newer, compatible version of RDKit. I have opened a PR on it. I'm currently working on model eos2re5, which is failing after merging, as well as going through the model eos44zp "space allocation problem", but I haven't been able to resolve it yet. Thanks.

GemmaTuron commented 1 year ago

Hi @ZakiaYahya ! The space problem is related to GitHub Actions. Can you check for today's meeting what the limits on GitHub Actions are, and we'll discuss what to do?

ZakiaYahya commented 1 year ago

Hello @GemmaTuron Yes, I'm working on it. Different forums mention different cache sizes, but most of them mention that GitHub now offers up to 10 GB for workflow actions. I've gone through different forums; we will discuss it in today's meeting.

ZakiaYahya commented 1 year ago

Hello @GemmaTuron @miquelduranfrigola @DhanshreeA After going into detail, I found three workarounds, based on freeing up pre-cached tools, using swap space, or both. I'll discuss them in today's general meeting:

(1) Removing pre-cached tools from the GitHub runner: https://github.com/actions/runner-images/issues/2840#issuecomment-790492173

(2) Adding swap space: https://github.com/pierotofy/set-swap-space

(3) Deleting pre-cached tools + swap space (by doing this, the available space is around 31 GB): https://github.com/jlumbroso/free-disk-space/releases/tag/v1.1.0

Thanks
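To compare the three workarounds above, it may help to print the free disk space before and after each cleanup step. The following is only a minimal sketch using the Python standard library; the function names and the idea of running it as a workflow step are assumptions for illustration, not part of the Ersilia workflows.

```python
import shutil


def free_disk_gb(path="."):
    """Return the free disk space at `path` in GB (1 GB = 1024**3 bytes)."""
    usage = shutil.disk_usage(path)
    return usage.free / 1024**3


def has_room_for(required_gb, path="."):
    """Return True if at least `required_gb` GB are free at `path`."""
    return free_disk_gb(path) >= required_gb


if __name__ == "__main__":
    # Hypothetical usage: run once before and once after a cleanup step
    # and compare the two numbers in the workflow log.
    print(f"Free disk space: {free_disk_gb():.1f} GB")
```

Running the script at the start of a job and again after the cleanup would show how much space each workaround actually recovers on the runner.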

miquelduranfrigola commented 1 year ago

This is great stuff, @ZakiaYahya - I have never explored these options. Let me invoke @honeyankit and @GrantBirki from GitHub, let's see if they have additional feedback.

ZakiaYahya commented 1 year ago

Thanks @miquelduranfrigola Yeah sure, it would be very helpful.

GrantBirki commented 1 year ago

Hey there! I have never explored these options either so I would be curious to follow along with your PRs and see how it all works!

ZakiaYahya commented 1 year ago

Hello @GemmaTuron @DhanshreeA

Tasks List: Tuesday, June 27, 2023

Will try to search for more workarounds to the Out-of-Memory problem in GitHub Actions. Will work on refactoring the new model eos46ev tomorrow. Model eos2v11 is still being worked on, so once it is done, I'll test it. Thanks.

miquelduranfrigola commented 1 year ago

Thanks @ZakiaYahya - awesome progress

ZakiaYahya commented 1 year ago

Hello @GemmaTuron @DhanshreeA

Tasks List: Wednesday, June 28, 2023

Model eos2re5: After making changes and opening the PR again, it still fails at "Upload to DockerHub". It no longer gives the previous error, i.e. CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'. Now it gives a different error that is still related to conda, i.e. 'CondaEnvironmentService' object has no attribute 'pid'. Still working on this.

I wasn't able to work on the Out-of-Memory issue in model eos44zp; will do it once done with refactoring model eos46ev. Thanks.

GemmaTuron commented 1 year ago

@ZakiaYahya !

Great progress. There is a lot on your plate, so don't worry and focus on the issues one by one. If you need help with eos46ev, ping me and I'll try to provide further guidance. I won't be assigning new models to you so you can focus on eos44zp.

ZakiaYahya commented 1 year ago

Thanks @GemmaTuron I'll first try to test eos46ev with the latest ersilia version, then I'll let you know. It seems like @febie is also encountering the same problem I encountered in eos46ev. Regarding model eos44zp, I'm thinking of trying the swap-space workaround rather than deleting pre-cached tools, as that deletes some packages that we require, like Python.

ZakiaYahya commented 1 year ago

Hello @GemmaTuron @DhanshreeA

Tasks List: Thursday, June 29, 2023

Tested eos46ev for various single SMILES inputs, updated here. Now I'm focusing on the three unresolved issues mentioned above. Thanks.

GemmaTuron commented 1 year ago

Thanks for the update @ZakiaYahya ! I'll test eos46ev on my side as well

ZakiaYahya commented 1 year ago

Hello @GemmaTuron @DhanshreeA

Tasks List: Friday, June 30, 2023

Opened a PR again on model eos44zp with added swap space. Working on the conda issue in model eos2re5. Let me know @GemmaTuron when you test single inputs for model eos46ev; I've kept it on hold for now. Will start working on refactoring model eos6pbf once done with the conda issue. Thanks.

ZakiaYahya commented 1 year ago

Hello @GemmaTuron @DhanshreeA

Tasks List: Monday, July 3, 2023

Refactored model eos6pbf and opened a PR on it today. Modified test-model.yml for model eos44zp as discussed and opened a PR on it. Tested model eos7asg and updated the relevant issue. Investigating model eos46ev in detail, along with working on the conda issue in model eos2re5. Thanks

GemmaTuron commented 1 year ago

Hi @ZakiaYahya

The space problem in GitHub Actions is not easy to solve; let's try to talk about it today in the meeting. Focus today on investigating eos46ev and the conda issue on eos2re5, and if you are done with all of it let me know!

ZakiaYahya commented 1 year ago

Right @GemmaTuron, I'm working on it. Will let you know once it is done. Thanks.

DhanshreeA commented 1 year ago

Sorry for hijacking the conversation here a bit, but @GrantBirki could you tell us how much persistent storage space do GitHub runners get on our plan? I couldn't find much that says anything about it. I ask because configuring any amount of swap space would be limited by how much storage we actually have. And can we increase it, if yes, how? And is it possible to increase it specifically for a single model repository and not all of them?

ZakiaYahya commented 1 year ago

Hello @GemmaTuron @DhanshreeA

Tasks List: Tuesday, July 4, 2023

Working on model eos2re5: made all the changes suggested by Miquel in today's meeting, now testing it both locally and inside ersilia using --repo_path. Tested it with a smaller input file, now testing it with the whole eml_canonical. Meanwhile working on model eos46ev as well, investigating why it is not working with eml_canonical.csv. Opened a model request for one of the CYP450 enzymes, i.e. CYP2CP; once approved, I'll start working on it as well. Thanks.

GemmaTuron commented 1 year ago

Hi @ZakiaYahya

Quite a lot of work on refactoring eos44zp and debugging eos2re5; focus on this before moving on to new tasks.

ZakiaYahya commented 1 year ago

Right @GemmaTuron Working on it.

ZakiaYahya commented 1 year ago

Hello @GemmaTuron @DhanshreeA

Tasks List: Wednesday, July 5, 2023

I've finished testing model eos2re5 after the suggested changes; it is working both locally and with --repo-path. Done refactoring model eos5jz9 (CYP2CP); it is working locally, but with --repo-path it fails with ModuleNotFoundError: No module named 'sklearn'. I'm working on it, and once it is working I'll open a PR. Apart from that, I'm also working on model eos46ev in parallel but haven't been able to resolve it yet. It seems to fail on some of the SMILES in eml_canonical; I'm figuring out which SMILES are causing the problem. Thanks.
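A ModuleNotFoundError like the one above means the Python interpreter that runs the model cannot import the named module. The thread does not state the cause in this case, so the following is only a generic diagnostic sketch: run it with the same interpreter the model uses to see which required modules are missing. The function name and the module list are illustrative assumptions. Note that the import name is sklearn while the package on PyPI is called scikit-learn.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found by this interpreter.

    Only top-level names (no dots) are expected here, because find_spec
    imports parent packages when given a dotted name.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list of modules the model needs at prediction time.
    required = ["sklearn", "numpy", "rdkit"]
    print("Missing modules:", missing_modules(required))
```

If a module is reported missing only in the environment created through --repo-path, comparing that environment's installed packages with the local one is a reasonable next step.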

GemmaTuron commented 1 year ago

Hi @ZakiaYahya

Thanks for the update, let me know if you need anything. I've merged the PR on eos2re5.

ZakiaYahya commented 1 year ago

Sure @GemmaTuron Working on separate models for CYP right now.

ZakiaYahya commented 1 year ago

Hello @GemmaTuron @DhanshreeA

Tasks List: Thursday, July 6, 2023

Done incorporating models eos5jz9 and eos7nno and opened PRs on them as well. Tested model eos7asg on CLI, COLAB and DockerHub. Working on incorporating the third model, eos3ev6, but getting an error while running it with --repo_path. Once it is done I'll start working on eos46ev. Thanks.

GemmaTuron commented 1 year ago

Hi @ZakiaYahya

I have answered you in Slack. It seems there is an emoji causing trouble? That is surprising, but it seems to be the reason! Your plan of work looks good!

ZakiaYahya commented 1 year ago

Hello @GemmaTuron @DhanshreeA

Tasks List: Friday, July 7, 2023

Done incorporating model eos3ev6 (CYP3A4) and opened a pull request. All three models split from model eos44zp are incorporated in the Ersilia Model Hub. What should we do next with the eos44zp model?

After digging into the details of model eos46ev, I separated out the SMILES that are causing problems at prediction time. There are 7 of them out of 443 in eml_canonical. Working on identifying NaN or infinity values and handling them appropriately. Which would be more convenient: discarding problematic inputs or replacing NaN/infinity entries with zeros?

Model eos2re5 failed again at "upload to docker Hub". I've discussed it with Miquel; we will figure it out in detail on Monday.

I've tested model eos4se9 and it is working fine. Model testing eos2thm is still in loop, will do it later. Thanks

miquelduranfrigola commented 1 year ago

Hi @ZakiaYahya

Fantastic work as always. Let me quickly answer about eos46ev. We should not discard problematic inputs. That is, we need to ensure that we always have the same number of inputs as outputs.

ZakiaYahya commented 1 year ago

Hello @miquelduranfrigola Thanks. Yes, to ensure the same number of output entries as input entries I'm not skipping them; I'm just replacing NaN with zeros.
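The approach agreed above (keep one output per input and replace NaN or infinity with zero) can be sketched as follows. This is an illustration only, not the code in the eos46ev repository; the function name and the fill parameter are assumptions.

```python
import math


def clean_predictions(values, fill=0.0):
    """Replace None, NaN and +/-infinity with `fill`, keeping the list length unchanged."""
    return [
        v if v is not None and math.isfinite(v) else fill
        for v in values
    ]


if __name__ == "__main__":
    raw = [0.12, float("nan"), 0.87, float("inf"), None]
    cleaned = clean_predictions(raw)
    # The number of outputs always matches the number of inputs.
    assert len(cleaned) == len(raw)
    print(cleaned)
```

Because nothing is dropped, the i-th output still corresponds to the i-th input SMILES.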

ZakiaYahya commented 1 year ago

Hello @GemmaTuron Kindly merge PR on model eos3ev6 https://github.com/ersilia-os/eos3ev6/pull/1 Thanks.

GemmaTuron commented 1 year ago

sure! I'm waiting for the checks to be completed

ZakiaYahya commented 1 year ago

Hello @GemmaTuron @DhanshreeA

Tasks List: Monday, July 10, 2023

Model eos46ev: It is working absolutely fine locally with run.sh, giving output probabilities for all inputs, and it also deals with SMILES that have NaN values by replacing those values with zeros. But somehow it behaves weirdly when tested within Ersilia using --repo-path. It skips some of the inputs and returns the remaining SMILES and their corresponding probabilities. I collected the SMILES that are skipped and, strangely, they are not even the ones that have NaN values. I still can't figure out the problem. Need help here.
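One way to list exactly which inputs were skipped is to compare the input SMILES with the SMILES present in the output. The sketch below is a generic illustration with a hypothetical function name; it assumes the output echoes each SMILES string unchanged, which may not hold if the strings are rewritten along the way.

```python
def find_skipped(input_smiles, output_smiles):
    """Return (position, smiles) for every input that does not appear in the output."""
    returned = set(output_smiles)
    return [
        (index, smiles)
        for index, smiles in enumerate(input_smiles)
        if smiles not in returned
    ]


if __name__ == "__main__":
    inputs = ["CCO", "c1ccccc1", "CC(=O)O"]
    outputs = ["CCO", "CC(=O)O"]
    for index, smiles in find_skipped(inputs, outputs):
        print(f"Input {index} was skipped: {smiles}")
```

Keeping the position alongside the string makes it easier to look the skipped rows up in the original input file.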

Model eos2re5: Discussed it with Miquel, and he suggested making changes in the Ersilia code base, as this model uses sudo commands in the Dockerfile, which keep failing with Ersilia. So Miquel made some changes in the Fetch -> get.py code to automatically discard sudo from commands for root users. I've pushed the changes to the Ersilia code base and opened a PR on it; Miquel is now testing the GitHub Actions. Hope it will work.

Model Testing: Done quite a lot of model testing today; one model, i.e. eos59rr, is still in the loop. Thanks
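The idea described for eos2re5 (dropping sudo from a command when the process already runs as root) can be illustrated with a small sketch. This is not the actual change made in Ersilia's get.py; the function name, the is_root parameter and the regular expression are assumptions for illustration, and only a leading sudo is handled.

```python
import re

# Matches a leading "sudo" followed by whitespace, e.g. "sudo apt-get ...".
_LEADING_SUDO = re.compile(r"^\s*sudo\s+")


def strip_sudo(command, is_root=True):
    """Remove a leading `sudo` from `command` when running as root.

    `is_root` is passed in explicitly so the function stays easy to test;
    on Unix a caller could compute it as `os.geteuid() == 0`.
    """
    if not is_root:
        return command
    return _LEADING_SUDO.sub("", command, count=1)


if __name__ == "__main__":
    print(strip_sudo("sudo apt-get update"))
    print(strip_sudo("sudo apt-get update", is_root=False))
```

Commands that chain several sudo calls (for example with &&) would need extra handling that this sketch deliberately leaves out.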

ZakiaYahya commented 1 year ago

Hello @GemmaTuron @DhanshreeA

Tasks List: Tuesday, July 11, 2023

Model eos2re5: Miquel made some changes in the Dockerfile and the Ersilia code base to make it work. GitHub Actions are running; hopefully it will upload to DockerHub this time, as it works with the changes made to it.

Model eos46ev: I tested it even with the commit from before I refactored the model, and it still skips some of the SMILES at prediction time. So this problem did not arise from the refactoring. Miquel dug into it and found that the SMILES were not being read properly, and he made some changes in main.py for reading SMILES. Although the model passes all GitHub Actions workflows, I just tested it by fetching the latest code from Ersilia and it is skipping a lot of SMILES at prediction time.

Model Testing: Did Model testing today for models eos59rr and 24ci. Thanks

GemmaTuron commented 1 year ago

Thanks for the update @ZakiaYahya

The new CYP models are all ready, just awaiting the final test. We still need to think more about the issue with eos46ev. I've assigned you two new models meanwhile.

ZakiaYahya commented 1 year ago

Alright @GemmaTuron Yeah, I paused work on model eos46ev for a while, waiting for you to test. For today, I'll start working on the new models. Thanks.

ZakiaYahya commented 1 year ago

Hello @GemmaTuron @DhanshreeA

Tasks List: Wednesday, July 12, 2023

Model eos4tcc: Quite an easy-peasy model; it didn't require much refactoring. Tested it before and after refactoring with both run.sh and --repo-path and it is working fine. I've opened a PR on it as well.

Model eos1579: Started by testing the model before doing any refactoring. It is working fine locally with run_predict.sh but it fails with --repo-path while fetching. I've updated the issue, kindly have a look https://github.com/ersilia-os/eos1579/issues/1#issuecomment-1632852573

Model eos46ev: Done quite a lot of testing since yesterday with the commit from before refactoring the model, and it shows that weird behaviour too when tested with --repo-path. Waiting for @GemmaTuron to test it with a bigger number of SMILES.

Thanks.

ZakiaYahya commented 1 year ago

Hello @GemmaTuron @DhanshreeA

Tasks List: Thursday, July 13, 2023

Model eos46ev: Re-testing it on COLAB and on DockerHub to check whether it reproduces the behaviour it shows on CLI.

Model eos1579: Debugging its log files and temporary ersilia files to find the cause of the error. Once that error is resolved and the model works, I'll start refactoring it.

Model eos2fy6: Tested it on COLAB, CLI and DockerHub.

Thanks.

GemmaTuron commented 1 year ago

Hi @ZakiaYahya !

I'll try to help with eos1579 this afternoon, and we'll continue on eos46ev as well. Don't worry, we'll figure it out. I see you also have eos4tcc assigned; in case you are too stuck with the above, try that one while we try to help you!

ZakiaYahya commented 1 year ago

Hello @GemmaTuron Okay sure, I'm still working on eos1579 along with @DhanshreeA to figure it out. Debugging eos46ev as well. I opened a PR on eos4tcc 2 days ago, kindly check it here https://github.com/ersilia-os/eos4tcc/pulls. Thanks.

ZakiaYahya commented 1 year ago

Hello @GemmaTuron @DhanshreeA

Tasks List: Friday, July 14, 2023

Model eos4tcc: I've refactored the model and it is working fine both with run.sh and with --repo-path. I've opened a PR on it but I think you somehow missed it @GemmaTuron, kindly check it.

Model eos46ev: We have encountered two problems in this model: (1) skipped SMILES in the output, meaning the number of output entries is not equal to the number of input entries, and (2) null inputs at prediction time. Incorporated Miquel's suggestion to change the way SMILES are read from the input file and added a code snippet that replaces NaN values with zero so it won't give null in the output. The code is working fine with the changes. I've opened a PR on it, kindly check it.

Model eos1579: Haven't been able to resolve it yet. I discussed and debugged it with @DhanshreeA, and it seems there is nothing wrong with input/output processing; something is wrong with service.py and the request returned from BentoML. I'm digging into the details. Suggestions are welcome.

Meanwhile @GemmaTuron, you can assign me a new model as well to work on alongside eos1579. Thanks.