Closed — kariso2000 closed this issue 3 years ago
Item 2 can be fixed by updating `ai4good/models/cm/simulator.py` and changing:

```python
with dask.config.set(scheduler='processes', num_workers=n_processes):
```

to

```python
with dask.config.set(scheduler='single-threaded', num_workers=1):
```

However, we need to understand the impact of this change.
@pardf @billlyzhaoyh - Do you know who can assist with item 1?
With item 2, that was my fix to get it working as well, but I don't know whether it will slow down the compute. Item 1 looks like either the CSV parameter files are not being copied across correctly, or path_util is not pointing to the right folders for the parameter files.
@kariso2000 For item 1, the correct directory of the file needs to be specified using path_util, as discussed in Slack.
@kariso2000 For item 2, we should be running at least Python 3.7, because this was required for another process. Could you check that this is the case for the workers as well, please? @billlyzhaoyh The change substantially slowed down the running of the CM models.
@kariso2000 @billlyzhaoyh we could try this: https://stackoverflow.com/questions/6974695/python-process-pool-non-daemonic
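For reference, the linked answer's Python 3 pattern is a pool built on a context whose `Process` class ignores the daemon flag, so pool workers are allowed to spawn children of their own. A minimal sketch (class names are mine, and the fork start method is assumed for simplicity, matching how gunicorn forks workers on Linux):

```python
import multiprocessing
import multiprocessing.pool

_fork_ctx = multiprocessing.get_context("fork")  # assumed start method

class NoDaemonProcess(_fork_ctx.Process):
    """A Process whose daemon flag is always False, so it may have children."""
    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        pass  # ignore the Pool's attempt to mark workers daemonic

class NoDaemonContext(type(_fork_ctx)):
    Process = NoDaemonProcess

class NestablePool(multiprocessing.pool.Pool):
    """A Pool whose workers can start subprocesses of their own."""
    def __init__(self, *args, **kwargs):
        kwargs["context"] = NoDaemonContext()
        super().__init__(*args, **kwargs)

def _double(x, q):
    q.put(x * 2)

def nested_work(x):
    # Runs inside a pool worker and starts its own child process, which
    # a normal (daemonic) Pool worker is forbidden to do.
    q = _fork_ctx.Queue()
    p = _fork_ctx.Process(target=_double, args=(x, q))
    p.start()
    result = q.get()
    p.join()
    return result

if __name__ == "__main__":
    with NestablePool(2) as pool:
        print(pool.map(nested_work, [1, 2, 3]))  # [2, 4, 6]
```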
Yes, the non-daemonic code (multithreaded vs. multiprocess) would work. Let's see the performance of dask.distributed, and we can keep this as a future performance enhancement. I believe we also need to chunk our model processing to make the best use of dask.distributed.
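The chunking part doesn't need dask itself; a small stdlib helper shows the idea (how the chunks are then submitted to dask.distributed is left out here):

```python
from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

if __name__ == "__main__":
    # e.g. submit each chunk of parameter sets as one task, so scheduler
    # overhead is paid once per chunk rather than once per model run
    print(list(chunked(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```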
We need to remove all reading and writing from the file system. Ideally this should be pushed into the database, or into Redis.
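As a sketch of that direction, using stdlib `sqlite3` purely as a stand-in for whatever database or Redis setup we end up choosing (the table and column names are invented for illustration):

```python
import sqlite3

def make_store(conn):
    """Create a simple key/value table for parameter CSV content."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS params (name TEXT PRIMARY KEY, csv TEXT)"
    )

def put_params(conn, name, csv_text):
    """Store a parameter file's CSV text under a name, replacing any old copy."""
    conn.execute(
        "INSERT OR REPLACE INTO params (name, csv) VALUES (?, ?)",
        (name, csv_text),
    )
    conn.commit()

def get_params(conn, name):
    """Fetch CSV text by name, or None if it was never stored."""
    row = conn.execute(
        "SELECT csv FROM params WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_store(conn)
    put_params(conn, "camp_params", "age,contacts\n0-9,12\n10-19,10\n")
    print(get_params(conn, "camp_params"))
```

Workers would then read parameters from a shared store instead of each needing a correct copy of the `fs` folder on its own file system.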
I've created two issues so we can pick this up at a later date.
@billlyzhaoyh @pardf closing
We will require code changes before we can use distributed servers.

Errors that I have seen during testing:

1 - The worker hosts need to have a copy of the ai4good code. We can do this by zipping up the code and sending it to the workers:

However, when this is done I get the following error from gunicorn:

and the following error on the worker:

2 - `daemonic processes are not allowed to have children`

gunicorn:

worker:
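For what it's worth, the error in item 2 can be reproduced outside gunicorn with stdlib `multiprocessing`, which shows it comes from the daemon flag on the worker process rather than from anything ai4good-specific (the fork start method is assumed here, as on a Linux gunicorn host):

```python
import multiprocessing

ctx = multiprocessing.get_context("fork")  # assumed start method

def _noop():
    pass

def try_to_spawn(q):
    # This runs inside a daemonic process; multiprocessing forbids
    # daemonic processes from starting children of their own.
    try:
        child = ctx.Process(target=_noop)
        child.start()
        child.join()
        q.put("child started")
    except AssertionError as exc:
        q.put(str(exc))

if __name__ == "__main__":
    q = ctx.Queue()
    parent = ctx.Process(target=try_to_spawn, args=(q,), daemon=True)
    parent.start()
    print(q.get(timeout=10))  # daemonic processes are not allowed to have children
    parent.join()
```

gunicorn marks its workers daemonic in the same way, which is why dask's `processes` scheduler fails under it.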