Closed Hrovatin closed 1 year ago
Hi @Hrovatin,
this really depends on what your local infrastructure looks like and what resources you are provided with.
In my case, our IT admin provisioned a VM with a permanent DNS entry for me, and I run MongoDB on it in a Docker container. So I effectively have to self-manage the database and the OS. The VM has the following specs: 4 cores, 32 GB memory, HDD storage backend. In my experience, seml, and by extension sacred, aren't particularly resource-intensive for the database. I rarely go over 50% CPU utilization and the DB uses around 14 GB of memory. This holds true even if I have a few hundred experiments starting in parallel. The HDD backend isn't amazing for data retrieval, but it works okay. In any case, I don't really store large amounts of data (such as artifacts) in the DB, since our compute clusters have network filesystems that are better suited to the task, so only the paths to the artifacts need to be stored in the DB.
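The "store only the path" approach mentioned above could look roughly like the following. This is a minimal sketch, not seml's or sacred's API: the helper name save_artifact, the directory layout, and the record fields are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def save_artifact(data: bytes, artifact_dir: Path, name: str) -> dict:
    """Write the artifact to a (network) filesystem and return a small
    record that holds only its location, not its contents.

    Hypothetical helper: the field names below are illustrative only.
    """
    artifact_dir.mkdir(parents=True, exist_ok=True)
    path = artifact_dir / name
    path.write_bytes(data)
    # Only this lightweight record would end up in the database.
    return {
        "artifact_name": name,
        "artifact_path": str(path),
        "size_bytes": len(data),
    }


# Demo with a temporary directory standing in for the network filesystem.
with tempfile.TemporaryDirectory() as tmp:
    record = save_artifact(b"weights", Path(tmp) / "run_1", "model.bin")
    print(record["artifact_name"], record["size_bytes"])
```

The returned dictionary is small regardless of how large the artifact is, which keeps the database's memory and storage footprint low.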
Back to getting the DB up and running: What you need is:
As for how to allocate these resources the following approaches come to mind in order of complexity:
Thanks for the details, Hendrik, and sorry for the late reply @Hrovatin. As cluster setups vary widely between research groups, there is no unified way of setting up MongoDB (VMs, Docker, bare metal, ...). We generally recommend that you talk to your system administrator about the best setup for your particular setting.
As we cannot give definitive installation instructions, I will close this issue as out of scope. For a general guide, I'd recommend the official MongoDB documentation: https://www.mongodb.com/docs/manual/installation/ .
I have solved the problem by getting a free account on https://cloud.mongodb.com/
Since I did not know how to configure seml properly for https://cloud.mongodb.com/, I hard-coded the connection to my account there by changing get_mongo_client in seml/seml/database.py.
I replaced the line:
client = pymongo.MongoClient(host, int(port), username=username, password=password, authSource=db_name, **kwargs)
With:
client = pymongo.MongoClient("mongodb+srv://USERNAME:PWD@seml.mqiebcc.mongodb.net/?retryWrites=true&w=majority&ssl=true", server_api=ServerApi('1'), connect=False)
EDIT:
This also needs the following import: from pymongo.server_api import ServerApi
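To avoid putting credentials directly in the source file, the same connection could be built from environment variables. This is a minimal sketch, not part of seml: the function names and the variable names ATLAS_USERNAME, ATLAS_PASSWORD and ATLAS_HOST are assumptions. The credentials are percent-escaped with urllib.parse.quote_plus, which matters if the password contains characters such as @ or /.

```python
import os
from urllib.parse import quote_plus


def build_atlas_uri(username: str, password: str, host: str) -> str:
    """Build a mongodb+srv connection string with escaped credentials."""
    return (
        f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}/?retryWrites=true&w=majority"
    )


def get_atlas_client():
    """Create a MongoClient from environment variables.

    The environment variable names are hypothetical; adapt as needed.
    """
    # Imported lazily so build_atlas_uri works without pymongo installed.
    import pymongo
    from pymongo.server_api import ServerApi

    uri = build_atlas_uri(
        os.environ["ATLAS_USERNAME"],
        os.environ["ATLAS_PASSWORD"],
        os.environ["ATLAS_HOST"],
    )
    # connect=False defers the actual connection until first use.
    return pymongo.MongoClient(uri, server_api=ServerApi("1"), connect=False)
```

With this in place, the hard-coded line would become client = get_atlas_client(), and the account details stay out of version control.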
I was wondering if you could suggest some useful resources for setting up MongoDB on a compute cluster, or provide a guide for it. Is it even possible to do this as a user of a cluster, given the resource requirements of MongoDB?