LSSTDESC / csd3_uk_tasks

A repo to keep track of any activities relating to setting up and running LSST DESC jobs at CSD3

Set up a database at CSD3 #2

Open nsevilla opened 3 weeks ago

nsevilla commented 3 weeks ago

According to George Beckett:

CSD3 recommend we set up a PostgreSQL database in a service called ARCAS: this is a cloud platform that sits next to CSD3 (I think it is in the same racks). The only limitation they note is that ARCAS VMs cannot mount the (Lustre) work file system of CSD3. I'm assuming this isn't an issue for our case? Another option, which we have used when running on CSD3, is to use an SQLite database, though this may or may not be capable enough for what you have planned. Assuming the ARCAS platform sounds suitable, could you let me know roughly what specs we would need for the database (number of cores, memory, and disk space)?

nsevilla commented 2 weeks ago

After some discussion, we think that the PostgreSQL database to be set up on the ARCAS cloud system does not need to see the data directly.

Jim: I’m not sure that the db itself needs to be able to access the files on disk. The LSST software that queries the db does certainly.

George: Yes, I wasn't sure if PostgreSQL had an API that let it source table data from a remote file system so, perhaps, the key is that the Pipeline (running on CSD3 compute nodes) can see both the PostgreSQL database and the Lustre file system and mediates between the two in terms of writing metadata to the Butler registry or reading metadata from the Butler registry?

Heather: If Jim is correct (and I believe he is) - then the LSST Sci Pipelines code would handle that mediation. The Postgres db is just a registry that provides the necessary information for the LSST Sci Pipelines to find the data in the "data store".
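
To make the split described above concrete, here is a minimal sketch of a client that sees both a registry and a file store, while the registry itself never touches the files. This is not the Butler API: the `ingest` and `fetch` helpers, the `dataset` table, and the file naming are all hypothetical, an in-memory SQLite database stands in for the PostgreSQL registry on ARCAS, and a temporary directory stands in for the Lustre file system.

```python
import sqlite3
import tempfile
from pathlib import Path


def ingest(registry, datastore_root, dataset_id, payload):
    """Write the file to the datastore, then record its location in the registry."""
    rel_path = f"{dataset_id}.dat"  # hypothetical naming scheme
    (datastore_root / rel_path).write_bytes(payload)
    registry.execute(
        "INSERT INTO dataset (id, rel_path) VALUES (?, ?)",
        (dataset_id, rel_path),
    )
    registry.commit()


def fetch(registry, datastore_root, dataset_id):
    """Ask the registry where the dataset lives, then read it from the datastore."""
    row = registry.execute(
        "SELECT rel_path FROM dataset WHERE id = ?", (dataset_id,)
    ).fetchone()
    if row is None:
        raise KeyError(dataset_id)
    return (datastore_root / row[0]).read_bytes()


def demo():
    # Stand-in for the registry database: it stores metadata only.
    registry = sqlite3.connect(":memory:")
    registry.execute(
        "CREATE TABLE dataset (id TEXT PRIMARY KEY, rel_path TEXT NOT NULL)"
    )
    # Stand-in for the file system that only the client can see.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        ingest(registry, root, "visit_0001", b"pixels")
        result = fetch(registry, root, "visit_0001")
    registry.close()
    return result
```

In this sketch only the client code opens files; the database just maps an identifier to a relative path, which is why it would not need to mount the work file system.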

heather999 commented 1 week ago

Jim reminded us we should try to use PostgreSQL 16 to match the version Rubin is using.
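
Once a server is up, one way to confirm the major version is to look at PostgreSQL's `server_version_num` setting (readable with `SHOW server_version_num;`), which for version 10 and later encodes the major version times 10000 plus the minor version. A small sketch of that check follows; the helper names are hypothetical, and the value is passed in as an integer so the example does not assume any particular database driver.

```python
def postgres_major_version(server_version_num: int) -> int:
    """Extract the major version from server_version_num (PostgreSQL 10+)."""
    return server_version_num // 10000


def matches_expected_major(server_version_num: int, expected_major: int = 16) -> bool:
    """Return True if the server's major version equals the expected one."""
    return postgres_major_version(server_version_num) == expected_major
```

For example, a server reporting `160002` would pass the check, while one reporting `150006` would not.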