giovannipizzi opened 1 month ago
The idea of this message is just to show that it is actually possible to have a folder-based approach even with PSQL, with some caveats, so that we could consider it, at least for folder-based profiles.
Here are some steps to create and use a new PSQL DB locally, confined in a folder, in user space as a standard user, without TCP ports, using only Unix sockets.
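As the folder I use the current directory (pwd) for simplicity; it should of course be in a place like ./.aiida/psql_db/. I initialize the database cluster there (further initdb options such as --encoding= --locale= --pwfile= --username=postgres can be passed through pg_ctl's -o option):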
pg_ctl init -D `pwd`
Inside the same folder, I create a sockets directory and configure the server so that only Unix sockets are used (no TCP listening), with the sockets placed in the directory just created:
mkdir sockets_dir
echo "listen_addresses = ''" >> postgresql.conf
echo "unix_socket_directories = '`pwd`/sockets_dir'" >> postgresql.conf
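A minimal sketch of the configuration step above as an idempotent shell function. The function names are illustrative (not part of AiiDA or PostgreSQL), and an absolute data directory path is assumed, as produced by pwd. It creates the sockets directory and appends each setting only if that exact line is not already in postgresql.conf, so running the setup twice does not duplicate entries.

```shell
#!/bin/sh
# Append "key = value" to postgresql.conf only if that exact line is missing.
set_pg_option() {
    conf_file=$1   # path to postgresql.conf
    line=$2        # full setting line to ensure
    if ! grep -qsxF -- "$line" "$conf_file"; then
        printf '%s\n' "$line" >> "$conf_file"
    fi
}

# Socket-only setup for a folder-local data directory (absolute path assumed).
configure_socket_only() {
    datadir=$1
    mkdir -p "$datadir/sockets_dir"
    # Empty listen_addresses: no TCP listening, Unix sockets only.
    set_pg_option "$datadir/postgresql.conf" "listen_addresses = ''"
    # Keep the socket inside the data folder.
    set_pg_option "$datadir/postgresql.conf" \
        "unix_socket_directories = '$datadir/sockets_dir'"
}

# Usage, after initializing the cluster in the current directory:
#   configure_socket_only "$(pwd)"
```

Since PostgreSQL uses the last entry when a parameter appears several times in postgresql.conf, the plain echo >> commands also work; the check only keeps the file tidy when the setup is re-run.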
I can start, check the status, and stop the PSQL server with these commands.
pg_ctl start -l logfile -D `pwd`
pg_ctl -D `pwd` status
pg_ctl -D `pwd` stop
Further notes:
- While the server is running, its socket file is sockets_dir/.s.PGSQL.5432.
- One can connect with psql by passing the socket directory as the host, e.g.:
psql -h /Users/pizzi/tmp/test-local-psql/sockets_dir template1
(you can e.g. check DBs, create a new one, try out things etc.)
- We could add a command like verdi storage startdb (that does nothing for e.g. an SQLite profile, but runs the pg_ctl start command above for a PSQL profile).

Pinging @mbercx since we discussed this today, @sphuber since we discussed this in the past, and also others like @unkcpz @khsrali @GeigerJ2 @agoscinski.
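The proposed verdi storage startdb idea could behave roughly like the shell function below. This is a hypothetical sketch: startdb, its arguments and the backend names sqlite/psql are illustrative, not an existing verdi API. For SQLite it does nothing; for the folder-local PostgreSQL set up above, it starts the server unless the socket file sockets_dir/.s.PGSQL.5432 is already present. With DRY_RUN=1 it prints the pg_ctl command instead of running it.

```shell
#!/bin/sh
# Hypothetical "start the DB if needed" dispatch (illustration only).
startdb() {
    backend=$1   # assumed values: "sqlite" or "psql"
    datadir=$2   # PostgreSQL data directory, as passed to pg_ctl -D
    case "$backend" in
        sqlite)
            echo "nothing to start for an SQLite profile"
            ;;
        psql)
            # The server creates .s.PGSQL.5432 (default port) while running.
            if [ -S "$datadir/sockets_dir/.s.PGSQL.5432" ]; then
                echo "server already running"
                return 0
            fi
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "pg_ctl start -l $datadir/logfile -D $datadir"
            else
                pg_ctl start -l "$datadir/logfile" -D "$datadir"
            fi
            ;;
        *)
            echo "unknown backend: $backend" >&2
            return 1
            ;;
    esac
}
```

A stale socket file left behind by a crashed server would fool the socket check, so a real implementation would rather rely on pg_ctl -D <dir> status. Once the server is up, psql -h "$datadir/sockets_dir" template1 connects to it.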
Thanks, @giovannipizzi, for the detailed write-up! Some preliminary notes:
1. What about adding a --local flag to the verdi profile setup command, at least for the psql_dos storage (as with SQLite it's already localized), which takes care of the necessary steps you outlined in the background (similar to how verdi quicksetup sets up PSQL)? I'm not familiar with pg_ctl, so I'm not sure how feasible/easy this would be.
2. Regarding the local .aiida folder, similar to git's discovery mechanism (as discussed in the verdi init PR): I still think this would be a nice feature, if we can make it work such that AIIDA_PATH still takes precedence if it is set. Making verdi commands use the correct folder should be doable with the discovery in place, by checking Path.cwd(), although making "sure this is not incompatible with the way AiiDA is currently used" requires some thought.
3. Regarding the verdi sync command, I looked a bit into this. Mirroring processes to disk is straightforward with the new verdi process dump command. We should probably do it only for finished, sealed processes, as verdi process dump currently doesn't have an option for incremental dumping (I'm actually not sure how the command behaves for running processes... I'd guess it just dumps the files that are there, and running it again to update would require --overwrite). For other entities, we should define a schema for the resulting directory structure (I remember the idea of allowing users to specify this schema, e.g., via a YAML file). That is: how are groups handled, which other entities might be of interest (such as StructureData -> dump those to disk in a structures directory?), and what further logic determines the output directory structure. I guess these things will become clearer as I work on this feature and sync some profile data that contains certain elements of organization, e.g. groups.

Thanks for the comments! Just a follow-up on my own comments: what I wrote were just some thoughts and ideas. I'm happy to discuss whether, for PSQL, it's really safe to put everything in a folder. Maybe it creates more problems if people start to move the folder while the DB is running, etc. To be discussed.
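Regarding the verdi sync idea discussed above (dump only finished processes, since verdi process dump has no incremental mode), an incremental step could be sketched as follows. Assumptions: the function name and both input files are hypothetical, and the list of "PK STATE" pairs would have to be produced somehow (e.g. by a query); only verdi process dump PK itself is an existing command. Each finished process is dumped once and remembered, so --overwrite is never needed.

```shell
#!/bin/sh
# Hypothetical incremental sync on top of `verdi process dump`.
#   $1: file with one "PK STATE" pair per line (assumed format)
#   $2: file listing PKs already dumped by a previous sync (created if missing)
sync_finished() {
    processes=$1
    done_list=$2
    touch "$done_list"
    while read -r pk state; do
        # Skip processes that may still change (running, waiting, ...).
        [ "$state" = finished ] || continue
        # Skip processes already dumped in an earlier sync.
        if grep -qxF -- "$pk" "$done_list"; then
            continue
        fi
        # </dev/null keeps verdi from consuming the loop's input.
        if verdi process dump "$pk" < /dev/null; then
            echo "$pk" >> "$done_list"
        fi
    done < "$processes"
}
```

Running it twice on the same input dumps nothing the second time, which is the "sync only what is new" behaviour; extending it to groups or other entities would need the directory schema mentioned above.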
Thanks @giovannipizzi! Just for context, here is the verdi init PR, which implemented a git-like folder discovery for the .aiida folder, as well as my rather extensive objections to this approach:
https://github.com/aiidateam/aiida-core/pull/6315
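For comparison, a minimal sketch of git-like discovery of the .aiida folder in which AIIDA_PATH, if set, takes precedence. find_aiida_dir is a hypothetical name, treating AIIDA_PATH as the directory that contains .aiida is a simplifying assumption, and this is not the code from the PR.

```shell
#!/bin/sh
# Locate the .aiida configuration folder, git-style.
find_aiida_dir() {
    # 1. An explicitly set AIIDA_PATH wins.
    if [ -n "${AIIDA_PATH:-}" ]; then
        echo "$AIIDA_PATH/.aiida"
        return 0
    fi
    # 2. Walk up from the current directory, like git looks for .git.
    dir=$(pwd -P)
    while :; do
        if [ -d "$dir/.aiida" ]; then
            echo "$dir/.aiida"
            return 0
        fi
        [ "$dir" = / ] && break
        dir=$(dirname "$dir")
    done
    # 3. Fall back to the usual default location.
    echo "$HOME/.aiida"
}
```

Because the walk-up passes through the home directory, a user working anywhere under $HOME without a closer .aiida folder still gets the usual ~/.aiida, which is one way such discovery could stay compatible with current usage.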
Just writing down my thoughts quickly; I'm on the train and only have 5 minutes. ^^
1. Why .aiida-folder discovery over profile-via-folder discovery? One way I envision this to work is to give the user the option (perhaps literally via an option, but perhaps also as a different storage backend) to create a "localized" or "contained" profile in a folder. This would write a specific file to the top level of that folder (e.g. .aiida_profile) that we could then use to implement git-like, directory-based profile discovery. I.e. the precedence would be:
a. Profile specified in the command via the -p option.
b. Folder-based discovery of the .aiida_profile file.
c. Default configured profile in the .aiida directory.
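The proposed precedence (a, b, c) could be sketched as the shell function below. It is illustrative only: resolve_profile and its output format are hypothetical, .aiida_profile is the example file name from the proposal, and step c is only reported as "default" rather than actually read from the .aiida configuration.

```shell
#!/bin/sh
# Resolve which profile to use, following the proposed precedence.
resolve_profile() {
    explicit=$1                  # value of -p, empty if not given
    # a. An explicitly requested profile wins.
    if [ -n "$explicit" ]; then
        echo "option:$explicit"
        return 0
    fi
    # b. Nearest .aiida_profile file, walking up from the current directory.
    dir=$(pwd -P)
    while :; do
        if [ -f "$dir/.aiida_profile" ]; then
            echo "folder:$dir"   # the contained profile lives in this folder
            return 0
        fi
        [ "$dir" = / ] && break
        dir=$(dirname "$dir")
    done
    # c. Fall back to the default profile configured in .aiida.
    echo "default"
}
```

For example, running it from any subfolder of a contained profile prints folder: followed by the path of the folder holding .aiida_profile, unless a profile is passed explicitly.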
Hi,
I have no objection to using a different file/folder name for this (instead of .aiida), but I'd need to rediscuss why this is a problem (or reread your objections, if it's explained there).
But for 1, I think it's just more intuitive for people who want to work in parallel with multiple profiles. I think that at the moment most people use only one profile, because switching is not trivial (it takes a lot of time to set up correctly, and you need to use a different terminal for each). With a folder-based approach, we mirror git: no need to open a new terminal, just change folder (even to a subfolder) and everything applies to the new repository. This is very simple and intuitive, and people are used to it from git and other tools, so it shouldn't be surprising behaviour. And there is no need for a complex setup.
Of course, this has implications for supporting work on various profiles, even if they were created with different AiiDA versions, without automated changes to the profile files, etc. But I think it's OK: we can probably even commit to not making any migration within major versions, as well as keeping profile files backward compatible within major versions (and in any case avoiding automatic migration of those, but warning users that they are using a profile of an old, or too new, AiiDA version, with suggestions on how to proceed).
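The version policy described above (no migrations within a major version, a warning instead of automatic migration otherwise) could boil down to a check like the following. Where a profile records the AiiDA version that created it, and the function name, are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical compatibility check: compare only the major versions and warn
# (never migrate automatically) when they differ.
check_profile_version() {
    profile_version=$1      # version recorded in the profile folder (assumed)
    installed_version=$2    # version of the AiiDA installation in use
    p_major=${profile_version%%.*}
    i_major=${installed_version%%.*}
    if [ "$p_major" -eq "$i_major" ]; then
        echo "ok"
    elif [ "$p_major" -lt "$i_major" ]; then
        echo "warning: profile created with an older AiiDA ($profile_version, installed: $installed_version)"
        return 1
    else
        echo "warning: profile created with a newer AiiDA ($profile_version, installed: $installed_version)"
        return 1
    fi
}
```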
Motivation
Even advanced users might not fully understand where data is stored (DB, repository, configuration file, ...). Moreover, this currently also depends on the virtual environment. If I want to know all the profiles on my computer, it's hard to keep track of them if I'm not very organised. A user might just want to know that everything about a profile is inside a folder (as for a git repo, where everything is inside a .git folder): if I move the folder, I'm moving everything; if I delete the folder, everything is gone; etc.
Desired Outcome
There is a way to define a profile that is fully confined in a folder (including all data). This should be easy at least for the SQLite DB (notes in a comment below for PSQL). It should then be easy to use that profile just by navigating into the folder. In the future, similar to a git checkout, one could have a way to mirror part of the data in the folder, at least in read-only mode, so that people can use the usual file browsers, grep commands, etc., to understand what the folder is about. Syncing to this folder does not have to be real-time, but can happen when a command is invoked, similar to git pull (e.g. verdi sync ...). This could, e.g., take the form of an extended version of the verdi process dump command that instead dumps all nodes inside a given group. And, when a verdi group sync --all command is run, it refreshes the dump files to ensure they are up to date with the AiiDA DB.
Impact
I think many users have a hard time understanding the concept of profiles, where data is stored, how to delete a profile and back it up (even if we provide commands), what to do when disk storage is running low, etc. A folder-based approach can help a lot for users starting with AiiDA, giving them the feeling of having full control of their data.
Complexity
Progress
Being discussed/brainstorming.