Provides a Clojure library for use by the Wormbase project.
Features include:
Model-driven import of ACeDB data into a Datomic database.
Conversion of ACeDB database dump files into a Datomic database.
Routines for parsing and dumping ACeDB "dump files".
Utility functions and macros for querying WormBase data.
A command line interface for the utilities described above (via the clj -A:datomic-pro -m pseudoace.cli command).
xml2 utility.
Ubuntu (20.04) install: sudo apt-get install xml2
Branching & code review
master branch.
master.
This project attempts to adhere to the Clojure coding-style conventions.
Run all tests regularly, but in particular:
#Test using datomic pro and default clojure version
make run-all-tests
pseudoace is both a command line application and a library, which are consumed by other WormBase applications.
The library is deployed as a thin jar to Clojars (a public repository for packaged Clojure libraries), while the standalone command line application is deployed as an uber jar through GitHub package releases.
When creating and deploying a new release, you'll need to
If problems pop up during the standalone-application deployment, they can still be corrected, whereas this is much harder in Clojars.
Before being able to start any pseudoace deployment, you'll need to do some (manual) release preparations.
master branch if not already done so.
CHANGELOG.md file
pom.xml file:
deps.edn file has changed, update the pom.xml dependencies:
clj -Spom
# Update pom.xml to
# * pseudoace release to be created in the <version> tag value
# * have "wormbase" (unquoted) as <groupId> tag value
# * have "pseudoace" (unquoted) as <artifactId> tag value
$EDITOR pom.xml
<version> tag to match the version-nr to be created.
-a).
The following command will create a release archive based on the latest git tag (created above).
make uberjar
An archive named pseudoace-${GIT_RELEASE_TAG}.tar.xz will be created in the ./release-archives directory.
List the content of the created archive
tar -tf release-archives/pseudoace-${GIT_RELEASE_TAG}.tar.xz
The archive should contain two artefacts:
pseudoace-${GIT_RELEASE_TAG}/
pseudoace-${GIT_RELEASE_TAG}/pseudoace-${GIT_RELEASE_TAG}.jar
pseudoace-${GIT_RELEASE_TAG}/sort-edn-log.sh
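The listing above can also be compared mechanically. The following is a minimal sketch, assuming the archive layout is exactly the three entries shown; the function name expected_archive_entries is hypothetical and not part of the repository.

```shell
# Hypothetical helper (not part of pseudoace): print the entries the
# release archive is expected to contain for a given release tag.
expected_archive_entries() {
  local tag="$1"
  printf '%s\n' \
    "pseudoace-${tag}/" \
    "pseudoace-${tag}/pseudoace-${tag}.jar" \
    "pseudoace-${tag}/sort-edn-log.sh"
}

# Usage (assumes the archive was built with `make uberjar`):
#   diff <(expected_archive_entries "$GIT_RELEASE_TAG" | sort) \
#        <(tar -tf "release-archives/pseudoace-${GIT_RELEASE_TAG}.tar.xz" | sort)
```

An empty diff means the archive contains exactly the expected entries; sorting both sides keeps the comparison independent of the order in which tar lists them.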
IMPORTANT NOTE!
As we use a proprietary Datomic license in some code, we need to ensure we comply with the license.
Datomic Free can be freely distributed, but datomic-pro cannot. Uber jars containing datomic-pro
assets can never be distributed to a public server for download, as this would violate
the terms of any proprietary Cognitect Datomic license.
As the tar file created above will be deployed publicly, ensure this tar file, and specifically the uber-jar file contained therein, does not contain any datomic-pro assets!
tar -xOf ./release-archives/pseudoace-$GIT_RELEASE_TAG.tar.xz pseudoace-$GIT_RELEASE_TAG/pseudoace-$GIT_RELEASE_TAG.jar | jar -tv | grep -P "datomic-(free|pro)"
This command should only return datomic-free artifacts and no datomic-pro ones!
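Because the check above relies on reading grep output by eye, it may be worth turning it into a hard failure. The sketch below is one possible way to do that; the function name assert_no_datomic_pro is hypothetical, and it assumes the jar listing is piped in on stdin.

```shell
# Hypothetical helper (not part of pseudoace): reads a jar listing on
# stdin and fails if any entry mentions datomic-pro.
assert_no_datomic_pro() {
  if grep -q 'datomic-pro'; then
    echo "ERROR: datomic-pro assets found; do not publish this archive" >&2
    return 1
  fi
  echo "OK: no datomic-pro assets found"
}

# Usage, reusing the extraction pipeline shown above:
#   tar -xOf ./release-archives/pseudoace-$GIT_RELEASE_TAG.tar.xz \
#       pseudoace-$GIT_RELEASE_TAG/pseudoace-$GIT_RELEASE_TAG.jar \
#     | jar -tv | assert_no_datomic_pro
```

A non-zero exit status lets a release script stop before anything is uploaded.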
Once confirmed, create a new release on GitHub
master
binaries) so the migration pipeline can use it.
In order to deploy to Clojars, the ~/.m2/settings.xml needs to define Clojars deploy credentials which allow write access to the Clojars wormbase group.
For instructions on how to define this file, see credentials setup.
Run:
make deploy-clojars
Deployment to Clojars requires the file ~/.m2/settings.xml to be defined as described here (settings.xml part).
A Clojars username can be obtained by registering at clojars.org, and a deploy-token can be generated after that by visiting this page.
Ensure you have been added to the wormbase group to allow uploading a new version (ask a colleague).
You can execute ./scripts/dev-setup.sh to generate a credentials file as described above and provide your user name and deploy token.
For any development usage of the code, ensure your working directory is set to the repository root.
A command line utility has been developed for ease of usage.
--url is a required option for most sub-commands; it should be of the form: datomic:<storage-backend-alias>://<hostname>:<port>/<db-name>
URL_OF_TRANSACTOR="datomic:dev://localhost:4334/*"
alias run-pace="clj -A:datomic-pro -m pseudoace.cli"
run-pace --url "${URL_OF_TRANSACTOR}" <command>
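To make the URL form concrete, the following sketch assembles a URL from its four parts. The function name datomic_url and the database name wormbase are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: build a Datomic URL of the form
# datomic:<storage-backend-alias>://<hostname>:<port>/<db-name>
datomic_url() {
  local backend="$1" host="$2" port="$3" db="$4"
  printf 'datomic:%s://%s:%s/%s\n' "$backend" "$host" "$port" "$db"
}

# Example: the dev storage backend on localhost.
# "wormbase" is a placeholder database name.
URL_OF_TRANSACTOR="$(datomic_url dev localhost 4334 wormbase)"
echo "$URL_OF_TRANSACTOR"
# prints: datomic:dev://localhost:4334/wormbase
```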
Alternatively, for extra speed and flexibility, one can call the Clojure routines directly in a REPL session:
# start the REPL (Read Eval Print Loop)
clj -A:datomic-pro
Example of invoking a sub-command:
(require '[environ.core :refer [env]])
(list-databases {:url (env :url-of-transactor)})
Run the pseudoace jar with the same arguments as you would when using clj:
java -cp pseudoace-$GIT_RELEASE_TAG.jar clojure.main -m pseudoace.cli -v
Create the database and parse .ace dump-files into EDN.
Example:
java -cp pseudoace-$GIT_RELEASE_TAG.jar clojure.main \
-m pseudoace.cli \
--url $DATOMIC_URL \
--acedump-dir ACEDUMP_DIR \
--log-dir LOG_DIR \
-v prepare-import
The prepare-import sub-command:
--url
.ace dump-files located in --acedump-dir into pseudo EDN files located in --log-dir.
--model.
--schema-filename.
The format of the generated files' content is:
<ace-db-style_timestamp> <Transactable EDN forms>
The EDN data must be sorted by timestamp in order to preserve the initial design decision to use Datomic's internal transaction timestamp to model curation event times.
To sort the EDN log files:
find $LOG_DIR \
-type f \
-name "*.edn.gz" \
-exec ./sort-edn-log.sh {} +
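After sorting, a quick sanity check can confirm the order before a long import is started. The sketch below makes two assumptions that are not confirmed by this README: that each record occupies one line, and that the timestamp is the first whitespace-delimited field and sorts correctly as a plain byte string. The function name is_sorted_by_timestamp and the sample file name are hypothetical.

```shell
# Hypothetical helper: succeed only if the lines on stdin are in
# non-decreasing order of their first whitespace-delimited field.
# -c checks order without sorting; -s disables the whole-line
# tie-break so records sharing a timestamp are accepted in any order.
is_sorted_by_timestamp() {
  LC_ALL=C sort -c -s -k1,1 2>/dev/null
}

# Usage ("some-log.edn.gz" is a placeholder file name):
#   gzip -dc "$LOG_DIR/some-log.edn.gz" | is_sorted_by_timestamp \
#     && echo "sorted" || echo "NOT sorted"
```

If sort-edn-log.sh uses a different key or locale, the check should be adjusted to match it.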
Transact the EDN sorted by timestamp in --log-dir to the database specified with --url:
java -cp pseudoace-$GIT_RELEASE_TAG.jar clojure.main \
-m pseudoace.cli \
--url URL \
--log-dir LOG_DIR \
-v import-logs
Using a full dump of a recent ACeDB release of WormBase, you can expect the full import process to take in the region of 48 hours, dependent on the platform you run it on.