Hello all,
There seem to be quite a few race conditions when running lifton in parallel. A project I'm working on requires running lifton from several dozen source annotations to several hundred references, so I use snakemake to parallelise the runs across a cluster. However, at least the following race conditions appear:
If the output files are something like output/$SOURCE/$TARGET_NAME.gff, there's a race condition as lifton writes to output/$SOURCE/lifton_output regardless of which genome is being annotated, which corrupts the intermediate files.
It seems that at certain stages the gffutils sqlite database is written to even when it already exists (e.g. with ANALYSE). This causes race conditions and crashes, as normally only one process can write to a sqlite db at a time.
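To illustrate the sqlite point, here is a minimal sketch using only Python's standard sqlite3 module. It shows that a database opened read-only can be queried but rejects any write. The table name and the open_readonly helper are made up for the example, and whether lifton or gffutils could open their database this way is an assumption I have not checked.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_readonly(db_path):
    """Open an existing SQLite file so that any write attempt fails fast."""
    # A "file:" URI with mode=ro is standard SQLite/sqlite3 behaviour.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def demo():
    # Build a throwaway database standing in for a pre-computed gff_db
    # (hypothetical schema, not the real gffutils one).
    tmpdir = tempfile.mkdtemp()
    db_path = os.path.join(tmpdir, "example_db.sqlite")
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE features (id TEXT PRIMARY KEY)")
    con.execute("INSERT INTO features VALUES ('gene1')")
    con.commit()
    con.close()

    # Reads work on the read-only connection; writes raise an error.
    ro = open_readonly(db_path)
    rows = ro.execute("SELECT id FROM features").fetchall()
    try:
        ro.execute("INSERT INTO features VALUES ('gene2')")
        write_ok = True
    except sqlite3.OperationalError:
        write_ok = False
    ro.close()
    return rows, write_ok
```

If every parallel job only ever needed read access like this, the jobs could share one pre-built database without contending for the write lock.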
With liftoff, one could work around these same issues because liftoff accepted a temp/intermediate directory name (so you could use e.g. output/$SOURCE/$TARGET_NAME/ instead of output/$SOURCE/lifton_output, making each job's directory unique). Liftoff also did not modify the gff database if it already existed, so if you pre-computed all needed gff_dbs before running any liftoff, then you were guaranteed not to have race conditions on the sqlite db.
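As a rough sketch of the two behaviours described above (a unique intermediate directory per job, and never touching a database that already exists), something like the following would do. The names job_paths and ensure_db are hypothetical, and build is a placeholder for whatever actually creates the gff_db; nothing here is lifton's or liftoff's real API.

```python
from pathlib import Path


def job_paths(out_root, source, target_name):
    """Return a per-job working directory and the final GFF path."""
    # One directory per (source, target) pair, so jobs never share
    # intermediate files.
    work_dir = Path(out_root) / source / target_name
    final_gff = Path(out_root) / source / f"{target_name}.gff"
    return work_dir, final_gff


def ensure_db(db_path, build):
    """Call build(db_path) only if the database is missing.

    Returns True if the database was built, False if it already existed.
    Intended to run once, before any parallel jobs start; the existence
    check itself is not safe against concurrent callers.
    """
    db_path = Path(db_path)
    if db_path.exists():
        return False
    db_path.parent.mkdir(parents=True, exist_ok=True)
    build(db_path)
    return True
```

In a snakemake workflow, ensure_db would belong in a separate rule that every annotation job depends on, so all databases exist before the parallel part begins.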
I'd encourage you to adopt these workarounds in lifton.
best, Kevin