Closed pbilling closed 2 years ago
Troubleshooting plan
From the query, I know there is a FastqToUbam job node for these Ubams and that it succeeded, so I can expect that there is a Ubam object in cloud storage and I just need to register it in the database.
Because the (:FastqToUbam) job has an "output_UBAM" property with the cloud storage path to the object, I'm going to try the following steps:
1. & 2. Get cloud storage path of missing Ubams
// Cypher query
MATCH (job:Job:FastqToUbam)-[:STATUS]->(d:Dstat)
WHERE NOT (job)-[:GENERATED]->(:Ubam)
AND (d.status = "SUCCESS" OR d.status = "RUNNING")
RETURN job.output_UBAM
Update Ubam object metadata (n=5)
Instead of trying to add all 20,000 Ubams to the database at once, I am going to try with a small subset of n=5 and see what happens.
Bash script to update object metadata:
#!/bin/bash
# Update object metadata in Google Cloud Storage.
# Input is a text file with one Google Storage path per line (e.g. "gs://bucket/path").
while IFS= read -r line; do
    gsutil setmeta -h 'x-goog-meta-trellis-ver:1.2.3' "$line"
done < "$1"
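Since the script passes every line straight to gsutil, it may be worth filtering the input first. Below is a minimal sketch of that idea, assuming the only requirement is that each line looks like "gs://bucket/object". The helper name filter_gs_paths is hypothetical and not part of Trellis or gsutil.

```shell
#!/bin/bash
# Sketch only: filter_gs_paths is a hypothetical helper for illustration.
# Prints well-formed gs:// paths to stdout and reports the rest on stderr.
filter_gs_paths() {
    local input="$1" line cr
    cr=$(printf '\r')
    # The "|| [ -n ... ]" clause keeps a final line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # Drop a trailing carriage return in case the file has CRLF endings.
        line="${line%"$cr"}"
        # Skip blank lines silently.
        [ -z "$line" ] && continue
        case "$line" in
            gs://?*/?*) printf '%s\n' "$line" ;;
            *) printf 'skipping malformed path: %s\n' "$line" >&2 ;;
        esac
    done < "$input"
}
```

The filtered output could be written to a new file and that file passed to the update script instead of the raw list.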
Console commands:
head -n5 missing-ubams.csv > missing-ubams-n5.csv
./update-object-metadata.sh missing-ubams-n5.csv
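If the n=5 run looks good, the same approach could be extended to the rest of the list in fixed-size batches rather than all at once. This is only a sketch of one way to do that: the helper name make_batches, the batch size of 500, and the "batch-" prefix are all assumptions chosen for illustration.

```shell
#!/bin/bash
# Sketch only: make_batches is a hypothetical helper for illustration.
# Splits a list of paths into files of at most <size> lines each,
# named <prefix>aa, <prefix>ab, ... (the default suffixes used by split).
make_batches() {
    local input="$1" size="$2" prefix="$3"
    split -l "$size" "$input" "$prefix"
}

# Possible usage (batch size and prefix are arbitrary assumptions):
#   make_batches missing-ubams.csv 500 batch-
#   for f in batch-*; do
#       ./update-object-metadata.sh "$f"
#   done
```

Running one batch at a time leaves room to check the db-query logs between batches before continuing.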
After running the script, I noticed that the MERGE queries used to add the nodes to the database are timing out (limit = 90 seconds). Right now nodes are merged on a composite index :Blob(bucket, path), which I think is known to be less performant than a single-property index. If I run the same query but use the :Ubam(uri) index, it runs in 1.3 seconds.
I'm going to deploy a hotfix that updates the query to use the :Ubam(uri) index instead.
From the Logs Explorer for the db-query function, I can see that the MERGE queries seem to have run correctly. I can also verify this by querying for the Ubams in the database.
I'm also going to drop the index on :Blob(path, bucket) and replace it with an index on :Blob(uri).
// Cypher query
DROP INDEX ON :Blob(path, bucket)
// Cypher query
CREATE INDEX ON :Blob(uri)
The problem
We can see three different cases here. For "FAILURE", we don't expect an output. For "SUCCESS", we do expect an output. For "RUNNING", we may or may not expect an output: those jobs are not actually running, but their end result was not recorded in the database, so we don't know whether they succeeded or failed.
Right now, I am going to focus on finding and adding Ubams to the database for successful jobs since those are the majority of cases.
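The three status cases described above can be written down as a small lookup. This is only an illustrative sketch: the helper name expected_output and its return strings are hypothetical, and the status values are the ones that appear in this issue.

```shell
#!/bin/bash
# Sketch only: expected_output is a hypothetical helper for illustration.
# Maps a job status to whether a Ubam object should exist in cloud storage.
expected_output() {
    case "$1" in
        SUCCESS) echo "expected" ;;      # job finished, output should exist
        RUNNING) echo "unknown" ;;       # final state was not recorded
        FAILURE) echo "not expected" ;;  # job failed, no output
        *) echo "unrecognized status: $1" >&2; return 1 ;;
    esac
}
```

Only the "expected" case is handled for now. The "unknown" case would need a check against cloud storage to see whether the object actually exists.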