geneontology / pipeline

Declarative pipeline for the Gene Ontology.
https://build.geneontology.org/job/geneontology/job/pipeline/
BSD 3-Clause "New" or "Revised" License

Produce package of "pathway-like" GO-CAM TTL files #376

Closed by dustine32 6 months ago

dustine32 commented 6 months ago

For each GO release, run SPARQL queries against blazegraph-production.jnl using blazegraph-runner to find "pathway-like" GO-CAM models and save their TTL files as requested by @thomaspd.

The "pathway-like" criteria differ a bit from the usual GO-CAM website "get all causal models" query:

  1. All models satisfying the standard "2+ consecutive causal edges connecting 3+ MFs"
  2. All models with 2+ MFs connected with a shared small molecule instance via has_output and has_input edges.
  3. Remove models where the number of root MF individuals is greater than the number of non-root MF individuals.
  4. Remove models having any BP or CC individual connected to multiple MF individuals.
  5. Remove models having any causal edge where subject or object is not an MF.
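
One way to read the five criteria above is as set arithmetic over model IDs: the union of the two inclusion queries (1 and 2), minus the union of the three exclusion queries (3 to 5). The sketch below assumes each SPARQL query has already written one model ID per line to a text file. The file names and sample IDs are hypothetical and are not taken from the sparql-for-pathway-go-cams repo, which may combine the results differently.

```shell
set -eu
export LC_ALL=C  # consistent collation so sort and comm agree on ordering

# Hypothetical working directory and file names, for illustration only.
work="$(mktemp -d)"

# Stand-ins for per-criterion query results (one model ID per line).
printf '%s\n' modelA modelB modelC > "$work/include_causal_chain.txt"     # criterion 1
printf '%s\n' modelC modelD        > "$work/include_shared_molecule.txt"  # criterion 2
printf '%s\n' modelB               > "$work/exclude_root_mf_majority.txt" # criterion 3
printf '%s\n' modelD               > "$work/exclude_shared_bp_cc.txt"     # criterion 4
: > "$work/exclude_non_mf_causal.txt"                                      # criterion 5 (no hits here)

# Union of the inclusion sets, deduplicated and sorted (comm needs sorted input).
sort -u "$work"/include_*.txt > "$work/included.txt"

# Union of the exclusion sets.
sort -u "$work"/exclude_*.txt > "$work/excluded.txt"

# Keep only the IDs that appear in included.txt and not in excluded.txt.
comm -23 "$work/included.txt" "$work/excluded.txt" > "$work/pathway_like_models.txt"

cat "$work/pathway_like_models.txt"
```

With the sample data, modelB and modelD are dropped by the exclusion lists, leaving modelA and modelC as the "pathway-like" set.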

I have a start on these queries here (can transfer repo to GO org or individual queries to appropriate GO repo).

kltm commented 6 months ago

From @dustine32, prototype at: https://go-public.s3.amazonaws.com/pathway-like_go-cams_2024-05-07.tar.gz

dustine32 commented 6 months ago

@kltm I have a draft repo with a Makefile (and README!) ready to go at https://github.com/geneontology/sparql-for-pathway-go-cams. Let me know if you want me to change this at all to fit in the GO pipeline better.

kltm commented 6 months ago

@dustine32 Cheers! If there is no deus ex machina that saves the qc side of things, I'll attempt to switch those (after testing locally).

kltm commented 6 months ago

Context:

sjcarbon@moiraine:~/local/src/git/sparql-for-pathway-go-cams[main]$:) ls -AlFrt
total 39247844
drwxr-xr-x 2 sjcarbon sjcarbon        4096 May  9 17:56 sparql/
-rw-r--r-- 1 sjcarbon sjcarbon         845 May  9 17:56 README.md
-rw-r--r-- 1 sjcarbon sjcarbon           7 May  9 17:56 .gitignore
drwxr-xr-x 2 sjcarbon sjcarbon        4096 May 10 16:53 scripts/
-rw-r--r-- 1 sjcarbon sjcarbon        2627 May 10 16:53 Makefile
drwxr-xr-x 8 sjcarbon sjcarbon        4096 May 10 16:53 .git/
-rw-r--r-- 1 sjcarbon sjcarbon 40189755392 May 10 17:10 blazegraph-production.jnl

Running:

sjcarbon@moiraine:~/local/src/git/sparql-for-pathway-go-cams[main]$:) NOCTUA_MODELS_PATH=/home/sjcarbon/local/src/git/noctua-models make target/pathway-like_go-cams.tar.gz

Final product:

target/pathway-like_go-cams.tar.gz

kltm commented 6 months ago

@dustine32 Noting test inclusion in second stage snapshot: http://skyhook.berkeleybop.org/snapshot/products/ttl/pathway-like_go-cams.tar.gz This look right to you?

kltm commented 6 months ago

Noting: ~17m of runtime.

dustine32 commented 6 months ago

> This look right to you?

@kltm Yup (mostly)! One minor thing: it untars retaining the target/pathway_like_go_cams/ directory structure instead of just pathway_like_go_cams/. I’ll see if this happens on my local end, but I’m fine with it if you are!

kltm commented 6 months ago

@dustine32 Hm, I don't think I'm doing anything different than the make above (https://github.com/geneontology/pipeline/issues/376#issuecomment-2105407963). Does this happen when you're running locally?

dustine32 commented 6 months ago

@kltm Dang, it is on my end (in the Makefile)! I'll fix this quick.

dustine32 commented 6 months ago

Commit https://github.com/geneontology/sparql-for-pathway-go-cams/commit/83f3e3cc7583295adca20c54e6a739c54379602c should fix the target/ issue. Sorry, I guess I never tested extracting the tar.gz product.
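
For anyone hitting the same thing: a leading target/ in a tarball usually comes from archiving by path from the top-level directory, and tar's -C option (change directory before adding members) is a common way around it. The sketch below only illustrates that general behavior with a made-up placeholder file. It is not the content of the commit above, and the Makefile may solve it differently.

```shell
set -eu

# Scratch directory that mimics the layout discussed above.
work="$(mktemp -d)"
mkdir -p "$work/target/pathway_like_go_cams"
# Placeholder model file; the name is made up for this example.
: > "$work/target/pathway_like_go_cams/example_model.ttl"

cd "$work"

# Archiving by path from the top level keeps the target/ prefix in member names.
tar -czf with_prefix.tar.gz target/pathway_like_go_cams

# -C target switches into target/ first, so members start at pathway_like_go_cams/.
tar -czf without_prefix.tar.gz -C target pathway_like_go_cams

# List the members of each archive to compare the stored paths.
tar -tzf with_prefix.tar.gz
tar -tzf without_prefix.tar.gz
```

Listing the archive with tar -tzf before publishing is a cheap check that the stored paths are the ones you expect.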

kltm commented 6 months ago

No worries--I'll give it another run now.

kltm commented 6 months ago

@dustine32 Run through. Howzit look now?

dustine32 commented 6 months ago

Cool! It untars as intended now:

$ tar -zxvf pathway-like_go-cams.tar.gz
x pathway_like_go_cams/
x pathway_like_go_cams/641ce4dc00000214.ttl
x pathway_like_go_cams/65bc474400000788.ttl
x pathway_like_go_cams/63f809ec00000347.ttl
x ...

Thanks @kltm!

kltm commented 6 months ago

@pgaudet This will go out with the snapshots starting next week. If you can think of a better place than /products/ttl/pathway-like_go-cams.tar.gz, let me know and we can move it.