cucapra / pollen

generating hardware accelerators for pangenomic graph queries
MIT License

A little file manipulation to avoid clobbering gfa files #42

Closed anshumanmohan closed 1 year ago

anshumanmohan commented 1 year ago

This PR changes the workflow of slow odgi validate a little, as suggested in https://github.com/cucapra/pollen/pull/29#discussion_r1160304303, to avoid the clobbering of hard-won GFA files.

test-slow-validate: fetch
    -turnt --save --env validate_setup test/*.gfa
    for fn in `ls test/*.temp`; do mv $$fn $${fn%.*}_temp.gfa; done
    -turnt --save --env validate_oracle test/*_temp.gfa
    turnt -v --env validate_test test/*_temp.gfa
    rm test/*_temp.gfa

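Here, -turnt --save --env validate_setup test/*.gfa takes each graph.gfa, runs a script that drops some of its links, and saves the output as graph.temp, leaving graph.gfa untouched for future runs. The for loop then renames each graph.temp to graph_temp.gfa, the next two steps run the oracle and the test on those renamed files, and the final rm cleans them up.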
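
For anyone squinting at the $${fn%.*}_temp.gfa expansion in the loop: it strips the last extension and appends _temp.gfa. A rough Python equivalent, purely for illustration (the helper name temp_gfa_name is made up here and is not part of the repo):

```python
from pathlib import PurePosixPath


def temp_gfa_name(path):
    """Mirror the shell expansion ${fn%.*}_temp.gfa for a single path.

    Hypothetical helper: drops the final extension (e.g. ".temp") and
    appends "_temp.gfa", so the original graph.gfa is never overwritten.
    """
    p = PurePosixPath(path)
    stem_path = p.with_suffix("")  # "test/graph.temp" -> "test/graph"
    return str(stem_path.with_name(stem_path.name + "_temp.gfa"))


print(temp_gfa_name("test/graph.temp"))
```

The result ends in .gfa but never collides with the original graph.gfa, which is the whole point of the rename.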

sampsyo commented 1 year ago

Seems reasonable enough! It does make me wish that the temp files didn't have to be named *.gfa, which would simplify things a lot… but I presume that "real" odgi doesn't like it if they're called, like, note5.perturbed instead?

One other tiny convenience thing we might consider: hard-coding a fixed seed for the random perturb.py dropping. That would make things a bit more reproducible, i.e., running make test-slow-validate would produce the same answers every time.
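
To make the suggestion concrete, here is a minimal sketch of what seeded dropping could look like. This is not the actual perturb.py, whose contents aren't shown in this thread; the function name drop_links, the drop_prob parameter, and the seed value are all assumptions for illustration.

```python
import random


def drop_links(gfa_lines, drop_prob=0.5, seed=4242):
    """Return gfa_lines with some link (L) lines removed, reproducibly.

    Assumptions (not taken from perturb.py): each link line is dropped
    independently with probability drop_prob, and the seed is hard-coded.
    """
    rng = random.Random(seed)  # private generator; global random state is untouched
    kept = []
    for line in gfa_lines:
        # In GFA, link lines start with the record type "L" followed by a tab.
        if line.startswith("L\t") and rng.random() < drop_prob:
            continue
        kept.append(line)
    return kept
```

With the seed fixed, two runs over the same input drop exactly the same links, so the saved oracle output stays stable from one run to the next.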

anshumanmohan commented 1 year ago

Yup that's right, real odgi needs either a .gfa or an .og, and passing an otherwise-legitimate file with a changed extension makes it balk.

(Just for completeness: if passed a .gfa, odgi quietly runs the expensive odgi build command to get an .og, and then goes from there.)

anshumanmohan commented 1 year ago

Will fix the seed shortly!