Open taylorreiter opened 6 months ago
I think the solution in #24 is fine, but just as an aside/thought, I don't think it's necessary to limit the pipeline to only two directories. A third directory could be introduced for these intermediate dependencies like blast dbs, which are often shared between input directories. It could be called something like blast_databases/ (if that's all it's needed for) or else something generic like derived_input/ or intermediate_output/.
This would allow input/ to be strictly limited to user-provided or hard-coded input datasets and output/ to be limited to the final, input-dependent outputs, which, IMO, would be clearer.
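To make the three-directory idea concrete, here is a rough sketch of how the paths might be resolved. Everything in it is an assumption for illustration: `PipelineDirs`, `dirs_for_run`, and the per-run subfolder under output/ are hypothetical names and choices, not anything that exists in peptigate.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PipelineDirs:
    """Hypothetical three-directory layout; all names are illustrative."""

    input_dir: Path         # user-provided or hard-coded input datasets
    intermediate_dir: Path  # derived files shared across runs, e.g. blast dbs
    output_dir: Path        # final, input-dependent outputs for one run

    def blast_db(self, name: str) -> Path:
        # Shared location: independent of which run is asking for it.
        return self.intermediate_dir / "blast_databases" / name

    def result(self, name: str) -> Path:
        # Run-specific location: changes with the output directory.
        return self.output_dir / name


def dirs_for_run(base: Path, run_name: str) -> PipelineDirs:
    """Build a layout for one run (assumes one output subfolder per run)."""
    return PipelineDirs(
        input_dir=base / "input",
        intermediate_dir=base / "intermediate_output",
        output_dir=base / "output" / run_name,
    )
```

With this layout, two runs built from the same base resolve to the same blast db path but to different result paths, which is the separation being suggested.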
🤯 that is a great solution that I never thought of. Mind actually blown. I'll leave the change in #24 in place for now but leave this open, and I'll make a small PR in the future for these intermediate files. Thank you!
Even though this is technically an output, leaving these in the outputs dir means they need to be rebuilt for each run of peptigate. I usually use the same inputs dir between config files but unique outputs dirs, so moving to the input dir would allow the build to be shared between different runs of peptigate. This is trivial for peptigate, but would actually be a time saver for uniref50, which I'm suggesting we integrate because of #24.
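The time saving comes from building the shared file once and reusing it on later runs. A minimal sketch of that behavior, assuming a hypothetical `build_once` helper and a caller-supplied build function (neither is part of peptigate; a workflow manager would typically handle this check itself):

```python
from pathlib import Path
from typing import Callable


def build_once(target: Path, build: Callable[[Path], None]) -> bool:
    """Run build(target) only if target is missing.

    Returns True if the build ran, False if an existing file was reused.
    Hypothetical helper for illustration only.
    """
    if target.exists():
        return False  # a previous run already built it; reuse as-is
    target.parent.mkdir(parents=True, exist_ok=True)
    build(target)
    return True
```

If `target` lives in a directory shared between runs, only the first run pays the build cost. If it lives in a unique outputs dir per run, every run rebuilds it, which is the situation described above.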