Arcadia-Science / peptigate

Peptigate ("peptide" + "investigate") predicts bioactive peptides from transcriptome assemblies or sets of proteins.
MIT License

Consider moving built BLAST databases from outputs to inputs dir #25

Open taylorreiter opened 6 months ago

taylorreiter commented 6 months ago

Even though this is technically an output, leaving these in the outputs dir means they need to be rebuilt for each run of peptigate. I usually use the same inputs dir between config files but unique outputs dirs, so moving them to the inputs dir would allow the build to be shared between different runs of peptigate. This is trivial for peptigate, but would actually be a time saver for uniref50, which I'm suggesting we integrate because of #24.

keithchev commented 5 months ago

I think the solution in #24 is fine but just as an aside/thought, I don't think it's necessary to limit the pipeline to only two directories. A third directory could be introduced for these intermediate dependencies like blast dbs that are often shared between input directories. It could be called something like blast_databases/ (if that's all it's needed for) or else something generic like derived_input/ or intermediate_output/.

This would allow input/ to be strictly limited to user-provided or hard-coded input datasets and output/ to be limited to the final, input-dependent outputs, which, IMO, would be clearer.
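
A minimal sketch of the third-directory idea, not peptigate's actual code: the function name `ensure_blast_db`, the `.done` sentinel file, and the directory names in the usage note are all hypothetical. It assumes BLAST+'s `makeblastdb` is on the PATH and builds the database into a shared directory only if an earlier run has not already done so.

```python
import subprocess
from pathlib import Path


def ensure_blast_db(fasta, shared_dir, dbtype="prot", run=subprocess.run):
    """Build a BLAST database under shared_dir unless a previous run already did.

    Returns the database path prefix, which is what BLAST tools take via -db.
    """
    shared_dir = Path(shared_dir)
    shared_dir.mkdir(parents=True, exist_ok=True)
    prefix = shared_dir / Path(fasta).stem

    # Hypothetical sentinel file: avoids depending on which index files
    # makeblastdb writes for a given database type or BLAST+ version.
    sentinel = Path(f"{prefix}.done")
    if not sentinel.exists():
        run(
            ["makeblastdb", "-in", str(fasta), "-dbtype", dbtype, "-out", str(prefix)],
            check=True,
        )
        sentinel.touch()
    return prefix
```

Two runs with different output directories could then both call `ensure_blast_db("inputs/uniref50.fasta", "blast_databases")`; only the first call would invoke `makeblastdb`, and both would get the same database prefix back.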

taylorreiter commented 5 months ago

🤯 that is a great solution that I never thought of. Mind actually blown. I'll leave the change in #24 in place for now but leave this open, and I'll make a small PR in the future for these intermediate files. Thank you!