BCCDC-PHL / pipeline-provenance-schema

Incorporate databases into schema #3

Open dfornika opened 7 months ago

dfornika commented 7 months ago

We often use databases (blast, kraken, mash, mob-suite, etc.) as part of our analysis, but we don't have a way to describe them in the current schema.

Proposed schema:

databases:
  - database_name: <database_name>
    database_version: <database_version>
    files:
      - filename: <filename>
        sha256: <sha256_checksum>
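
As a rough illustration of how an entry like this might be filled in, here is a minimal Python sketch that hashes every file in a database directory and builds one item for the `databases` list. The function names `sha256_of_file` and `describe_database`, and the choice to list only the top-level files of the directory, are assumptions for the example and not part of the schema or any existing tooling.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large database files are not loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_database(name, version, db_dir):
    """Build one entry for the proposed `databases` list (hypothetical helper)."""
    files = [
        {"filename": p.name, "sha256": sha256_of_file(p)}
        for p in sorted(Path(db_dir).iterdir())
        if p.is_file()  # assumption: only top-level files are recorded
    ]
    return {
        "database_name": name,
        "database_version": version,
        "files": files,
    }
```

Calling `describe_database("<database_name>", "<database_version>", "/path/to/db")` would return a dictionary with the same keys as the proposed schema, which could then be written out with whatever YAML library the pipeline already uses.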

jpalmer37 commented 7 months ago

A thought to consider: do we want each database to be specifically associated with a tool instead of being associated with a process? (Can both be true? A database exists within a process and is linked to a tool?) Not pushing for this as a change; would just like to hear your thoughts on this setup. @dfornika

dfornika commented 7 months ago

I think just associating a database with a process is a bit more flexible and general. In theory a database could be used by multiple tools within a process, or it could be used by a script but not a tool.
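
To make the process-level association concrete, here is a small Python sketch of what such a record might look like, with `databases` sitting next to the tools instead of underneath any one of them. Only the `databases` block follows the schema proposed in this issue; the `process_name`, `tools` and `tool_name` keys, the example names, and the `database_names` helper are assumptions made for illustration.

```python
# Hypothetical process record. Only the `databases` block follows the schema
# proposed in this issue; every other key and value is an assumption.
process_record = {
    "process_name": "classify_reads",
    "tools": [
        {"tool_name": "kraken2"},
        {"tool_name": "bracken"},
    ],
    # The database is attached to the process, so both tools (or a plain
    # script with no tool entry at all) can share it.
    "databases": [
        {
            "database_name": "<database_name>",
            "database_version": "<database_version>",
            "files": [
                {"filename": "<filename>", "sha256": "<sha256_checksum>"},
            ],
        }
    ],
}


def database_names(record):
    """List the databases attached to a process, independent of its tools."""
    return [db["database_name"] for db in record.get("databases", [])]
```

With this layout, a process that uses no database simply omits the `databases` key, and nothing about the tool entries has to change.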

jpalmer37 commented 7 months ago

Sounds good! That does seem more flexible.