PennLINC / DRIP

Data Release Integrity Pipeline
https://pennlinc.github.io
MIT License

Derivatives Data #3

Open mattcieslak opened 7 months ago

mattcieslak commented 7 months ago

There are two ways we'll want to interact with derivatives data:

  1. Is a file present with the name we expect? (We don't need its content.)
  2. Does the file contain data we need to extract (e.g., QC scores)?

I think for 1 we should be able to write, for each pipeline, a function that takes an "input unit" and produces a list of expected output files. We can then just check whether they are present. Also, if we want to
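A minimal sketch of the presence check in point 1, under loud assumptions: an "input unit" is modeled here as a dict of BIDS entities, and `expected_outputs_example` is a hypothetical per-pipeline rule whose file names are illustrative only, not the real outputs of any pipeline.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Assumption: an "input unit" is a dict of BIDS entities, e.g. {"sub": "01", "ses": "1"}.
InputUnit = Dict[str, str]


def expected_outputs_example(unit: InputUnit) -> List[str]:
    """Hypothetical per-pipeline rule; the file names below are illustrative only."""
    parts = [f"sub-{unit['sub']}"]
    if "ses" in unit:
        parts.append(f"ses-{unit['ses']}")
    stem = "_".join(parts)      # e.g. sub-01_ses-1
    folder = "/".join(parts)    # e.g. sub-01/ses-1
    return [
        f"{folder}/anat/{stem}_desc-preproc_T1w.nii.gz",
        f"{folder}/anat/{stem}_desc-brain_mask.nii.gz",
    ]


def missing_outputs(
    deriv_root: str,
    unit: InputUnit,
    rule: Callable[[InputUnit], List[str]],
) -> List[str]:
    """Return the expected relative paths that are absent under deriv_root.

    Only existence is checked; file contents are never read.
    """
    root = Path(deriv_root)
    return [rel for rel in rule(unit) if not (root / rel).exists()]
```

Each pipeline would supply its own rule function, and an empty list from `missing_outputs` would mean every expected file is present for that unit.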

For 2, these files may be produced per "input unit", or there may be one per subject or session. We'll probably need a pre-configured set of QC scores that we know to look for so that we can use them for thresholding.

If an "input unit" does not meet a QC threshold, we want to delete all of its expected output files from the release.
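The thresholding and deletion step could look roughly like the sketch below. Everything named here is an assumption for illustration: the metric names and limits in `QC_THRESHOLDS`, the choice to treat a missing score as a failure, and the `dry_run` default that reports files instead of deleting them.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# Hypothetical pre-configured thresholds: metric name -> (kind, limit).
# "max" means the score must not exceed the limit; "min" means it must not fall below it.
QC_THRESHOLDS: Dict[str, Tuple[str, float]] = {
    "mean_fd": ("max", 0.5),
    "tsnr": ("min", 30.0),
}


def passes_qc(
    scores: Dict[str, float],
    thresholds: Dict[str, Tuple[str, float]] = QC_THRESHOLDS,
) -> bool:
    """Return True only if every configured metric is present and within its limit."""
    for name, (kind, limit) in thresholds.items():
        if name not in scores:
            return False  # assumption: a missing score counts as a failure
        value = scores[name]
        if kind == "max" and value > limit:
            return False
        if kind == "min" and value < limit:
            return False
    return True


def prune_failed_unit(
    release_root: str,
    expected_files: Iterable[str],
    scores: Dict[str, float],
    dry_run: bool = True,
) -> List[Path]:
    """List (and, unless dry_run, delete) a failing unit's files in the release."""
    if passes_qc(scores):
        return []
    root = Path(release_root)
    targets = [root / rel for rel in expected_files if (root / rel).exists()]
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

Defaulting to a dry run is a deliberate choice in this sketch, since deleting from a release is hard to undo; the returned list can be reviewed before running again with `dry_run=False`.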

mattcieslak commented 7 months ago

BIDS filters on the inputs and derivatives will be absolutely key here. We probably don't need to make a hard-coded list of expected outputs, but we should keep track of modality-specific nuances.

mattcieslak commented 7 months ago

Will we need to know the settings used for the runs of the preps that produced the outputs?

kahinimehta commented 7 months ago

I think we probably will need to know what flags were used and what outputs they produce. Not sure how we'd get around that.

mattcieslak commented 7 months ago

We came up with an idea: we could use BIDS filters to discover the derivatives related to each input file. That way we won't have to hardcode the list of outputs.
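One way to picture the entity-based discovery, as a rough sketch only: parse the `key-value` entities out of the input file name and treat any derivative whose name carries the same entity values as related. The helper names `parse_entities` and `find_derivatives` are hypothetical, and this uses only the standard library rather than a BIDS-aware library, so it ignores the modality-specific nuances mentioned above.

```python
from pathlib import Path
from typing import Dict, List


def parse_entities(filename: str) -> Dict[str, str]:
    """Extract key-value entities from a BIDS-style file name.

    The trailing suffix (e.g. "bold") and the extension are ignored.
    """
    stem = Path(filename).name.split(".", 1)[0]
    entities: Dict[str, str] = {}
    for chunk in stem.split("_"):
        key, sep, value = chunk.partition("-")
        if sep and key and value:
            entities[key] = value
    return entities


def find_derivatives(input_file: str, deriv_root: str) -> List[Path]:
    """Return derivative files whose entities include all of the input's entities.

    Assumption: derivatives add entities (such as desc or space) but keep
    the input's entity values unchanged.
    """
    wanted = parse_entities(input_file)
    matches: List[Path] = []
    for path in sorted(Path(deriv_root).rglob("sub-*")):
        if not path.is_file():
            continue
        found = parse_entities(path.name)
        if all(found.get(key) == value for key, value in wanted.items()):
            matches.append(path)
    return matches
```

Because matching is driven by the input's own entities, nothing about a pipeline's output list is hardcoded; whether this is strict enough per modality is still an open question in the thread.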