PennLINC / DRIP

Data Release Integrity Pipeline
https://pennlinc.github.io
MIT License

Input Data #2

Open mattcieslak opened 7 months ago

mattcieslak commented 7 months ago

Part 1: Assumptions

  1. We should assume that all the data in BIDS has gone through CuBIDS. This has a couple of important implications for how we think about the input data:
     a. The input BIDS data is the definitive source of the dataset. If a file is included in BIDS, we intend to use it - even if it is a VARIANT.
     b. VARIANTS that should not be used for analysis should not be present in the BIDS data.
  2. Suppose we want to know if a session has a task-rest scan:
     a. We would say yes even if the task-rest scan is a VARIANT (see the sketch after this list).
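
To make the VARIANT assumption concrete, here is a minimal sketch of checking whether a session has a task-rest BOLD scan. The function name and glob pattern are illustrative, not existing DRIP code; the point is that VARIANT acquisitions match the same pattern and therefore count.

```python
from pathlib import Path


def session_has_task_rest(bids_root: str, subject: str, session: str) -> bool:
    """Return True if the session contains any task-rest BOLD scan.

    Per the assumption above, VARIANT acquisitions still count: if the
    file is in BIDS, we intend to use it.
    """
    func_dir = Path(bids_root) / f"sub-{subject}" / f"ses-{session}" / "func"
    # Matches both standard and VARIANT acquisitions, e.g.
    #   sub-01_ses-1_task-rest_bold.nii.gz
    #   sub-01_ses-1_task-rest_acq-VARIANTObliquity_bold.nii.gz
    return any(func_dir.glob("*_task-rest_*bold.nii.gz"))
```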

Part 2: Do we want to wade into the territory of deciding what "completeness" means?

  1. I think we don't want to require a schema file here, where we manually define what tasks must be present in each session for a subject/session to count as complete. Or do we?
  2. If we define completeness manually for the input data, the completeness value can propagate down into the derivatives easily: if an input session is "complete", we only need to check that all of the expected derivative files are there for all of the input files (a hypothetical schema is sketched after this list).
  3. When processing data we split it up by either subject or session; we will need to take this into account somehow.
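
If we do go the manual route, the schema could be as small as a per-session mapping. The sketch below is only an illustration of what an optional "requirements" file might look like - the field names and values are made up here, not a proposed format.

```python
# Hypothetical, optional "requirements" schema for bookkeeping input
# completeness. Names and layout are illustrative only.
SESSION_REQUIREMENTS = {
    "ses-1": {
        "bold_tasks": ["rest", "nback"],  # tasks that must be present
        "min_dwi_runs": 1,
        "min_asl_runs": 0,
    },
}


def input_session_is_complete(session, present_tasks, n_dwi_runs, n_asl_runs,
                              requirements=SESSION_REQUIREMENTS):
    """Bookkeeping-only check: does the input session meet the schema?

    If it does, checking the derivatives reduces to confirming that every
    expected output exists for every input file.
    """
    req = requirements[session]
    missing_tasks = set(req["bold_tasks"]) - set(present_tasks)
    return (not missing_tasks
            and n_dwi_runs >= req["min_dwi_runs"]
            and n_asl_runs >= req["min_asl_runs"])
```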

Part 3: How do we represent "a unit of input" in code? We have _bold, _dwi and _asl inputs. For dwi specifically, the runs will almost certainly (although this is not guaranteed) be concatenated. There are also numerous auxiliary files that can accompany the input images (_sbref, _part-, etc.) for which we don't expect outputs to be created.

Ideally, the "unit of input" is all we'll need to figure out what the expected outputs are in the derivatives.
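
One possible representation - purely a sketch, not a design decision - is a small container that separates the primary image(s) from the auxiliary files, so the expected derivatives can be predicted from the primaries alone. All names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InputUnit:
    """One hypothetical "unit of input".

    The primary image(s) are the files we expect derivatives for; auxiliary
    files (_sbref, _part-, fieldmaps, ...) travel with the unit but do not
    generate outputs of their own. For dwi, several runs that the pipeline
    will concatenate can share a single unit.
    """
    subject: str
    session: str
    datatype: str               # "bold", "dwi", or "asl"
    primary_files: List[str]    # files that should produce derivatives
    auxiliary_files: List[str] = field(default_factory=list)

    def expected_output_stem(self) -> str:
        # Placeholder: the real mapping to expected outputs would depend on
        # the pipeline (fMRIPrep, QSIPrep, ASLPrep, ...).
        return f"sub-{self.subject}_ses-{self.session}_{self.datatype}"
```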

mattcieslak commented 7 months ago

the "requirements" file for completeness is really only useful for bookkeeping. This could be optional, and doesn't really impact the evaluation of whether processing was incomplete.

mattcieslak commented 7 months ago

For part 3, we can use BIDS filters to find the files that will produce outputs. Then we can use these files to predict the outputs.
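
For example (a sketch assuming pybids; the specific filter values are my assumption, not a settled list):

```python
from bids import BIDSLayout  # pybids

# Enumerate the files that should produce outputs, using BIDS filters.
layout = BIDSLayout("/path/to/bids")

output_producing_filters = [
    {"suffix": "bold", "extension": [".nii", ".nii.gz"]},
    {"suffix": "dwi", "extension": [".nii", ".nii.gz"]},
    {"suffix": "asl", "extension": [".nii", ".nii.gz"]},
]

input_files = []
for filters in output_producing_filters:
    input_files.extend(layout.get(return_type="filename", **filters))

# input_files now holds the primary images; auxiliary files such as _sbref
# are excluded by the suffix filters and are not expected to have their
# own derivatives.
```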

kahinimehta commented 7 months ago

Also thinking about accounting for runs in completeness - how would we treat run-1 versus run-2 cases?
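
One possibility (just a sketch, assuming pybids BIDSFile objects) is to include the run entity in the key that defines a unit of input, so run-1 and run-2 are tracked as separate units unless the pipeline concatenates them (as with dwi runs):

```python
from collections import defaultdict


def group_by_run(bids_files):
    """Group pybids BIDSFile objects into hypothetical units of input.

    The run entity is part of the key, so run-1 and run-2 land in
    different units; files without a run entity get run=None.
    """
    units = defaultdict(list)
    for f in bids_files:
        ents = f.get_entities()
        key = (
            ents.get("subject"),
            ents.get("session"),
            ents.get("task"),
            ents.get("run"),
        )
        units[key].append(f.path)
    return units
```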