SainsburyWellcomeCentre / aeon_experiments

Experiment workflows for Project Aeon
BSD 3-Clause "New" or "Revised" License

Tests/Checks for quality control #29

Closed jerlich closed 4 months ago

jerlich commented 3 years ago

At the July 28th meeting, we discussed QC. A brief summary (from memory) of the issues:

jkbhagatio commented 3 years ago

I think there are two types of QC: one that operates online on the raw data, and the second that can operate offline on the data that is already stored on ceph.

I moved this issue to this acquisition repo because the more I think about this, the more it makes sense to have the bulk of the QC be the former, run online on the acquisition machine.

I think both types of QC can be written in any language, but output a common format, which can be the DataFrame/table suggested - maybe makes sense to write everything out in CSVs that are saved to /ceph/aeon/aeon/preprocessed/qc/online and /ceph/aeon/aeon/preprocessed/qc/offline ?
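As a rough illustration of the "common format" idea, here is a minimal Python sketch of a CSV writer and reader that both online and offline QC could share. The column names, the `write_qc_csv`/`read_qc_csv` helpers, and the default filename are all hypothetical; the thread does not fix a schema.

```python
import csv
from pathlib import Path

# Hypothetical column layout; the thread does not define an actual schema.
QC_COLUMNS = ["session", "routine", "status", "comment", "bad_start", "bad_end"]


def write_qc_csv(rows, out_dir, filename="qc.csv"):
    """Write QC rows (dicts keyed by QC_COLUMNS) to out_dir/filename."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the qc folder if missing
    path = out_dir / filename
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=QC_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return path


def read_qc_csv(path):
    """Read a QC CSV back into a list of dicts (all values come back as strings)."""
    with Path(path).open(newline="") as f:
        return list(csv.DictReader(f))
```

An online job could then call something like `write_qc_csv(rows, "/ceph/aeon/aeon/preprocessed/qc/online")`, and an offline job the same with the `offline` folder, so that any language able to read CSV can consume either output.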

Both types could be run as system jobs. I think it makes sense to have all the QC code live in this repo?

ttngu207 commented 3 years ago

I agree that there are generally two types of QC, online and offline as Goncalo suggested. For the offline part, based on our July 28th meeting, I've prototyped a set of DJ tables to specify the flow of this QC process.

(Image: diagram of the prototyped DJ tables.)

Roughly, there will be multiple QC routines, each checking a different aspect of the data. Each QC routine will have a corresponding function, with the code living in either aeon_acquisition or aeon_mecha. (If it lives in aeon_acquisition, then the aeon_mecha repo must have the aeon_acquisition repo as a dependency, i.e. aeon_acquisition must be pip-installed for the DJ pipeline to run the QC part.)

But all QC functions should return i) a QC status code, ii) a comment/description (optional), and iii) a list of bad periods. (Here, I'm thinking some offline QC routines could load the online QC results from the CSV file, so that the online QC results are stored as well.)
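The three-part return value described above could be sketched in Python roughly as follows. Everything named here is an assumption for illustration: the `QCResult` type, the `QC_PASS`/`QC_FAIL` codes, and the example `check_timestamp_gaps` routine are not taken from aeon_acquisition or aeon_mecha.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical status codes; the thread does not define the actual values.
QC_PASS = 0
QC_FAIL = 1


@dataclass
class QCResult:
    status: int                                   # i) QC status code
    comment: Optional[str] = None                 # ii) optional description
    bad_periods: List[Tuple[float, float]] = field(default_factory=list)  # iii)


def check_timestamp_gaps(timestamps, max_gap):
    """Hypothetical routine: flag every interval between consecutive
    timestamps that is longer than max_gap as a bad period."""
    bad = [
        (t0, t1)
        for t0, t1 in zip(timestamps, timestamps[1:])
        if t1 - t0 > max_gap
    ]
    if bad:
        return QCResult(QC_FAIL, f"{len(bad)} gap(s) longer than {max_gap}", bad)
    return QCResult(QC_PASS)
```

Keeping every routine to this one return shape means a caller can store the results of any routine uniformly, whatever aspect of the data it checks.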

In the diagram above, the table SessionQC is a computed table that, for a given session, executes all QC routines, calling the QC functions and storing the status codes and bad periods.

Downstream, the table GoodSession is another computed table that aggregates the results from all QC routines for a particular session and determines whether it is a Good or Bad session. (Here, if we know ahead of time that a session is bad, the experimenter can just manually flag that session by inserting it into the BadSession table.)

Only sessions in GoodSession will proceed to any further analyses, and users can restrict to the GoodSession table for analyses of their own.
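The aggregation step could be sketched in plain Python (outside DataJoint) along these lines. The function names are hypothetical, and the rule that any non-pass status disqualifies a session is an assumption; the thread does not specify how the status codes are combined.

```python
def is_good_session(qc_statuses, manually_flagged_bad=False, pass_code=0):
    """Return True only if the session was not manually flagged as bad
    and every QC routine reported pass_code.

    qc_statuses maps routine name -> status code. Treating any other code
    as disqualifying is an assumption. Note that an empty mapping counts
    as good here, since no routine reported a failure.
    """
    if manually_flagged_bad:
        return False
    return all(code == pass_code for code in qc_statuses.values())


def good_sessions(statuses_by_session, bad_sessions=frozenset()):
    """Return the sessions that pass, in input order.

    bad_sessions plays the role of the manual BadSession flags.
    """
    return [
        session
        for session, statuses in statuses_by_session.items()
        if is_good_session(statuses, session in bad_sessions)
    ]
```

For example, `good_sessions({"s1": {"gaps": 0}, "s2": {"gaps": 1}})` keeps only `"s1"`, and passing `bad_sessions={"s1"}` would exclude it as well.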

Dario55 commented 3 years ago

QUALITY CHECK CHECKLIST

I have listed below some failure modes that we have experienced, or are very likely to experience, and that we should include in the QC list. Please add more. For some of them I have suggested possible solutions.

Session is bad, remove it:

Some periods in the session are bad, identify and remove(?)/flag them:

Additional QCs

jkbhagatio commented 3 years ago

@glopesdev I "assigned" you to some of these issues in Dario's checklist, let me know if this is ok?

glopesdev commented 2 years ago

I think most of these have now been broken up into separate issues and resolved for Experiment 0.2. They are an integral part of the online alert system and are logged with standardized codes by the system.

The offline ingestion checks should probably be moved back into the aeon_mecha repo. It's possible that these offline checks have also already been broken down and replicated there, in which case we should probably close this issue altogether.

@jkbhagatio, @Dario55, @ttngu207, any thoughts on this?

glopesdev commented 4 months ago

Closing as most of the scope of this issue has now been invalidated by either the alert system or offline QC routines. It will stay in the repo for reference.