Closed jaamarks closed 8 months ago
Should create a unit test for this.
For now, let's table the creation of the unit test for this.
In the data module cgr_gwas_qc/testing/data.py, the docstring explains that the TestData class holds a collection of very small test datasets sourced from the internet. These datasets are designed for testing specific functionality such as file type conversion and upstream workflow components. Note that these datasets are synthetic, so they will not work with many of the filtering steps and are incompatible with various parts of the workflow.
Thus the qc_exclusion.py module is not run against the synthetic data. A meaningful test suite would therefore require identifying an entirely different, appropriate test dataset and manually modifying some subjects to introduce non-accepted Case/Control statuses. A comprehensive test suite would then need to be written, and the data submodule updated to incorporate the new test data.
Though not intractable, this seems impractical at the moment for unit-testing this particular edge-case functionality.
To validate the new functionality, we conducted the following testing:
Test Run Details: Executed the workflow on a test dataset that runs to completion, which allowed us to test the new functionality.
Manual Modifications: After the workflow completed, manually altered the generated files sample_level/sample_qc.csv and subject_level/subject_qc.csv.
Focus Area: Specifically adjusted entries in the case_control column to verify that they are represented accurately in the final report tables (in delivery/QC_Report.docx).
Illustration of Data Changes: Here's a snapshot of the original and modified data:
This approach was taken to thoroughly test the functionality under various conditions, namely samples labeled something other than [Case, Control, QC, Unknown].
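For anyone repeating the manual check, the edit to the case_control column could be scripted along these lines. This is only a rough sketch using the standard library: the helper name relabel_rows, the Sample_ID column, and the sample values are hypothetical and are not taken from the real sample_qc.csv or subject_qc.csv files.

```python
import csv
import io


def relabel_rows(csv_text, id_column, new_labels):
    """Return CSV text with case_control overwritten for the listed IDs.

    Hypothetical helper for illustration; id_column and the IDs are assumptions.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    for row in rows:
        if row[id_column] in new_labels:
            row["case_control"] = new_labels[row[id_column]]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up stand-in for a QC table; the real files have many more columns.
original = "Sample_ID,case_control\nS1,Case\nS2,Control\nS3,QC\n"

# Introduce labels outside the accepted set, including an empty value.
modified = relabel_rows(original, "Sample_ID", {"S2": "Deceased", "S3": ""})
```

In practice the same function could be pointed at the text of the generated CSV files before re-rendering the report.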
Convert all samples' case/control status to "Unknown" when their label is not in [Case, Control, QC, Unknown].
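The rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in qc_exclusion.py: the name normalize_case_control is hypothetical, and treating the comparison as case-sensitive (so "case" becomes "Unknown") is an assumption.

```python
# Accepted labels, as listed in the description above.
ACCEPTED = ("Case", "Control", "QC", "Unknown")


def normalize_case_control(label):
    """Map any label outside the accepted set to "Unknown".

    Hypothetical helper; matching is exact and case-sensitive (an assumption).
    """
    return label if label in ACCEPTED else "Unknown"


labels = ["Case", "Control", "QC", "Unknown", "Deceased", "", "case"]
normalized = [normalize_case_control(label) for label in labels]
```

Accepted labels pass through unchanged, while "Deceased", the empty string, and the lowercase "case" all come back as "Unknown".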