seaborn>=0.11
git clone https://github.com/ayshwaryas/ddqc.git
pip install .
ddqc takes a Pegasus MultimodalData object as input. Ideally, call ddqc.ddqc_metrics immediately after reading the data, at the point where regular QC is done in scRNA-seq pipelines. ddqc cannot work with a normalized matrix, so it must be run before the normalization step.
ddqc can work with all formats supported by Pegasus. These include h5ad, h5, mtx, csv, and loom.
Here is sample code for reading different data formats:
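Because ddqc expects unnormalized data, it can be useful to sanity-check a matrix before passing it on. The following is a minimal sketch of one heuristic: raw counts are non-negative whole numbers, whereas normalized or log-transformed values usually are not. The helper name `looks_like_raw_counts` is hypothetical and is not part of ddqc or Pegasus; whether ddqc performs a similar check internally is not stated here.

```python
import numpy as np


def looks_like_raw_counts(matrix, atol=1e-8):
    """Heuristic check (hypothetical helper, not part of ddqc).

    Returns True when every value is a non-negative whole number.
    Expects a dense array; for a SciPy sparse matrix, pass its
    ``.data`` attribute (the stored non-zero values) instead.
    """
    values = np.asarray(matrix, dtype=float)
    if values.size == 0:
        return False
    non_negative = bool((values >= 0).all())
    whole_numbers = bool(
        np.allclose(values, np.round(values), rtol=0.0, atol=atol)
    )
    return non_negative and whole_numbers


# Toy data: two cells, three genes
raw = np.array([[0, 3, 1], [2, 0, 7]])
# A typical log-normalization, used here only to produce non-integer values
normalized = np.log1p(raw / raw.sum(axis=1, keepdims=True) * 1e4)
```

This is only a heuristic: a matrix that passes may still have been modified, so it complements the ordering advice above rather than replacing it.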
import pegasusio as io
import ddqc
# read h5ad
data1 = io.read_input("path/file.h5ad", genome = 'hg19')
# read csv
data2 = io.read_input("path/file.csv", genome = 'hg19')
# read mtx
data3 = io.read_input("path/file.mtx", genome = 'hg19')
# call ddqc
ddqc.ddqc_metrics(data1)
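When paths come from user input, a quick extension check before reading can give a clearer error message. The sketch below uses only the formats named above; the helper `is_listed_format` and the constant `LISTED_SUFFIXES` are hypothetical, and pegasusio itself may accept inputs beyond this list.

```python
from pathlib import Path

# Suffixes taken from the formats listed in this README.
# Assumption: this set is illustrative; pegasusio may support more.
LISTED_SUFFIXES = {".h5ad", ".h5", ".mtx", ".csv", ".loom"}


def is_listed_format(path):
    """Return True if the file extension is one of the listed formats."""
    return Path(path).suffix.lower() in LISTED_SUFFIXES
```

For example, `is_listed_format("path/file.h5ad")` is true, while `is_listed_format("path/file.txt")` is false.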
Pegasus can also aggregate multiple files into one object. To do so, first create a CSV file describing your data, with one row per sample:
Sample,Location
sample1,path/file1.mtx
sample2,path/file2.mtx
sample3,path/file3.mtx
Then use the following Python code:
import pegasusio as io
import ddqc
data = io.aggregate_matrices("pegasusio_test_cases/case6/count_matrix.csv")
# call ddqc
ddqc.ddqc_metrics(data)
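The sample sheet can also be written programmatically rather than by hand. This is a minimal sketch using Python's standard csv module; the helper name `write_sample_sheet` and the output file name `count_matrix.csv` are illustrative assumptions, and the column names follow the example sheet above.

```python
import csv


def write_sample_sheet(rows, out_path):
    """Write (sample name, file location) pairs as a Sample,Location CSV."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Sample", "Location"])  # header used in the example sheet
        writer.writerows(rows)


samples = [
    ("sample1", "path/file1.mtx"),
    ("sample2", "path/file2.mtx"),
    ("sample3", "path/file3.mtx"),
]
write_sample_sheet(samples, "count_matrix.csv")
```

The resulting file path can then be passed to io.aggregate_matrices in the same way as in the code above.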
Please refer to the PegasusIO tutorial for a complete guide on reading files in Pegasus.
There are four plots provided for exploratory data analysis:
If you request that df_qc be returned, the function returns a pandas DataFrame containing the following information for each cell:
The Pegasus object will have the following data added to its obs field: