ayshwaryas / ddqc

BSD 3-Clause "New" or "Revised" License

ddqc - Biology-centered data-driven quality control for single cell/nucleus RNA sequencing

Required packages

What data formats does ddqc support?

ddqc works with all formats supported by Pegasus, including h5ad, h5, mtx, csv, and loom.
Here is sample code for reading different data formats:

import pegasusio as io
import ddqc

# read h5ad
data1 = io.read_input("path/file.h5ad", genome='hg19')
# read csv
data2 = io.read_input("path/file.csv", genome='hg19')
# read mtx
data3 = io.read_input("path/file.mtx", genome='hg19')
# call ddqc on one of the loaded objects
ddqc.ddqc_metrics(data1)
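
Before reading many files, it can help to check that each path ends in one of the formats listed above. The following is a minimal sketch using only the standard library. The helper `has_listed_extension` and the `LISTED_EXTENSIONS` set are hypothetical names introduced for illustration, not part of ddqc or PegasusIO, and PegasusIO may accept formats beyond the five named here.

```python
from pathlib import Path

# Extensions named in the list above (assumption: not exhaustive;
# PegasusIO may accept other formats as well).
LISTED_EXTENSIONS = {".h5ad", ".h5", ".mtx", ".csv", ".loom"}


def has_listed_extension(path):
    """Return True if the file extension is one of the listed formats."""
    return Path(path).suffix.lower() in LISTED_EXTENSIONS


# Hypothetical input paths; only the first two have a listed extension.
paths = ["path/file.h5ad", "path/file.csv", "path/notes.txt"]
readable = [p for p in paths if has_listed_extension(p)]
print(readable)
```

Paths that pass the check can then be handed to `io.read_input` as in the sample code above.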

Pegasus can also aggregate multiple files into one object. To do so, first create a CSV file that lists each sample and the location of its file:

Sample,Location
sample1,path/file1.mtx
sample2,path/file2.mtx
sample3,path/file3.mtx

Then use the following Python code:

import pegasusio as io
import ddqc

# pass the path to the CSV file that lists your samples
data = io.aggregate_matrices("pegasusio_test_cases/case6/count_matrix.csv")
# call ddqc on the aggregated object
ddqc.ddqc_metrics(data)

Please refer to the PegasusIO tutorial for a complete guide to reading files in Pegasus.
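
If you have many samples, the Sample,Location file described above can be generated rather than written by hand. The following is a minimal sketch using only the standard library. The helper `write_sample_sheet` is a hypothetical name introduced for illustration, and the sample names and paths are the placeholders from the example above.

```python
import csv
import os
import tempfile


def write_sample_sheet(samples, out_path):
    """Write (sample_name, file_path) pairs as a Sample,Location CSV."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Sample", "Location"])  # header row
        writer.writerows(samples)                # one row per sample


# Placeholder samples taken from the example sheet above.
samples = [
    ("sample1", "path/file1.mtx"),
    ("sample2", "path/file2.mtx"),
    ("sample3", "path/file3.mtx"),
]

# Written to a temporary directory here; any writable location works.
sheet_path = os.path.join(tempfile.mkdtemp(), "count_matrix.csv")
write_sample_sheet(samples, sheet_path)
```

The resulting `sheet_path` can then be passed to `io.aggregate_matrices` in place of the path shown in the code above.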

What are the outputs of ddqc?

There are four plots provided for exploratory data analysis:

If you request that df_qc be returned, the function returns a pandas DataFrame containing the following information for each cell:

The Pegasus object will have the following data added to its obs field: