
MTD: a unique pipeline for host and meta-transcriptome joint and integrative analyses of RNA-seq data

MTD: Meta-Transcriptome Detector

MTD is a software pipeline with two sub-pipelines that jointly analyze the host transcriptome and its microbiome from bulk RNA-seq and single-cell RNA-seq data, respectively. It supports a comprehensive range of microbiome categories, including viruses, bacteria, protozoa, fungi, plasmids, and vectors. MTD runs in Bash on GNU/Linux systems. Users can install and run MTD with a single command line, without root privileges. The outputs (graphs, tables, count matrices, etc.) are generated automatically and stored in the output directory specified by the user.

Key Points

Single-cell RNA-seq

  1. Place the host-gene count matrix in a folder named after the sample. For 10x data, this folder should contain matrix.mtx, genes.tsv, and barcodes.tsv, or a single .h5 file. For Dropseq data, it should contain a .dge.txt file.
  2. Enter the path of the host matrix folder and the paths of the corresponding FASTQ files in the respective columns of samplesheet_SC.csv. MTD then reads these file paths from samplesheet_SC.csv for the single-cell analysis.
  3. In a terminal, run:
    bash [path/to/MTD]/MTD_singleCell.sh -i [path/to/samplesheet_SC.csv] -o [path/to/Output_folder] -h [Host species taxonomy ID] -t [Threads] -p [Platform] -d [prime Direction] -c [path/to/Cell_barcode_file.whitelist.txt]
    Single-cell RNA-seq platform (-p): enter 1 for 10x v2 chemistry, 3 for 10x v3 chemistry, or 2 for Dropseq.
    Prime direction (-d): the location of the barcodes; enter 3 if they are at the 3' end of the read, or 5 if they are at the 5' end.
    For example:
    bash ~/MTD/MTD_singleCell.sh -i ~/scRNAseq_rawData/samplesheet_SC.csv -o ~/output -h 10090 -t 20 -p 1 -d 3

    Notes

    • 10x and Dropseq use paired-end sequencing. The first FASTQ file contains the barcodes (e.g., 26 or 28 bp reads in SRR4210_R1.fastq). The second FASTQ file contains the transcript sequences (e.g., 98 bp reads in SRR4210_R2.fastq).
    • The default QC filter is subset = nFeature_RNA > 200 & nFeature_RNA < 2*median(number_of_Feature_RNA) & percent.mt < 10. Users can customize these thresholds by adding -l [Minimum nFeature_RNA], -r [Maximum nFeature_RNA], and -m [percent.mt].
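
The folder layout in step 1 and the -p/-d values in step 3 can be checked before starting a long run. The sketch below is a minimal Bash pre-flight check; detect_matrix_layout and check_platform_args are hypothetical helpers written for this illustration, not part of MTD, and the pairing of -p values with folder layouts is inferred from the rules above rather than taken from MTD's source.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight checks for MTD_singleCell.sh (not part of MTD).

# Print the layout of a host count-matrix folder, or fail if none matches:
#   10x-mtx : matrix.mtx + genes.tsv + barcodes.tsv
#   10x-h5  : exactly one .h5 file
#   dropseq : exactly one .dge.txt file
detect_matrix_layout() {
    local dir=$1 n
    if [[ ! -d $dir ]]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    if [[ -f $dir/matrix.mtx && -f $dir/genes.tsv && -f $dir/barcodes.tsv ]]; then
        echo "10x-mtx"
        return 0
    fi
    n=$(find "$dir" -maxdepth 1 -type f -name '*.h5' | wc -l)
    if (( n == 1 )); then
        echo "10x-h5"
        return 0
    fi
    n=$(find "$dir" -maxdepth 1 -type f -name '*.dge.txt' | wc -l)
    if (( n == 1 )); then
        echo "dropseq"
        return 0
    fi
    echo "no recognized count matrix in $dir" >&2
    return 1
}

# Validate -p (1 = 10x v2, 2 = Dropseq, 3 = 10x v3) against the folder layout,
# and -d (3 or 5 = end of the read that carries the barcodes).
check_platform_args() {
    local layout=$1 platform=$2 direction=$3
    case $direction in
        3|5) ;;
        *) echo "option -d must be 3 or 5, got '$direction'" >&2; return 1 ;;
    esac
    case $platform in
        1|3) [[ $layout == 10x-* ]] || { echo "option -p $platform expects a 10x matrix, found $layout" >&2; return 1; } ;;
        2)   [[ $layout == dropseq ]] || { echo "option -p 2 expects a Dropseq .dge.txt, found $layout" >&2; return 1; } ;;
        *)   echo "option -p must be 1, 2 or 3, got '$platform'" >&2; return 1 ;;
    esac
}
```

For example, `layout=$(detect_matrix_layout ~/scRNAseq_rawData/sample1) && check_platform_args "$layout" 1 3` could gate the MTD_singleCell.sh call from step 3; sample1 is a placeholder folder name.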
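
The default QC rule above can be previewed outside MTD to see how many cells a given set of thresholds would keep. This is a minimal sketch assuming a hypothetical headerless tab-separated table with one line per cell (barcode, nFeature_RNA, percent.mt); MTD applies the filter internally, and the assumption that -r replaces the 2*median cap exactly as shown here is not confirmed by the source.

```bash
#!/usr/bin/env bash
# Hypothetical preview of the default single-cell QC rule:
#   keep a cell if nFeature_RNA > 200, nFeature_RNA < 2*median(nFeature_RNA), percent.mt < 10
# Assumed input: tab-separated lines "barcode<TAB>nFeature_RNA<TAB>percent.mt", no header.

# Print the median of column 2 (nFeature_RNA).
median_nfeature() {
    cut -f2 "$1" | sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            if (NR % 2) print v[(NR + 1) / 2]
            else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

# Print barcodes that pass QC. Optional arguments 2-4 play the roles of
# -l (minimum nFeature_RNA), -r (maximum nFeature_RNA) and -m (maximum percent.mt).
qc_pass_barcodes() {
    local file=$1 min=${2:-200} max=${3:-} mt=${4:-10}
    if [[ -z $max ]]; then
        # Default upper bound: twice the median nFeature_RNA
        max=$(awk -v m="$(median_nfeature "$file")" 'BEGIN { print 2 * m }')
    fi
    awk -F'\t' -v min="$min" -v max="$max" -v mt="$mt" \
        '$2 + 0 > min + 0 && $2 + 0 < max + 0 && $3 + 0 < mt + 0 { print $1 }' "$file"
}
```

Passing all three optional arguments, e.g. `qc_pass_barcodes cells.tsv 300 5000 15`, mirrors running MTD with `-l 300 -r 5000 -m 15`; cells.tsv is a placeholder name.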
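
To confirm which FASTQ file holds the barcodes and which holds the transcript sequence, as described in the first note, it can help to tabulate read lengths. The function below is a hypothetical helper, assuming standard four-line FASTQ records (optionally gzip-compressed); it counts lengths over the first MAX_READS reads only. For 10x data, an R1 length of 26 bp typically indicates v2 chemistry and 28 bp indicates v3 (a 16 bp cell barcode plus a 10 or 12 bp UMI), which may help when choosing -p.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MTD): count read lengths in a FASTQ file.
# Usage: read_lengths FILE [MAX_READS]
# Output: "length<TAB>count" lines, sorted by length. Files ending in .gz are
# decompressed with gzip -cd. Assumes 4-line records (sequence on line 2 of each).
read_lengths() {
    local file=$1 max=${2:-10000}
    local reader=(cat)
    [[ $file == *.gz ]] && reader=(gzip -cd)
    "${reader[@]}" "$file" | awk -v max="$max" '
        NR % 4 == 2 {
            n[length($0)]++
            if (++seen >= max + 0) exit   # stop early; END still runs
        }
        END { for (len in n) print len "\t" n[len] }' | sort -n
}
```

For example, `read_lengths SRR4210_R1.fastq` would be expected to report short barcode reads, while `read_lengths SRR4210_R2.fastq` would report the longer transcript reads.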

Outputs

Bulk RNA-seq

The results are generated automatically and saved in the output folder specified by the user. The outputs include:

Notes

Overview of MTD

(A) The workflow for bulk mRNA-seq analysis. (B) The workflow for single-cell mRNA-seq analysis. White boxes represent reads in FASTQ format and count matrices. Blue boxes show the bioinformatics software used. Green boxes show additional data-processing tools. White boxes with curved edges show the reference genomes and databases. In the single-cell workflow (B), the left side illustrates the host-read processing protocol, and the right side, shaded in yellow, shows the automated MTD pipeline that calculates the count matrix for microbiome reads and tests the correlation between microbiome and host genes.

Citation

Fei Wu, Yao-Zhong Liu, Binhua Ling. (2022). MTD: a unique pipeline for host and meta-transcriptome joint and integrative analyses of RNA-seq data. Briefings in Bioinformatics, https://doi.org/10.1093/bib/bbac111

Licence

This software is freely available to academic users. Commercial use is not permitted. Please refer to the LICENCE page for details.