Student_Name:

Mariam Oweda, Mohamed Nofal ID: 191057, Mohamed Emam ID: 1910038, Nouran Tantawy ID: 181045, Hadder Hassan ID: 191051

Comparison of different RNA-Seq data aligners and their effect on differential expression analysis

RNA sequencing (RNA-seq) can address a wide range of biological questions, and its use has generated huge amounts of gene expression data across fields such as biology and medicine. Differential expression analysis of RNA-seq data has become one of the most frequently used analyses for understanding cellular processes in biological and biomedical research and for discovering diagnostic markers for diseases. However, one of the crucial steps for detecting significant differential expression is the precise mapping of each read to its transcript. With advances in NGS technologies, different software packages have been developed to overcome mapping problems such as repeats and pseudogenes while offering competitive performance and accuracy. We will therefore test the concordance of RNA-seq analysis output between five mapping tools: three alignment-based tools, HISAT2, STAR, and the recently developed Magic-BLAST, which does not build an index of the genome but instead builds an index of a batch of reads and scans a BLAST database for potential matches; and two alignment-free tools, kallisto and Salmon. Each will be combined with DESeq2, the most common program for differential gene expression analysis in RNA-seq experiments. We will use a publicly available RNA-seq dataset of 64 paired-end Illumina hepatocellular carcinoma samples that correlates with survival. The samples were retrospectively derived from hepatocellular carcinoma tissue as well as non-tumor tissue from the livers of the same patients. We will investigate the differences in aligner performance by comparing the DESeq2 list of differentially expressed genes obtained with each aligner, and we will validate the accuracy of the results against published wet-lab literature. As transcriptomic analysis becomes an important tool in precision medicine, the choice of bioinformatics software is a critical step for clinical research.
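One possible way to quantify the concordance between the per-aligner DESeq2 gene lists is sketched below. It is only an illustration under stated assumptions: the Jaccard index is our choice of overlap measure and is not specified in the plan above, the function names (`jaccard`, `pairwise_concordance`, `consensus_genes`) are hypothetical, and the gene identifiers are placeholders rather than real results. The sketch assumes the significant genes for each aligner have already been extracted from the DESeq2 output into a set of gene IDs.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b|; defined here as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def pairwise_concordance(deg_sets):
    """Map each pair of aligner names (sorted order) to the Jaccard index of their DEG sets."""
    return {
        (x, y): jaccard(deg_sets[x], deg_sets[y])
        for x, y in combinations(sorted(deg_sets), 2)
    }


def consensus_genes(deg_sets):
    """Genes reported as differentially expressed by every aligner."""
    sets = list(deg_sets.values())
    return set.intersection(*sets) if sets else set()


# Placeholder data only: "G1".."G4" are made-up gene IDs, not results.
toy_degs = {
    "HISAT2": {"G1", "G2", "G3"},
    "STAR": {"G2", "G3", "G4"},
    "Salmon": {"G2", "G3"},
}

for pair, score in pairwise_concordance(toy_degs).items():
    print(pair, round(score, 3))
print("consensus:", sorted(consensus_genes(toy_degs)))
```

A set-based measure like this ignores fold-change direction and effect size; comparing log2 fold changes of shared genes would be a natural complement if a finer-grained comparison is needed.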