GoekeLab / xpore

Identification of differential RNA modifications from nanopore direct RNA sequencing
https://xpore.readthedocs.io/
MIT License

difference of eventalign files #155

Closed rezarahman12 closed 2 months ago

rezarahman12 commented 2 years ago

Hi there, I would like to thank you for developing this beautiful program for analyzing nanopore dRNA-seq data. I started from raw data processing before running xPore, which I tested on the demo data provided.

After running nanopolish on the HEK293T-WT-rep1 data, eventalign.txt and summary.txt are 16.1 GB and 23.4 MB, which is much larger than the eventalign.index file in the HEK293T-WT-rep1.tar.gz archive that I downloaded from https://zenodo.org/record/5103099#.YyPJLLRBw2w

Step 1: I downloaded the fast5 and fastq files of HEK293T-WT-rep1 with the commands below:

module load axel/2.17.11
axel ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR470/ERR4706156/HEK293T-WT-rep1.tar.gz
axel ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR470/ERR4706161/HEK293T-WT-rep1.fastq.gz

When I extracted the downloaded "HEK293T-WT-rep1.tar.gz" archive with the command below, I got a folder "fast5" in which the files are stored in subdirectories named "0", "1", "10", "11",...

tar -zxvf HEK293T-WT-rep1.tar.gz
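A quick way to confirm that the extraction produced the expected layout is to count the fast5 files in each numbered subdirectory. The sketch below is only an illustration: the function name `count_fast5_per_subdir` is hypothetical, and it assumes the files carry a `.fast5` extension.

```shell
#!/bin/bash
set -euo pipefail

# Print "<subdirectory name><TAB><number of .fast5 files>" for every
# immediate subdirectory of the given fast5 folder.
# (Hypothetical helper, not part of xPore or nanopolish.)
count_fast5_per_subdir() {
  local fast5_dir=$1
  local sub
  for sub in "$fast5_dir"/*/; do
    # Skip the unexpanded glob when there are no subdirectories.
    [ -d "$sub" ] || continue
    printf '%s\t%d\n' "$(basename "$sub")" \
      "$(find "$sub" -type f -name '*.fast5' | wc -l)"
  done
}

# Usage (path is an example): count_fast5_per_subdir ./fast5
```

Subdirectories are listed in shell glob order, so "10" appears before "2", matching the folder names seen after extraction.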

Step 2: I aligned the reads to the human reference transcriptome using minimap2. I used a transcriptome reference file, not a genome reference file, which I downloaded from the Ensembl database. The file name of the transcriptome reference is Homo_sapiens.GRCh38.cdna.all.fa

Below is my code:

#!/bin/bash
#PBS -A UQ-QBI
#PBS -l select=1:ncpus=10:mem=40GB
#PBS -l walltime=10:00:00
#PBS -N minimap_wt-rep1

cd $PBS_O_WORKDIR

module load minimap2/2.24
module load samtools

minimap2 -ax map-ont -uf -t 3 --secondary=no \
  /scratch/project_mnt/S0077/xPore/rawdata/Homo_sapiens.GRCh38.cdna.all.fa \
  /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/fastq/*.fastq.gz \
  > /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/minimap_out/aligned.sam

samtools sort /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/minimap_out/aligned.sam -o /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/minimap_out/aligned.sort.bam
samtools index /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/minimap_out/aligned.sort.bam

Step 3: Resquiggle using nanopolish eventalign:

module load nanopolish/0.14.0

nanopolish index -d /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/fast5/ /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/fastq/HEK293T-WT-rep1.fastq.gz

nanopolish eventalign \
  --reads /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/fastq/HEK293T-WT-rep1.fastq.gz \
  --bam /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/minimap_out/aligned.sort.bam \
  --genome /scratch/project_mnt/S0077/xPore/rawdata/Homo_sapiens.GRCh38.cdna.all.fa \
  --signal-index --scale-events \
  --summary /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/nanopolish_out/summary.txt \
  --threads 32 \
  > /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/nanopolish_out/eventalign.txt

My query:

After running nanopolish on the HEK293T-WT-rep1 data, eventalign.txt and summary.txt are 16.1 GB and 23.4 MB, which is much larger than the eventalign.index file in the HEK293T-WT-rep1.tar.gz archive that I downloaded from https://zenodo.org/record/5103099#.YyPJLLRBw2w
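For a size comparison like this, it can help to look at exact byte and line counts instead of the rounded figures from a file browser. The sketch below is a minimal illustration; `report_files` and `count_summary_reads` are hypothetical helper names, and the row count assumes that summary.txt has exactly one header line followed by one row per read.

```shell
#!/bin/bash
set -euo pipefail

# Print "<path><TAB><bytes> bytes<TAB><lines> lines" for each file given.
# (Hypothetical helper for inspecting output files.)
report_files() {
  local f
  for f in "$@"; do
    printf '%s\t%d bytes\t%d lines\n' "$f" "$(wc -c < "$f")" "$(wc -l < "$f")"
  done
}

# Number of data rows in a summary file.
# Assumption: the first line is a header and every other line is one read.
count_summary_reads() {
  local summary=$1
  echo $(( $(wc -l < "$summary") - 1 ))
}

# Usage (paths are examples):
#   report_files nanopolish_out/eventalign.txt nanopolish_out/summary.txt
#   count_summary_reads nanopolish_out/summary.txt
```

If the read count from summary.txt is close to the number of aligned reads, a multi-gigabyte eventalign.txt is unsurprising, since that file holds many rows per read.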

I'm sorry for taking up your precious time. I would be very grateful if you could advise whether this is normal or whether I'm making a mistake while running nanopolish.

jonathangoeke commented 2 years ago

Hi @rezarahman12, I think the size is as expected: eventalign.txt is a large file. The file from Zenodo is an index file, which is a different and much smaller file. Thanks! Jonathan
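To illustrate why an index can be so much smaller than the table it points into, here is a toy sketch. It assumes an index that stores only one row per read with the byte offsets of that read's rows in the larger file. The file names, the columns, and the made-up transcript ID are all hypothetical, and the real eventalign.index layout may differ.

```shell
#!/bin/bash
set -euo pipefail

workdir=$(mktemp -d)
table="$workdir/toy_eventalign.txt"    # hypothetical stand-in for a large event table
index="$workdir/toy_eventalign.index"  # hypothetical stand-in for an index file

# Write a header plus 200 event rows for each of 50 made-up reads.
{
  printf 'contig\tposition\treference_kmer\tread_index\tevent_level_mean\n'
  for read in $(seq 0 49); do
    for pos in $(seq 1 200); do
      printf 'ENST_TOY\t%d\tAACCT\t%d\t101.25\n' "$pos" "$read"
    done
  done
} > "$table"

# Emit one index row per read: read_index, start byte, end byte (exclusive).
awk -F'\t' '
  NR == 1 { offset = length($0) + 1; next }   # skip the header, but count its bytes
  NR == 2 || $4 != current {                  # first data row, or a new read begins
    if (NR > 2) print current, start, offset  # close the previous read
    current = $4; start = offset
  }
  { offset += length($0) + 1 }                # advance past this row and its newline
  END { print current, start, offset }        # close the last read
' OFS=',' "$table" > "$index"

# Arithmetic expansion strips any padding that wc may add.
table_bytes=$(( $(wc -c < "$table") ))
index_bytes=$(( $(wc -c < "$index") ))
index_rows=$(( $(wc -l < "$index") ))
echo "table: $table_bytes bytes, index: $index_bytes bytes, index rows: $index_rows"
```

The table grows with the number of events, while the index grows only with the number of reads, so the gap between the two widens as reads get longer.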

rezarahman12 commented 2 years ago

Thank you for your kind reply.