Closed rezarahman12 closed 2 months ago
Hi @rezarahman12, I think the size is as expected: eventalign.txt is a large file. The data from Zenodo is an index file, which is a different and much smaller file. Thanks! Jonathan
Thank you for your kind reply.
Hi there, I would like to thank you for developing this beautiful program for analyzing nanopore dRNA-seq data. I started from raw data processing before running xPore, which I had tested on the demo data provided.
After running nanopolish on the HEK293T-WT-rep1 data, eventalign.txt and summary.txt are 16.1 GB and 23.4 MB, respectively, which is much larger than the eventalign.index file in the HEK293T-WT-rep1.tar.gz folder that I downloaded from https://zenodo.org/record/5103099#.YyPJLLRBw2w
step 1: I downloaded the fast5 and fastq files of HEK293T-WT-rep1 with the commands below:
module load axel/2.17.11
axel ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR470/ERR4706156/HEK293T-WT-rep1.tar.gz
axel ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR470/ERR4706161/HEK293T-WT-rep1.fastq.gz
When I extracted the fast5 files from the downloaded "HEK293T-WT-rep1.tar.gz" file with the command below, I got a folder "fast5" whose files are stored in subdirectories named "0", "1", "10", "11",...
tar -zxvf HEK293T-WT-rep1.tar.gz
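A quick way to confirm that the extraction worked is to count the fast5 files across all of the numbered subdirectories. The helper below is only a sketch: the name count_fast5 and the example path are assumptions for illustration, not part of xPore or nanopolish.

```shell
#!/bin/bash
# Count .fast5 files anywhere below a directory, including the
# numbered subdirectories ("0", "1", "10", ...).
# count_fast5 is a hypothetical helper name.
count_fast5() {
    find "$1" -type f -name '*.fast5' | wc -l | tr -d ' '
}

# Example call (assumed path; adjust to where the archive was extracted):
# count_fast5 HEK293T-WT-rep1/fast5
```

If the count is zero, the archive was probably extracted somewhere else or the path passed to the helper is wrong.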
step 2: I aligned the reads to the human reference transcriptome using minimap2. I used a transcriptome reference file, not a genome reference file, which I downloaded from the Ensembl database. The file name of the transcriptome reference is Homo_sapiens.GRCh38.cdna.all.fa
Below is my code:
#!/bin/bash
#PBS -A UQ-QBI
#PBS -l select=1:ncpus=10:mem=40GB
#PBS -l walltime=10:00:00
#PBS -N minimap_wt-rep1
cd "$PBS_O_WORKDIR"
module load minimap2/2.24
module load samtools
# Align the reads to the transcriptome and write SAM output
minimap2 -ax map-ont -uf -t 3 --secondary=no /scratch/project_mnt/S0077/xPore/rawdata/Homo_sapiens.GRCh38.cdna.all.fa \
    /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/fastq/*.fastq.gz > /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/minimap_out/aligned.sam
# Sort the alignments and index the sorted BAM
samtools sort /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/minimap_out/aligned.sam -o /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/minimap_out/aligned.sort.bam
samtools index /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/minimap_out/aligned.sort.bam
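Before moving on to nanopolish, it can help to check how many records in aligned.sam are actually mapped. The sketch below uses only awk on the SAM text file; the function name sam_mapped_counts is an assumption for illustration. It counts alignment records rather than reads, so supplementary alignments, if any, are counted too. samtools flagstat gives a fuller report of the same kind.

```shell
#!/bin/bash
# Print "<mapped> <total>" record counts for a SAM file.
# Header lines start with "@" and are skipped. A record is unmapped
# when bit 0x4 of the FLAG column (field 2) is set.
# sam_mapped_counts is a hypothetical helper name.
sam_mapped_counts() {
    awk -F '\t' '
        /^@/ { next }
        { total++; if (int($2 / 4) % 2 == 0) mapped++ }
        END { printf "%d %d\n", mapped, total }
    ' "$1"
}

# Example call (path taken from the job script in this post):
# sam_mapped_counts /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/minimap_out/aligned.sam
```

A mapped count close to zero would suggest a problem with the reference or the minimap2 options rather than with nanopolish.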
step 3: Resquiggle using nanopolish eventalign:
module load nanopolish/0.14.0
nanopolish index -d /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/fast5/ /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/fastq/HEK293T-WT-rep1.fastq.gz
nanopolish eventalign --reads /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/fastq/HEK293T-WT-rep1.fastq.gz \
    --bam /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/minimap_out/aligned.sort.bam \
    --genome /scratch/project_mnt/S0077/xPore/rawdata/Homo_sapiens.GRCh38.cdna.all.fa \
    --signal-index --scale-events \
    --summary /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/nanopolish_out/summary.txt \
    --threads 32 > /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/nanopolish_out/eventalign.txt
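The eventalign output has one row per aligned event, so a file of several gigabytes is plausible for a full run. One rough sanity check is to count the event rows and the distinct reads they come from. The sketch below assumes the output is tab-separated and that its header row contains a column named read_index; the function name eventalign_counts is hypothetical. It looks the column up by name rather than by position, and prints "0 0" if the column is not found.

```shell
#!/bin/bash
# Print "<event rows> <distinct reads>" for an eventalign table.
# Assumption: tab-separated file whose first line is a header that
# includes a column called read_index.
# eventalign_counts is a hypothetical helper name.
eventalign_counts() {
    awk -F '\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "read_index") col = i; next }
        col     { rows++; if (!($col in seen)) { seen[$col] = 1; reads++ } }
        END     { printf "%d %d\n", rows, reads }
    ' "$1"
}

# Example call (path taken from the command in this post):
# eventalign_counts /scratch/project_mnt/S0077/xPore/rawdata/HEK293T-WT-rep1/nanopolish_out/eventalign.txt
```

On a 16 GB file this takes a while, since every row is read once.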
My query:
After running nanopolish on the HEK293T-WT-rep1 data, eventalign.txt and summary.txt are 16.1 GB and 23.4 MB, respectively, which is much larger than the eventalign.index file in the HEK293T-WT-rep1.tar.gz folder that I downloaded from https://zenodo.org/record/5103099#.YyPJLLRBw2w
I'm sorry for taking up your precious time. I would be very grateful if you could kindly advise whether this is normal or whether I'm making a mistake when running nanopolish.