Nextomics / NextPolish

Fast and accurate polishing of genomes generated from long reads.
GNU General Public License v3.0

repeat regions #111

Status: Open. cfz1998 opened this issue 1 year ago.

cfz1998 commented 1 year ago

There are two repeat regions in my contig, and NextPolish does not polish them. The IGV screenshot shows nanopore reads (nano_reads) mapped to the polished genome: [image]

Pilon does not polish them either. What causes this? Is it because the Illumina reads do not align uniquely with bwa in these regions?

cfz1998 commented 1 year ago

The commands I ran:

#Set input and parameters
round=2
threads=20
read1=../00.raw_data/SRR12578435_R1.fastq
read2=../00.raw_data/SRR12578435_R2.fastq
input=../01.nextDenovo/01_rundir/03.ctg_graph/nd.asm.fasta
for ((i=1; i<=${round};i++)); do
#step 1:
   #index the genome file and do alignment
   bwa index ${input};
   bwa mem -t ${threads} ${input} ${read1} ${read2}|samtools view --threads 3 -F 0x4 -b -|samtools fixmate -m --threads 3  - -|samtools sort -m 2g --threads 5 -|samtools markdup --threads 5 -r - sgs.sort.bam
   #index bam and genome files
   samtools index -@ ${threads} sgs.sort.bam;
   samtools faidx ${input};
   #polish genome file
   /data/chaofan/software/NextPolish/lib/nextpolish1.py -g ${input} -t 1 -p ${threads} -s sgs.sort.bam > genome.polishtemp.fa;
   input=genome.polishtemp.fa;
#step2:
   #index genome file and do alignment
   bwa index ${input};
   bwa mem -t ${threads} ${input} ${read1} ${read2}|samtools view --threads 3 -F 0x4 -b -|samtools fixmate -m --threads 3  - -|samtools sort -m 2g --threads 5 -|samtools markdup --threads 5 -r - sgs.sort.bam
   #index bam and genome files
   samtools index -@ ${threads} sgs.sort.bam;
   samtools faidx ${input};
   #polish genome file
   /data/chaofan/software/NextPolish/lib/nextpolish1.py -g ${input} -t 2 -p ${threads} -s sgs.sort.bam > genome.nextpolish.fa;
   input=genome.nextpolish.fa;
done;
#Final polished genome file: genome.nextpolish.fa

cfz1998 commented 1 year ago

Thank you for your reply!

moold commented 1 year ago

Check the mapping quality, and check whether these alignments are primary alignments.
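A quick first check is the MAPQ column (SAM column 5) of the Illumina alignments inside a repeat. bwa mem assigns MAPQ 0 to reads that align equally well to more than one location, so a high share of MAPQ 0 in the region would support the non-unique-alignment explanation. The bash function below is a minimal sketch; the name `mapq_summary` is hypothetical, and it assumes SAM text (as printed by `samtools view`) on stdin.

```shell
# Hypothetical helper: summarize mapping quality (SAM column 5) of the
# alignments read from stdin. Header lines (starting with @) are skipped.
# Prints the number of alignments, how many have MAPQ 0, and the mean MAPQ.
mapq_summary() {
  awk -F '\t' '
    $1 !~ /^@/ {
      n++
      sum += $5
      if ($5 == 0) zero++       # MAPQ 0: read placed ambiguously
    }
    END {
      if (n == 0) { print "alignments=0"; exit }
      printf "alignments=%d mapq0=%d mean_mapq=%.1f\n", n, zero, sum / n
    }'
}
```

For example, with the indexed BAM from the script above and a hypothetical region name: `samtools view sgs.sort.bam ctg1:150000-160000 | mapq_summary`. Running it on a repeat and on a nearby unique region of the same contig gives a direct comparison.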

cfz1998 commented 1 year ago

Repeat region 1: [image]
Repeat region 2: [image]

Hi @moold, sorry for my late reply. The Illumina read alignments show many breakpoints in these regions. I don't know how to check whether the alignments are primary.
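Whether an alignment is primary is encoded in the SAM FLAG (column 2). Per the SAM specification, 0x100 marks a secondary alignment and 0x800 a supplementary one, so a mapped record with neither bit set is the primary alignment. A minimal bash sketch that decodes one FLAG value (the function name `classify_flag` is hypothetical):

```shell
# Hypothetical helper: classify one SAM FLAG value (decimal or 0x-hex).
# Bits checked, per the SAM specification:
#   0x4   read unmapped
#   0x100 secondary alignment
#   0x800 supplementary alignment
# A mapped record with neither 0x100 nor 0x800 set is the primary alignment.
classify_flag() {
  local flag=$(( $1 ))          # $(( )) accepts both 99 and 0x63
  if (( flag & 0x4 )); then
    echo "unmapped"
  elif (( flag & 0x100 )); then
    echo "secondary"
  elif (( flag & 0x800 )); then
    echo "supplementary"
  else
    echo "primary"
  fi
}
```

To tally the records in a region (hypothetical region name): `samtools view sgs.sort.bam ctg1:150000-160000 | cut -f2 | while read -r f; do classify_flag "$f"; done | sort | uniq -c`. Alternatively, `samtools view -c -F 0x904 sgs.sort.bam ctg1:150000-160000` counts only primary mapped alignments (0x904 = 0x800 + 0x100 + 0x4).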

moold commented 1 year ago

NextPolish uses the 0xC04 flag to filter alignments, so I think these alignments have these flags.
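0xC04 is a bit mask: 0x800 (supplementary) + 0x400 (PCR or optical duplicate) + 0x4 (unmapped). Assuming NextPolish applies it the way `samtools view -F` does, an alignment is skipped if any of those bits is set in its FLAG. The sketch below (hypothetical function name `passes_mask`) reproduces that test for a single FLAG value. Secondary alignments (0x100) are not part of this mask. Also, the script in this issue already drops unmapped reads (`samtools view -F 0x4`) and duplicates (`samtools markdup -r`), so in that BAM only the supplementary bit of the mask can still remove alignments.

```shell
# 0xC04 = 0x800 (supplementary) | 0x400 (duplicate) | 0x4 (unmapped).
NEXTPOLISH_MASK=0xC04

# Hypothetical helper: report whether an alignment with the given FLAG
# would survive a "drop if any masked bit is set" filter, i.e. the same
# rule as `samtools view -F MASK`. The mask defaults to 0xC04.
passes_mask() {
  local flag=$(( $1 ))
  local mask=$(( ${2:-$NEXTPOLISH_MASK} ))
  if (( flag & mask )); then
    echo "filtered"
  else
    echo "kept"
  fi
}
```

To see how much the flag filter removes in a repeat (hypothetical region name), compare `samtools view -c sgs.sort.bam ctg1:150000-160000` with `samtools view -c -F 0xC04 sgs.sort.bam ctg1:150000-160000`. The flag mask by itself says nothing about MAPQ, so a small difference between the two counts points back to mapping quality rather than flags.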