Nextomics / NextPolish

Fast and accurately polish the genome generated by long reads.
GNU General Public License v3.0

unable to create files for use for nextpolish.py from a offline method #84

Closed: amit4mchiba closed this issue 2 years ago

amit4mchiba commented 2 years ago

Hi,

I am trying to run the NextPolish method offline. I followed the documentation, which suggests running the process in two cycles for short reads, like this:

#Set input and parameters
round=2
threads=20
read1=reads_R1.fastq.gz
read2=reads_R2.fastq.gz
input=input.genome.fa
for ((i=1; i<=${round};i++)); do
#step 1:
   #index the genome file and do alignment
   bwa index ${input};
   bwa mem -t ${threads} ${input} ${read1} ${read2}|samtools view --threads 3 -F 0x4 -b -|samtools fixmate -m --threads 3  - -|samtools sort -m 2g --threads 5 -|samtools markdup --threads 5 -r - sgs.sort.bam
   #index bam and genome files
   samtools index -@ ${threads} sgs.sort.bam;
   samtools faidx ${input};
   #polish genome file
   python NextPolish/lib/nextpolish1.py -g ${input} -t 1 -p ${threads} -s sgs.sort.bam > genome.polishtemp.fa;
   input=genome.polishtemp.fa;
#step2:
   #index genome file and do alignment
   bwa index ${input};
   bwa mem -t ${threads} ${input} ${read1} ${read2}|samtools view --threads 3 -F 0x4 -b -|samtools fixmate -m --threads 3  - -|samtools sort -m 2g --threads 5 -|samtools markdup --threads 5 -r - sgs.sort.bam
   #index bam and genome files
   samtools index -@ ${threads} sgs.sort.bam;
   samtools faidx ${input};
   #polish genome file
   python NextPolish/lib/nextpolish1.py -g ${input} -t 2 -p ${threads} -s sgs.sort.bam > genome.nextpolish.fa;
   input=genome.nextpolish.fa;
done;
#Finally polished genome file: genome.nextpolish.fa

So, I named my files exactly as in the script (my genome is named input.genome.fa, and the reads are named likewise). I then tried running the mapping step but kept getting errors. I figured this was due to the spaces required when piping the output of bwa to samtools, so I re-ran it with spaces added, but kept getting new errors. One error says that samtools has no markdup command; another says that fixmate has no --threads option, and so on.

I am not an expert, and hence I lack the skill to run the alignment pipeline based on your script.

I would be grateful if you could advise. By the way, I was able to run the test run successfully, so I think the installation has no issues.

Question or Expected behavior
I was expecting to run the mapping pipeline as described in the documentation. I wonder whether copying the script and running it as is causes the issue.

Operating system
Which operating system and version are you using? You can use the command lsb_release -a to get it.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal

GCC
What version of GCC are you using? You can use the command gcc -v to get it.
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 6.4.0-17ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr --with-as=/usr/bin/x86_64-linux-gnu-as --with-ld=/usr/bin/x86_64-linux-gnu-ld --program-suffix=-6 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 6.4.0 20180424 (Ubuntu 6.4.0-17ubuntu1)

Python
What version of Python are you using? You can use the command python --version to get it.
Python 2.7.14

NextPolish
What version of NextPolish are you using? You can use the command nextPolish -v to get it.
nextPolish v1.4.0


thank you so much,

regards Amit

moold commented 2 years ago

Hello, could you try using NextPolish/bin/samtools instead of the samtools you are currently using? The errors may be caused by a different samtools version.
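
For readers hitting the same errors, here is a rough sketch of how the samtools version could be checked before running the pipeline. It assumes that markdup and the fixmate -m option first appeared in samtools 1.6, and that GNU sort -V is available (it is on Ubuntu). The names version_ge, samtools_version and SAMTOOLS are made up for this example and are not part of NextPolish.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 is >= version $2.
# Relies on GNU sort -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: print the version reported by the samtools binary $1.
# Very old builds (0.1.x) do not understand --version and yield an empty string.
samtools_version() {
    "$1" --version 2>/dev/null | awk 'NR == 1 { print $2 }'
}

# SAMTOOLS is an assumed variable; set it to e.g. NextPolish/bin/samtools to compare.
SAMTOOLS=${SAMTOOLS:-samtools}
if command -v "$SAMTOOLS" >/dev/null 2>&1; then
    ver=$(samtools_version "$SAMTOOLS")
    if [ -n "$ver" ] && version_ge "$ver" 1.6; then
        echo "$SAMTOOLS $ver: new enough for markdup and fixmate -m"
    else
        echo "$SAMTOOLS ${ver:-unknown version}: probably too old for markdup"
    fi
else
    echo "samtools not found: $SAMTOOLS"
fi
```

Running it once with the system samtools and once with SAMTOOLS pointing at the copy in NextPolish/bin should show whether the two versions differ.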

amit4mchiba commented 2 years ago

Thank you so much for your reply.

I used this script-

#Set input and parameters
round=2
threads=20
read1=./sreads.R1.fastq.gz
read2=./sreads.R2.fastq.gz
input=./raw.genome.fasta
for ((i=1; i<=${round};i++)); do
#step 1:
   #index the genome file and do alignment
   /mnt/HD1/NextPolish/bin/bwa index ${input};
   /mnt/HD1/NextPolish/bin/bwa mem -t ${threads} ${input} ${read1} ${read2}|/mnt/HD1/NextPolish/bin/samtools view --threads 3 -F 0x4 -b -|/mnt/HD1/NextPolish/bin/samtools fixmate -m --threads 3  - -|/mnt/HD1/NextPolish/bin/samtools sort -m 2g --threads 5 -|/mnt/HD1/NextPolish/bin/samtools markdup --threads 5 -r - sgs.sort.bam
   #index bam and genome files
   /mnt/HD1/NextPolish/bin/samtools index -@ ${threads} sgs.sort.bam;
   /mnt/HD1/NextPolish/bin/samtools faidx ${input};
   #polish genome file
   python /mnt/HD1/NextPolish/lib/nextpolish1.py -g ${input} -t 1 -p ${threads} -s sgs.sort.bam > genome.polishtemp.fa;
   input=genome.polishtemp.fa;
#step2:
   #index genome file and do alignment
   /mnt/HD1/NextPolish/bin/bwa index ${input};
   /mnt/HD1/NextPolish/bin/bwa mem -t ${threads} ${input} ${read1} ${read2}|/mnt/HD1/NextPolish/bin/samtools view --threads 3 -F 0x4 -b -|/mnt/HD1/NextPolish/bin/samtools fixmate -m --threads 3  - -|/mnt/HD1/NextPolish/bin/samtools sort -m 2g --threads 5 -|/mnt/HD1/NextPolish/bin/samtools markdup --threads 5 -r - sgs.sort.bam
   #index bam and genome files
   /mnt/HD1/NextPolish/bin/samtools index -@ ${threads} sgs.sort.bam;
   /mnt/HD1/NextPolish/bin/samtools faidx ${input};
   #polish genome file
   python /mnt/HD1/NextPolish/lib/nextpolish1.py -g ${input} -t 2 -p ${threads} -s sgs.sort.bam > genome.nextpolish.fa;
   input=genome.nextpolish.fa;
done;
#Finally polished genome file: genome.nextpolish.fa

But I am getting this error- run_Nextpolishing.sh: 9: Syntax error: Bad for loop variable

I am not sure why this error occurs. I am so sorry for asking such stupid questions, but I am not able to figure out how to do this. Your advice would be highly appreciated.

moold commented 2 years ago

Actually, I suggest you follow the instructions here to run NextPolish, which is easier.

Regarding your question, try bash run_Nextpolishing.sh
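
Some background on why bash helps: "Bad for loop variable" is most likely printed by dash, which is what sh points to on Ubuntu. The C-style for ((i=1; i<=${round}; i++)) loop is a bash feature that dash does not understand, so the script has to be started with bash. If the script must also run under plain sh, the loop could be rewritten as in the sketch below. The function name run_rounds is made up for illustration, and the echo only marks where the alignment and polishing commands of each round would go.

```shell
#!/bin/sh
# Portable variant of the outer loop: a while loop with POSIX arithmetic
# instead of the bash-only for ((...)) form.
# run_rounds is a hypothetical stand-in for the real pipeline body.
run_rounds() {
    round=$1
    i=1
    while [ "$i" -le "$round" ]; do
        echo "round $i"
        i=$((i + 1))
    done
}

run_rounds 2
```

This form runs the same way under both sh and bash, so it does not matter how the script is started.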

amit4mchiba commented 2 years ago

I get it.

Thank you so much. I was able to run it, and indeed, using the samtools bundled with NextPolish worked. The test run also worked when submitted through bash.

Many thanks.