alekseyzimin / masurca

GNU General Public License v3.0

Chromosome scaffolder and polishing tool #140

Open ghost opened 4 years ago

ghost commented 4 years ago

Dear Professor Zimin, I am using the two new features of MaSuRCA-3.3.4 and I have encountered a couple of issues. First of all, while using the masurca-polishing tool I was using full paths to execute the command and the following error was displayed:

[Wed Oct 30 13:56:21 EET 2019] Creating BWA index for /home/mgabriel/Downloads/data/DroVir2/flye/assembly.fasta
[Wed Oct 30 14:00:28 EET 2019] Aligning reads to /home/mgabriel/Downloads/data/DroVir2/flye/assembly.fasta
[Wed Oct 30 15:55:54 EET 2019] Sorting and indexing alignment file
[Wed Oct 30 16:58:57 EET 2019] Calling variants
Processing 45 scaffold(s) per batch
could not open ..//home/mgabriel/Downloads/data/DroVir2/flye/assembly.fasta
could not open ..//home/mgabriel/Downloads/data/DroVir2/flye/assembly.fasta
.
.
.
could not open ..//home/mgabriel/Downloads/data/DroVir2/flye/assembly.fasta
[Wed Oct 30 16:58:57 EET 2019] Variant calling failed on batch 1 in assembly.fasta.work

I managed to work around this problem by creating soft links in the directory I was working in.
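The `..//home/...` paths in the log suggest that the polishing script prepends `../` to the assembly path (it apparently works inside `assembly.fasta.work`), which turns an absolute path into a broken one. Below is a minimal sketch of the soft-link workaround described above. `link_here` is a hypothetical helper name, not part of MaSuRCA.

```shell
#!/usr/bin/env bash
# Hypothetical helper: link an input file into the current working directory
# and print its bare file name, so the polishing script only ever receives a
# relative name (the log shows it prepending "../" to the path it is given).
link_here() {
  local src=$1
  local name
  name=$(basename -- "$src")
  # Do not clobber an existing file of the same name in the working directory.
  if [ ! -e "$name" ]; then
    ln -s -- "$src" "$name"
  fi
  printf '%s\n' "$name"
}

# Usage (path taken from the log above):
#   asm=$(link_here /home/mgabriel/Downloads/data/DroVir2/flye/assembly.fasta)
#   ...then pass "$asm" to the polishing script instead of the absolute path.
```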

Second, I tried to use the chromosome scaffolder tool by running the following command:

/opt/MaSuRCA-3.3.4/bin/chromosome_scaffolder.sh \
-r './GCF_000005245.1_dvir_caf1_genomic.fna' \
-q './assembly.fasta.fixed' \
-t 16 \
-i \
-m \
-v \
-s './SRR7167958_1.fastq.gz'\
-cl 3 \
-ch 52

The output was:

+ shift
+ [[ 5 > 0 ]]
+ key=-s
+ case $key in
+ READS=./SRR7167958_1.fastq.gz-cl
+ shift
+ shift
+ [[ 3 > 0 ]]
+ key=3
+ case $key in
+ echo 'Unknown option 3'
Unknown option 3
+ exit 1

Afterwards, I ran the tool without the -m option and the output was just "Unknown option 3". Finally, after running the following command:

/opt/MaSuRCA-3.3.4/bin/chromosome_scaffolder.sh \
-r './GCF_000005245.1_dvir_caf1_genomic.fna' \
-q './assembly.fasta.fixed'

The output was:

[Thu Oct 31 20:57:40 EET 2019] Splitting query scaffolds into contigs
[Thu Oct 31 20:57:41 EET 2019] Mapping reads to query contigs
[INFO] 2019-10-31T20:57:42 [blasr] started.
awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted

What should I change in order to run the tool successfully? What are the default values of -t, -m and -v? Thank you in advance!

Kind regards, Marios

ghost commented 4 years ago

@alekseyzimin any idea? We are still trying to figure it out. We keep getting the same errors.

alekseyzimin commented 4 years ago

Hello, the default for -t is likely 1, which is not a good setting; you should use the number of cores (16 or 32?). Your first run failed because the -i and -m options need numeric values.
/opt/MaSuRCA-3.3.4/bin/chromosome_scaffolder.sh -r ./GCF_000005245.1_dvir_caf1_genomic.fna -q ./assembly.fasta.fixed -t 32 -i 95 -m 100000

should work. --Aleksey
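The first trace also shows a second problem: `READS=./SRR7167958_1.fastq.gz-cl`. The line `-s './SRR7167958_1.fastq.gz'\` has no space before the backslash, and because the shell simply deletes a backslash-newline, the quoted file name and the `-cl` on the next line become one word. The parser then treated the bare `3` as an option. A sketch that demonstrates this and then sidesteps line continuations by collecting the arguments in a bash array; the file names and numeric values are the ones used in this thread, and `nproc` (GNU coreutils) is an assumed way to pick the thread count.

```shell
#!/usr/bin/env bash
# 1) A backslash-newline is deleted by the shell, so without a space before
#    it the word on the next line is glued onto the previous one.
printf '[%s]\n' -s './reads.fastq'\
-cl 3
# prints [-s], [./reads.fastq-cl] and [3] on separate lines

# 2) Collect the arguments in an array instead; no continuation lines needed.
#    -i and -m take numeric values; -t is set to the number of available
#    cores via nproc, since the default is likely 1.
args=(
  -r ./GCF_000005245.1_dvir_caf1_genomic.fna
  -q ./assembly.fasta.fixed
  -t "$(nproc)"
  -i 95
  -m 100000
)

# Print the command for review; remove "echo" to actually run it.
echo /opt/MaSuRCA-3.3.4/bin/chromosome_scaffolder.sh "${args[@]}"
```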

ghost commented 4 years ago

Dear @alekseyzimin, thank you for your response. I ran the scaffolder with the following settings:

/opt/MaSuRCA-3.3.4/bin/chromosome_scaffolder.sh -r ./GCF_000005245.1_dvir_caf1_genomic.fna -q ./assembly.fasta.fixed -t 16 -i 95 -m 100000 -v -s ./SRR7167958_1.fastq.gz -cl 3 -ch 64

and the output was:

+ shift
+ [[ 6 > 0 ]]
+ key=-s
+ case $key in
+ READS=SRR7167958_1.fastq.gz
+ shift
+ shift
+ [[ 4 > 0 ]]
+ key=-cl
+ case $key in
+ COV_THRESH=3
+ shift
+ shift
+ [[ 2 > 0 ]]
+ key=-ch
+ case $key in
+ REP_COV_THRESH=64
+ shift
+ shift
+ [[ 0 > 0 ]]
++ basename GCF_000005245.1_dvir_caf1_genomic.fna
+ REF_CHR=GCF_000005245.1_dvir_caf1_genomic.fna
++ basename assembly.fasta.fixed
+ HYB_CTG=assembly.fasta.fixed.split
+ HYB_POS=assembly.fasta.fixed.split.posmap
+ rm -rf .rerun
+ '[' '!' -s assembly.fasta.fixed.split ']'
+ log 'Splitting query scaffolds into contigs'
++ date
+ dddd='Thu Nov 14 16:19:33 EET 2019'
+ echo -e '\e[0;32m[Thu Nov 14 16:19:33 EET 2019]\e[0m Splitting query scaffolds into contigs'
[Thu Nov 14 16:19:33 EET 2019] Splitting query scaffolds into contigs
+ /opt/MaSuRCA-3.3.4/bin/splitFileAtNs assembly.fasta.fixed 1
+ touch .rerun
+ '[' '!' -s assembly.fasta.fixed.split.posmap ']'
+ log 'Mapping reads to query contigs'
++ date
+ dddd='Thu Nov 14 16:19:34 EET 2019'
+ echo -e '\e[0;32m[Thu Nov 14 16:19:34 EET 2019]\e[0m Mapping reads to query contigs'
[Thu Nov 14 16:19:34 EET 2019] Mapping reads to query contigs
+ /opt/MaSuRCA-3.3.4/bin/../CA8/Linux-amd64/bin/blasr -nproc 16 -bestn 1 SRR7167958_1.fastq.gz assembly.fasta.fixed.split
+ awk '{if(($11-$10)/$12>0.75){if($4==0) print $1" "substr($2,4)" "$7" "$8" f"; else print  $1" "substr($2,4)" "$9-$8" "$9-$7" r"}}'
+ sort -nk2 -k3n -S 10%
[INFO] 2019-11-14T16:19:34 [blasr] started.
awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted

Should I change anything else in my options?

alekseyzimin commented 4 years ago

Ah, here is the problem: your reads are gzipped, so blasr fails. Unzip them and it will work.

--
Dr. Alexey V. Zimin
Associate Research Scientist
Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
(301)-437-6260
website http://ccb.jhu.edu/people/alekseyz/
blog http://masurca.blogspot.com
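Because blasr in this version cannot read a .fastq.gz file, the reads need an uncompressed copy on disk (which can be large). Below is a minimal sketch that decompresses only when needed and keeps the original archive. `ensure_uncompressed` is a hypothetical helper, not part of MaSuRCA.

```shell
#!/usr/bin/env bash
# Hypothetical helper: if the reads file ends in .gz, write an uncompressed
# copy next to it (the .gz is kept) and print the new name; otherwise print
# the name unchanged.
ensure_uncompressed() {
  local reads=$1
  case $reads in
    *.gz)
      local out=${reads%.gz}
      # Reuse an existing non-empty copy on re-runs; decompress to a temporary
      # file first so an interrupted run does not leave a truncated copy.
      if [ ! -s "$out" ]; then
        gzip -dc -- "$reads" > "$out.tmp" && mv -- "$out.tmp" "$out" || return 1
      fi
      printf '%s\n' "$out"
      ;;
    *)
      printf '%s\n' "$reads"
      ;;
  esac
}

# Usage:
#   reads=$(ensure_uncompressed ./SRR7167958_1.fastq.gz)   # ./SRR7167958_1.fastq
```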

alekseyzimin commented 4 years ago

I will add a check of the reads file extension in the next release. Unfortunately, it is not easy to accept gzipped files automatically, because I cannot pipe reads into blasr: it looks at the file extension.
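Until such a check is built in, a pre-flight test can catch this before a long run starts. A sketch, assuming the accepted extensions are the ones named in the error message that later MaSuRCA versions print (".fa, .fasta or .fastq"); `check_reads_ext` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: accept only .fa, .fasta or .fastq and give a
# specific message for gzipped files, which blasr cannot read here.
check_reads_ext() {
  local f=$1
  case $f in
    *.gz)
      echo "error: $f is gzipped; decompress it first" >&2
      return 1 ;;
    *.fa|*.fasta|*.fastq)
      return 0 ;;
    *)
      echo "error: $f must end in .fa, .fasta or .fastq" >&2
      return 1 ;;
  esac
}

# Usage:
#   check_reads_ext ./SRR7167958_1.fastq.gz || exit 1
```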

ghost commented 4 years ago

Dear @alekseyzimin, I unzipped the file as you suggested and the read mapping worked. However, the scaffolding stopped after the following steps:

[Fri Nov 15 21:20:54 EET 2019] Splitting query scaffolds into contigs
+ /opt/MaSuRCA-3.3.4/bin/splitFileAtNs assembly.fasta.fixed 1
+ touch .rerun
+ '[' '!' -s assembly.fasta.fixed.split.posmap ']'
+ log 'Mapping reads to query contigs'
++ date
+ dddd='Fri Nov 15 21:20:57 EET 2019'
+ echo -e '\e[0;32m[Fri Nov 15 21:20:57 EET 2019]\e[0m Mapping reads to query contigs'
[Fri Nov 15 21:20:57 EET 2019] Mapping reads to query contigs
+ /opt/MaSuRCA-3.3.4/bin/../CA8/Linux-amd64/bin/blasr -nproc 16 -bestn 1 SRR7167958_1.fastq assembly.fasta.fixed.split
+ awk '{if(($11-$10)/$12>0.75){if($4==0) print $1" "substr($2,4)" "$7" "$8" f"; else print  $1" "substr($2,4)" "$9-$8" "$9-$7" r"}}'
+ sort -nk2 -k3n -S 10%
[INFO] 2019-11-15T21:20:57 [blasr] started.
[INFO] 2019-11-16T02:27:21 [blasr] ended.
+ touch .rerun
+ '[' '!' -s GCA_007989325.1_vir160_genomic.fna.assembly.fasta.fixed.split.delta ']'
+ log 'Aligning query contigs to reference scaffolds'
++ date
+ dddd='Sat Nov 16 02:27:26 EET 2019'
+ echo -e '\e[0;32m[Sat Nov 16 02:27:26 EET 2019]\e[0m Aligning query contigs to reference scaffolds'
[Sat Nov 16 02:27:26 EET 2019] Aligning query contigs to reference scaffolds
+ /opt/MaSuRCA-3.3.4/bin/nucmer -t 16 -p GCA_007989325.1_vir160_genomic.fna.assembly.fasta.fixed.split -c 200 GCA_007989325.1_vir160_genomic.fna assembly.fasta.fixed.split
+ touch .rerun
+ '[' '!' -s GCA_007989325.1_vir160_genomic.fna.assembly.fasta.fixed.split.1.delta ']'
+ log 'Filtering the alignments'
++ date
+ dddd='Sat Nov 16 02:30:02 EET 2019'
+ echo -e '\e[0;32m[Sat Nov 16 02:30:02 EET 2019]\e[0m Filtering the alignments'
[Sat Nov 16 02:30:02 EET 2019] Filtering the alignments
+ /opt/MaSuRCA-3.3.4/bin/delta-filter -1 -i 95 -o 20 GCA_007989325.1_vir160_genomic.fna.assembly.fasta.fixed.split.delta
+ touch .rerun
+ '[' '!' -s assembly.fasta.fixed.split.posmap.coverage ']'
+ log 'Computing read coverage for query contigs'
++ date
+ dddd='Sat Nov 16 02:30:03 EET 2019'
+ echo -e '\e[0;32m[Sat Nov 16 02:30:03 EET 2019]\e[0m Computing read coverage for query contigs'
[Sat Nov 16 02:30:03 EET 2019] Computing read coverage for query contigs
+ awk '{print $1" "$2" "$3"\n"$1" "$2" "$4}' assembly.fasta.fixed.split.posmap
+ grep -v F
+ grep -v R
+ sort -nk2 -k3n -S 10%
+ /opt/MaSuRCA-3.3.4/bin/compute_coverage.pl

The output files are:

   0 Nov 16 02:30 assembly.fasta.fixed.split.posmap.coverage
4.6M Nov 16 02:30 GCA_007989325.1_vir160_genomic.fna.assembly.fasta.fixed.split.1.delta
   0 Nov 16 02:30 .rerun
6.5M Nov 16 02:30 GCA_007989325.1_vir160_genomic.fna.assembly.fasta.fixed.split.delta
 42M Nov 16 02:27 assembly.fasta.fixed.split.posmap
156M Nov 15 21:20 assembly.fasta.fixed.split
 22K Nov 15 21:20 genome.asm
 28K Nov 15 21:20 genome.posmap.ctgscf
 20K Nov 15 21:20 scaffNameTranslations.txt

alekseyzimin commented 4 years ago

A failure here would be unusual. Can you post the first few lines (head -n 5) of the assembly.fasta.fixed.split.posmap file?

ghost commented 4 years ago

These are the first 10 lines from the assembly.fasta.fixed.split.posmap file:

SRR7167958.1059890 7180000000000 449 6604 r
SRR7167958.460610 7180000000000 450 2527 f
SRR7167958.354243 7180000000000 916 5839 r
SRR7167958.950193 7180000000000 1042 5778 r
SRR7167958.1208632 7180000000000 1257 6258 f
SRR7167958.412011 7180000000000 1510 5774 r
SRR7167958.235508 7180000000000 1551 7287 r
SRR7167958.363957 7180000000000 1833 6025 r
SRR7167958.357056 7180000000000 2061 5779 f
SRR7167958.1092340 7180000000000 2364 13203 r

alekseyzimin commented 4 years ago

This is exactly what I was thinking. There is a bug/feature that is aimed at filtering out short reads. But if you only have short reads, the scaffolding will fail.
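Comparing the posmap lines above with the coverage step in the trace suggests one concrete reason nothing survived the filter. The awk command copies the read name into every output line, and every read name here begins with "SRR", which contains an uppercase R, so `grep -v R` discards every line and compute_coverage.pl receives no input. This is an inference from the logs, not something stated in the thread. A small check using two of the posmap lines:

```shell
#!/usr/bin/env bash
# Two lines copied from the posmap file above.
posmap='SRR7167958.1059890 7180000000000 449 6604 r
SRR7167958.460610 7180000000000 450 2527 f'

# Same awk as the coverage step: one output line for each read start and end.
expanded=$(printf '%s\n' "$posmap" | awk '{print $1" "$2" "$3"\n"$1" "$2" "$4}')
printf '%s\n' "$expanded" | wc -l                            # 4 lines before the filter

# With the 3.3.4 filter: every line contains the "R" of "SRR".
printf '%s\n' "$expanded" | grep -v F | grep -v R | wc -l    # 0 lines survive
```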

alekseyzimin commented 4 years ago

I just pushed version 3.3.5beta, in which the bug is fixed; the fix simply removes "grep -v F | grep -v R" from a line in chromosome_scaffolder.sh. You can get this version here:

https://github.com/alekseyzimin/masurca/blob/master/MaSuRCA-3.3.5b.tar.gz

Note that the new version of the chromosome scaffolder is not compatible with the directory structure of the previous version; you need to run it in a new folder.

To quickly fix your failure, you can remove "grep -v F | grep -v R |" from line 125 of MaSuRCA-3.3.4/bin/chromosome_scaffolder.sh and re-run after deleting the assembly.fasta.fixed.split.posmap.coverage file.

--Aleksey
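The quick fix above can be applied with sed. A sketch, assuming the filter appears on a single line as "grep -v F | grep -v R |" (possibly with different spacing around the pipes), as the trace suggests; `patch_scaffolder` is a hypothetical helper, and editing a script under /opt may require elevated permissions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete the "grep -v F | grep -v R |" stage from the
# scaffolder script, keeping the original as a .bak copy. In the extended
# regex, "\|" matches a literal pipe and " *" tolerates varying spacing.
patch_scaffolder() {
  local script=$1
  sed -i.bak -E 's/grep -v F *\| *grep -v R *\| *//' "$script"
}

# Usage, from the directory of the failed run:
#   patch_scaffolder /opt/MaSuRCA-3.3.4/bin/chromosome_scaffolder.sh
#   rm -f assembly.fasta.fixed.split.posmap.coverage   # force this step to re-run
```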

ghost commented 4 years ago

Dear @alekseyzimin, I did as you suggested: I removed "grep -v F | grep -v R |" and the scaffolding finished successfully.

Thank you very much for your help!

alekseyzimin commented 4 years ago

You are welcome! Thank you for identifying this problem!

Ameyasanthosh commented 1 year ago

Dear Professor @alekseyzimin, I am running the chromosome scaffolder with the command below, and the following error is displayed:

chromosome_scaffolder.sh -r medref.fna -q mac3.fasta -t 36

[Friday 12 August 2022 11:32:30 AM IST] Computing gap coordinates in the reference
[Friday 12 August 2022 11:33:22 AM IST] Splitting query scaffolds at >100bp gaps
[Friday 12 August 2022 11:33:40 AM IST] Adding noise to reference to align to duplicated regions
[Friday 12 August 2022 11:43:18 AM IST] Mapping reads to query contigs
[Friday 12 August 2022 11:43:18 AM IST] Wrong type/extension for the file, must be .fa, .fasta or .fastq

But both my query and my reference are in FASTA format. Please suggest a solution.

iremdnzl commented 1 year ago

Same issue! How can I solve it?

rmarquezp commented 1 year ago

Maybe this is because the extension of the medref.fna file is .fna and not .fasta or .fa? Try changing it to .fa; that may solve it.

iremdnzl commented 1 year ago

I tried that, but it didn't work. I also tried with the files in their own folder (./ref.fa), but I still get the same error.

maochuangxue commented 1 year ago

Dear Professor @alekseyzimin, I am using chromosome_scaffolder.sh from MaSuRCA-4.0.9 with the command:

chromosome_scaffolder.sh -r ../T2T_CHM13.V2.0_GCA_009914755.4_20220403/chm13v2.0.fa -q ../TGS_ZHU/asm/asm_fa/${i}.p_ctg.fa.gz -t 90 -nb -v

The error is:

cat: chm13v2.0.fa.22TF01547.asm.bp.hap1.p_ctg.fa.gz.split.reconciled.txt: No such file or directory

However, it has already produced the temporary file "chm13v2.0.fa.22TF01547.asm.bp.hap1.p_ctg.fa.gz.split.reconciled.txt.tmp". Could you help me find the problem?