SJTU-CGM / HUPAN

Human pan-genome analysis pipeline
http://cgm.sjtu.edu.cn/hupan/

Missing *.scafSeq file in the assemble_results Directory #9

Closed sebetso388 closed 4 years ago

sebetso388 commented 4 years ago

Good day Dr Duan,

I have been testing the HUPAN pipeline with the provided ExampleData (the plan is to analyze Illumina PE WGS data) and I am running into an issue.

When I run 'hupan assemble sga', there is no *.scafSeq file in the assemble_results directory.

When I continue running the other commands up to hupan assemSta, the contigs_reports and basic_stats subdirectories (of quast_results) are empty. As a result, the next command (hupan getUnalnCtg) produces an error: cannot find unaligned info file(*.unaligned.info)

Thanks,
Gs Sebetso
Baylor College of Medicine

zhqduan commented 4 years ago

Hi Sebetso,

Thanks for using and testing HUPAN! Actually, SGA does not produce sequence files with the 'scafSeq' suffix. If 'hupan assemble sga' runs successfully, the assembly result directory is structured as follows:

└── data
    ├── sample1
    │   ├── sample1.assemble-contigs.fa
    │   ├── sample1.assemble-graph.asqg.gz
    │   ├── sample1.assemble-variants.fa
    │   └── sample1.correct.filter.pass.merged.rmdup.asqg.gz
    ├── sample2
    │   ├── sample2.assemble-contigs.fa
    │   ├── sample2.assemble-graph.asqg.gz
    │   ├── sample2.assemble-variants.fa
    │   └── sample2.correct.filter.pass.merged.rmdup.asqg.gz
    └── sample3
        ├── sample3.assemble-contigs.fa
        ├── sample3.assemble-graph.asqg.gz
        ├── sample3.assemble-variants.fa
        └── sample3.correct.filter.pass.merged.rmdup.asqg.gz

The *contigs.fa files are the assembled sequence files and can be used for further analysis by 'hupan assemSta'. If you do not get the above files, please check whether SGA is installed successfully.

Zhongqu
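The directory check described above can be automated. The following is a minimal sketch, not part of HUPAN: the function name `samples_missing_contigs` is hypothetical, and it assumes only the layout shown in the reply (one subdirectory per sample, each holding a `*.assemble-contigs.fa` file). It treats an empty contigs file as missing.

```python
from pathlib import Path


def samples_missing_contigs(data_dir):
    """Return the names of sample subdirectories without a usable contigs file.

    Hypothetical helper (not a HUPAN command). A sample counts as missing
    when it has no non-empty file matching *.assemble-contigs.fa.
    """
    missing = []
    for sample_dir in sorted(Path(data_dir).iterdir()):
        if not sample_dir.is_dir():
            continue  # skip stray files next to the sample directories
        contigs = [
            p for p in sample_dir.glob("*.assemble-contigs.fa")
            if p.stat().st_size > 0
        ]
        if not contigs:
            missing.append(sample_dir.name)
    return missing


if __name__ == "__main__":
    import sys

    # Usage: python check_contigs.py <assembly result directory>/data
    for name in samples_missing_contigs(sys.argv[1]):
        print("no assembled contigs for", name)
```

An empty result means every sample has an assembled contigs file, so the SGA step itself is unlikely to be the problem.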

sebetso388 commented 4 years ago

Good day,

Thanks a lot for the prompt response. I have all those files, which means SGA is working perfectly.

However, hupan assemSta produces empty contigs_reports and basic_stats directories. As a result, hupan getUnalnCtg gives the following:

Process sample sample1 ...
Warnings: cannot find unaligned info file(*.unaligned.info) in quast_result/data/sample1/contigs_reports/
Process sample sample2 ...
Warnings: cannot find unaligned info file(*.unaligned.info) in quast_result/data/sample2/contigs_reports/
Process sample sample3 ...
Warnings: cannot find unaligned info file(*.unaligned.info) in quast_result/data/sample3/contigs_reports/
cat: -: Bad file descriptor
cat: -: Bad file descriptor
cat: -: Bad file descriptor

zhqduan commented 4 years ago

I think hupan assemSta did not run successfully. Could you please check this step, or paste the command and log here? Thank you.

Zhongqu
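One quick way to check the assemSta step is to list which samples actually have a `*.unaligned.info` file under `contigs_reports`, since that is the file hupan getUnalnCtg reports as missing. This is a rough sketch under that assumption; the function name `unaligned_info_by_sample` is hypothetical, and the layout (`<quast result>/data/<sample>/contigs_reports/`) is taken from the warning messages in this thread.

```python
from pathlib import Path


def unaligned_info_by_sample(quast_data_dir):
    """Map each sample name to the *.unaligned.info files found for it.

    Hypothetical helper (not a HUPAN or QUAST command). An empty list means
    the sample's contigs_reports directory is absent or holds no such file.
    """
    found = {}
    for sample_dir in sorted(Path(quast_data_dir).iterdir()):
        if not sample_dir.is_dir():
            continue
        reports = sample_dir / "contigs_reports"
        if reports.is_dir():
            names = sorted(p.name for p in reports.glob("*.unaligned.info"))
        else:
            names = []
        found[sample_dir.name] = names
    return found


if __name__ == "__main__":
    import sys

    # Usage: python check_unaligned.py quast_result/data
    for sample, files in unaligned_info_by_sample(sys.argv[1]).items():
        print(sample, "->", files if files else "no unaligned info file")
```

Samples that map to an empty list are the ones for which the QUAST alignment stage produced no output.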

sebetso388 commented 4 years ago

Good day again, I ran the following commands:

nohup hupan alignContig assembly_results/data/ aligned_result /SOFTWARE/MUMmer3.23/ ref/chr22.fa 2> align.err &

nohup hupan extractSeq assembly_results/data/ candidate aligned_result 2> candidate.err &

nohup hupan assemSta candidate/data/ quast_result /SOFTWARE/HUPAN/tools/quast-4.5/ ref/chr22.fa 2> quast.err &

~/ExampleData/quast_result/data/sample1

drwxrwx---. 2 sebetso bcm--all    0 Aug 13 11:35 basic_stats
drwxrwx---. 2 sebetso bcm--all    0 Aug 13 11:35 contigs_reports
-rwxrwx---. 1 sebetso bcm--all  53K Aug 13 11:35 icarus.html
drwxrwx---. 2 sebetso bcm--all   41 Aug 13 11:35 icarus_viewers
-rwxrwx---. 1 sebetso bcm--all 2.9K Aug 13 11:35 quast.log
-rwxrwx---. 1 sebetso bcm--all 371K Aug 13 11:35 report.html
-rwxrwx---. 1 sebetso bcm--all 1.3K Aug 13 11:35 report.tex
-rwxrwx---. 1 sebetso bcm--all  547 Aug 13 11:35 report.tsv
-rwxrwx---. 1 sebetso bcm--all 1.6K Aug 13 11:35 report.txt
-rwxrwx---. 1 sebetso bcm--all 1.1K Aug 13 11:35 transposed_report.tex
-rwxrwx---. 1 sebetso bcm--all  547 Aug 13 11:35 transposed_report.tsv
-rwxrwx---. 1 sebetso bcm--all 1.1K Aug 13 11:35 transposed_report.txt

~/ExampleData/quast_result/data/sample1/quast.log

/SOFTWARE/HUPAN/tools/quast-4.5/quast.py --eukaryote -t 1 --min-contig 500 -o quast_result/data/sample1 --no-plots -R ref/chr22.fa candidate/data/sample1/sample1.candidate.unaligned.contig

Version: 4.5, 15ca3b9

System information:
  OS: Linux-3.10.0-1062.9.1.el7.x86_64-x86_64-with-redhat-7.8-Maipo (linux_64)
  Python version: 2.7.5
  CPUs number: 12

Started: 2020-08-13 11:35:26

Logging to /cnrcseq_home/sebetso/ExampleData/quast_result/data/sample1/quast.log
NOTICE: Output directory already exists. Existing Nucmer alignments can be used

CWD: /cnrcseq_home/sebetso/ExampleData
Main parameters:
  Threads: 1, eukaryotic: true, minimum contig length: 500, ambiguity: one, threshold for extensive misassembly size: 1000

Reference:
  ref/chr22.fa ==> chr22

Contigs:
  Pre-processing...
  candidate/data/sample1/sample1.candidate.unaligned.contig ==> sample1.candidate.unaligned

2020-08-13 11:35:33
Running Basic statistics processor...
  Reference genome:
    chr22.fa, length = 50818468, num fragments = 1, GC % = 47.00
  Contig files:
    sample1.candidate.unaligned
  Calculating N50 and L50...
    sample1.candidate.unaligned, N50 = 773, L50 = 55, Total length = 115250, GC % = 36.91, # N's per 100 kbp = 0.00
Done.

2020-08-13 11:35:36 quast.log

zhqduan commented 4 years ago

Your commands look fine. The issue may be due to an incomplete installation of quast-4.5. quast-4.5 uses the nucmer tool from the MUMmer software to align sequences to the reference genome. Please check the directory 'quast-4.5/quast_libs/MUMmer' and make sure that the executables nucmer, e-mem and so on exist and can be run. Otherwise, please install QUAST according to the QUAST manual.

Zhongqu
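The check suggested above can be scripted. The sketch below is an illustration only: `non_runnable_tools` is a hypothetical name, and the default tool list contains just the two executables named in the reply (nucmer and e-mem); the bundled directory may hold others. It confirms that each file exists and carries execute permission, which is necessary but not sufficient for the tool to actually run.

```python
import os
from pathlib import Path


def non_runnable_tools(mummer_dir, tools=("nucmer", "e-mem")):
    """Return (tool, problem) pairs for tools that are missing or not executable.

    Hypothetical helper (not part of QUAST). The default tool names are the
    two mentioned in this thread; pass a longer tuple to check more.
    """
    problems = []
    for name in tools:
        path = Path(mummer_dir) / name
        if not path.is_file():
            problems.append((name, "missing"))
        elif not os.access(path, os.X_OK):
            problems.append((name, "not executable"))
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_mummer.py /path/to/quast-4.5/quast_libs/MUMmer
    for tool, problem in non_runnable_tools(sys.argv[1]):
        print(tool, "is", problem)
```

If the list comes back empty and the contigs_reports directories are still empty, running the reported tools by hand from that directory is the next step, since a file can have execute permission and still fail to start.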