NBISweden / Earth-Biogenome-Project-pilot

Assembly and Annotation workflows for analysing data in the Earth Biogenome Project pilot project.
https://www.earthbiogenome.org/
GNU General Public License v3.0

Assembly summary does not contain any assembly stats (N50, length, base composition, etc) #89

Closed MartinPippel closed 3 weeks ago

MartinPippel commented 4 months ago

Describe the bug

The assembly summary does not contain any assembly stats (N50, length, base composition, etc.). Every value is reported as 0 or nan:

+++Assembly summary+++:
# scaffolds: 0
Total scaffold length: 0
Average scaffold length: nan
Scaffold N50: 0
Scaffold auN: 0.00
Scaffold L50: 0
Largest scaffold: 0
Smallest scaffold: 0
# contigs: 0
Total contig length: 0
Average contig length: nan
Contig N50: 0
Contig auN: 0.00
Contig L50: 0
Largest contig: 0
Smallest contig: 0
# gaps in scaffolds: 0

To Reproduce

Run the default assembly pipeline.

Expected behavior

The summary should look like this:

+++Assembly summary+++:
# scaffolds: 4198
Total scaffold length: 362905281
Average scaffold length: 86447.18
Scaffold N50: 159010
Scaffold auN: 214780.94
Scaffold L50: 651
Largest scaffold: 1887161
Smallest scaffold: 1143
# contigs: 4198
Total contig length: 362905281
Average contig length: 86447.18
Contig N50: 159010
Contig auN: 214780.94
Contig L50: 651
Largest contig: 1887161
Smallest contig: 1143

Why does this happen?

gfastats was given a GFA file as input. I don't know why, but with GFA input it produces an incomplete summary file in which every statistic is 0 or nan.

Solution

Feed the FASTA file (or the compressed FASTA file) into the GFASTATS module instead of the GFA file. This solves the issue.
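
If only a GFA file is at hand, the sequences can be pulled out of its segment (S) lines and written as FASTA before calling gfastats. The following is a minimal sketch of that idea, not what the pipeline or the linked PR does; the function name `gfa_segments_to_fasta` is hypothetical, and it assumes GFA1-style lines of the form `S<TAB>name<TAB>sequence`.

```python
def gfa_segments_to_fasta(gfa_text):
    """Convert the S (segment) lines of a GFA string into FASTA text.

    Assumption: tab-separated GFA1 segment lines, 'S <name> <sequence> [tags]'.
    Segments whose sequence is '*' (sequence not stored) are skipped.
    """
    records = []
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 3 and fields[0] == "S" and fields[2] != "*":
            records.append(f">{fields[1]}\n{fields[2]}")
    # End with a newline only if at least one record was written.
    return "\n".join(records) + ("\n" if records else "")


# Toy input: one header line, two segments, one link.
example_gfa = (
    "H\tVN:Z:1.0\n"
    "S\tutg1\tACGT\tLN:i:4\n"
    "L\tutg1\t+\tutg2\t-\t0M\n"
    "S\tutg2\tGG\n"
)
print(gfa_segments_to_fasta(example_gfa), end="")
```

The resulting FASTA text can then be written to a file and passed to the GFASTATS module in place of the GFA file.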

Suggestion

It would be nice to include the --nstar-report flag by default in the gfastats command call. This reports all NX and LX values for the assembly, not only N50 and L50.
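
For reference, here is a rough sketch of what the NX, LX and auN values mean, computed from a plain list of sequence lengths. It is only an illustration of the standard definitions, not gfastats code; the function names `nx_lx` and `aun` and the toy lengths are made up for the example.

```python
def nx_lx(lengths, x=50):
    """Return (NX, LX) for a list of sequence lengths and a percentage x.

    NX is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least x% of the total
    length; LX is the number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length, count
    return 0, 0  # empty input, matching the all-zero summary above


def aun(lengths):
    """Area under the Nx curve: sum of squared lengths over total length."""
    total = sum(lengths)
    return sum(n * n for n in lengths) / total if total else 0.0


toy_lengths = [100, 80, 60, 40, 20]  # total length 300
print(nx_lx(toy_lengths, 50))        # (80, 2)
print(nx_lx(toy_lengths, 90))        # (40, 4)
print(round(aun(toy_lengths), 2))    # 73.33
```

With --nstar-report, gfastats would report such values for a whole range of X rather than only for X = 50.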

mahesh-panchal commented 3 weeks ago

closing as it should have been addressed in the linked PR