ParBLiSS / FastANI

Fast Whole-Genome Similarity (ANI) Estimation
Apache License 2.0
368 stars 66 forks

n50 calculation using bbtools or other tool? #79

Open rotoscan opened 3 years ago

rotoscan commented 3 years ago

Hello,

I am using bbtools stats.sh to calculate the N50 for MAGs. However, the values I get are inverted compared to the values reported when I deposit the same MAGs at NCBI.

For example, for this bacterium: https://www.ncbi.nlm.nih.gov/assembly/GCF_002368295.1/

The values present in their global statistics table are:

Scaffold N50: 8,700,819
Scaffold L50: 1
Contig N50: 744,139
Contig L50: 4

When I download the same sequence and run bbtools stats.sh, I get:

$ /data/msb/tools/bbtools/bbmap/stats.sh in=GCF_002368295.1_ASM236829v1_genomic.fna
A       C       G       T       N       IUPAC   Other   GC      GC_stdev
0.2965  0.2034  0.2038  0.2964  0.0001  0.0000  0.0000  0.4072  0.0106

Main genome scaffold total:             6
Main genome contig total:               25
Main genome scaffold sequence total:    9.352 MB
Main genome contig sequence total:      9.351 MB    0.009% gap
Main genome scaffold N/L50:             1/8.701 MB
Main genome contig N/L50:               4/744.138 KB
Main genome scaffold N/L90:             1/8.701 MB
Main genome contig N/L90:               13/209.062 KB
Max scaffold length:                    8.701 MB
Max contig length:                      2.259 MB
Number of scaffolds > 50 KB:            4
% main genome in scaffolds > 50 KB:     99.08%

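This indicates that the labels are inverted, that is, L50 and N50 are swapped: stats.sh reports the sequence count (1 and 4) as N50 and the length (8.701 MB and 744.138 KB) as L50, whereas NCBI reports the length as N50 and the count as L50.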

Do you have a different, trustworthy tool to recommend for the calculation of N50?

Thanks a lot for your attention.

Best, Rodolfo

cjain7 commented 3 years ago

This may help: https://github.com/lmrodriguezr/enveomics/blob/master/Scripts/FastA.N50.pl
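For a quick cross-check that does not depend on either tool, the statistic can also be computed by hand. The following is a minimal sketch, not the method used by stats.sh, NCBI, or the enveomics script; it assumes the NCBI labelling from the table above (N50 is a length, L50 is a count), and the function names `read_fasta_lengths` and `nx_lx` are hypothetical. It works on whatever sequences are in the FASTA file, so for a scaffold-level file it gives scaffold values only; contig values would additionally require splitting scaffolds at runs of N, which is not shown here.

```python
from typing import Iterable, List, Tuple


def read_fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence found in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def nx_lx(lengths: List[int], fraction: float = 0.5) -> Tuple[int, int]:
    """Return (Nx, Lx) with Nx as a length and Lx as a count (NCBI labelling).

    Sequences are sorted longest first and accumulated until the running sum
    reaches `fraction` of the total length. Nx is the length of the sequence
    that crosses the threshold; Lx is how many sequences were needed.
    """
    if not lengths:
        raise ValueError("no sequences given")
    threshold = sum(lengths) * fraction
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= threshold:
            return length, count
    raise AssertionError("unreachable for 0 < fraction <= 1")


if __name__ == "__main__":
    # Toy lengths: total is 54, half is 27, and 10 + 9 + 8 reaches 27.
    print(nx_lx([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # (8, 3)
```

To run it on a real file, pass an open file handle, for example `nx_lx(read_fasta_lengths(open("genome.fna")))`, where `genome.fna` is a placeholder path. Swapping the two returned values gives the labelling that stats.sh prints in its N/L50 line above.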