Hello,
I am using the BBTools stats.sh script to calculate the N50 for MAGs. However, the values it reports are inverted compared with the values NCBI shows when I deposit the same MAGs there.
For example, for this bacterium: https://www.ncbi.nlm.nih.gov/assembly/GCF_002368295.1/
The values in its global statistics table are:
Scaffold N50: 8,700,819
Scaffold L50: 1
Contig N50: 744,139
Contig L50: 4
When I download the same sequence and run bbtools stats.sh, I get:
$ /data/msb/tools/bbtools/bbmap/stats.sh in=GCF_002368295.1_ASM236829v1_genomic.fna
A C G T N IUPAC Other GC GC_stdev
0.2965 0.2034 0.2038 0.2964 0.0001 0.0000 0.0000 0.4072 0.0106
Main genome scaffold total: 6
Main genome contig total: 25
Main genome scaffold sequence total: 9.352 MB
Main genome contig sequence total: 9.351 MB 0.009% gap
Main genome scaffold N/L50: 1/8.701 MB
Main genome contig N/L50: 4/744.138 KB
Main genome scaffold N/L90: 1/8.701 MB
Main genome contig N/L90: 13/209.062 KB
Max scaffold length: 8.701 MB
Max contig length: 2.259 MB
Number of scaffolds > 50 KB: 4
% main genome in scaffolds > 50 KB: 99.08%
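This indicates that the values are inverted, that is, L50 and N50 are swapped relative to NCBI: stats.sh reports the counts (1 and 4) under N50 and the lengths (8.701 MB and 744.138 KB) under L50, while NCBI labels the lengths as N50 and the counts as L50.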
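To make the comparison concrete, here is a minimal Python sketch of the definition I am assuming, where N50 is a length and L50 is a count, as on the NCBI page. The function names nx_lx and fasta_lengths are my own and not part of BBTools or any NCBI tool. The sketch works on whole FASTA records, so it gives scaffold-level values only. Contig-level values would also need each record split at its runs of N, which is not done here.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of sequence lengths.

    Assumed convention (the one on the NCBI page): Nx is the length of the
    shortest sequence in the smallest set of longest sequences that together
    cover at least `fraction` of the total bases; Lx is the size of that set.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count


def fasta_lengths(path):
    """Yield the length of each record in an uncompressed FASTA file."""
    length = None  # None until the first header line is seen
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                if length is not None:
                    yield length
                length = 0
            elif length is not None:
                length += len(line.strip())
    if length is not None:
        yield length


# Toy example: total = 290, half = 145; 80 + 70 = 150 reaches it.
print(nx_lx([80, 70, 50, 40, 30, 20]))  # (70, 2): N50 = 70, L50 = 2
```

For a real assembly this would be called as nx_lx(list(fasta_lengths("assembly.fna"))), where the file name is only a placeholder.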
Can you recommend a different, trustworthy tool for calculating N50?
Thanks a lot for your attention.
Best,
Rodolfo