ampinzonv / BB2

BioBash UN official repository

More statistics for bb_get_fasta_info #10

Open ampinzonv opened 11 months ago

ampinzonv commented 11 months ago

CONTEXT bb_get_fasta_info is used to obtain information from a FASTA file (single- or multi-sequence).

ISSUE However, it does not output critical information about a FASTA file, such as G/C content or sequence length.

SOLUTION bb_get_fasta_info uses Seq::get_fasta_components, which extracts the different parts of a FASTA file (and which in turn uses seqkit's "fx2tab" function) but does not compute any statistics over the file. So bb_get_fasta_info is somewhat misleading: it shows the different parts of a FASTA file but does not perform any counting or statistical operation on it. At the very least, bb_get_fasta_info should also display: 1) G/C content 2) Sequence length

For this we should create a new function, something like:

Seq::get_fasta_statistics

And then ALSO use this one in bb_get_fasta_info.
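A minimal sketch of what such a function could compute, assuming plain awk is acceptable instead of seqkit. The name get_fasta_statistics, the tab-separated output layout, and the two-decimal GC percentage are all assumptions for illustration; in BB2 the function would presumably be wrapped as Seq::get_fasta_statistics, and none of this is existing BioBash code.

```shell
# Hypothetical sketch (not existing BioBash code): per-record length and GC%.
# Reads FASTA from the given files, or from stdin when no file is given.
# Output: one line per record -> id <TAB> length <TAB> GC percentage.
get_fasta_statistics() {
    awk '
        # Print the statistics of the record collected so far, if any.
        function report() {
            if (have) {
                gc = (len > 0) ? 100 * gc_count / len : 0
                printf "%s\t%d\t%.2f\n", id, len, gc
            }
        }
        # Header line: flush the previous record and start a new one.
        /^>/ {
            report()
            id = substr($1, 2)      # first word of the header, without ">"
            len = 0; gc_count = 0; have = 1
            next
        }
        # Sequence line: strip blanks and carriage returns, then count.
        {
            gsub(/[ \t\r]/, "", $0)
            len += length($0)
            gc_count += gsub(/[GCgc]/, "", $0)   # gsub returns the number of matches
        }
        END { report() }
    ' "$@"
}
```

Usage would be something like `get_fasta_statistics my_file.fasta`, where my_file.fasta is a placeholder. Since Seq::get_fasta_components already relies on seqkit "fx2tab", it may be simpler to reuse that command, which (as far as I remember from the seqkit documentation) has options to print length and GC content; that should be checked against the installed seqkit version.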

Another quick-and-dirty solution is to use only the "-s" option, pipe the output to "wc -m", and take the result as the sequence length (keeping in mind that "wc -m" also counts newline characters).
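A rough sketch of that quick-and-dirty idea, with one substitution stated plainly: instead of the "-s" option, it drops header lines using standard tools, and it strips whitespace before counting so that newlines do not inflate the result. The function name count_fasta_residues is hypothetical, and the count is the total over all records in the input, not a per-sequence length.

```shell
# Hypothetical helper (not existing BioBash code): total residue count.
# Reads FASTA from the given files, or from stdin when no file is given.
count_fasta_residues() {
    cat "$@" |
        grep -v '^>' |          # drop header lines
        tr -d ' \t\r\n' |       # remove whitespace, including newlines
        wc -m |                 # count the remaining characters
        tr -d ' '               # some wc implementations pad with spaces
}
```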