BioInf-Wuerzburg / SeqFilter

Versatile FASTA/FASTQ sequence file analysis and modification tool
MIT License

Parameter `--ids -` for an empty set returns all sequences #11

Closed greatfireball closed 5 years ago

greatfireball commented 5 years ago

Description

I want to use the `--ids` option inside a pipeline. If the upstream steps produce no IDs, SeqFilter returns the complete sequence set instead of an empty file.

Version

2.1.8

Expected behavior

An empty sequence file is returned.

Example

cat >seq.fa <<EOF
>A
ACG
>B
HIJ
>C
KLM
EOF

# If IDs are provided via pipeline, output is as expected
grep "^>[AB]" seq.fa | SeqFilter/bin/SeqFilter --ids - seq.fa 
# [17:24:14] SeqFilter/bin/SeqFilter-2.1.8
# [17:24:14] Detected FASTA format
# [17:24:14] --ids: STDIN
# [17:24:14] --in: seq.fa
# #source   state   reads   bases   max min N50 N90
# seq.fa    RAW 3   9   3   3   3   3
# seq.fa    FIL 2   6   3   3   3   3

# If no IDs are provided via pipeline, output contains complete sequence set
grep "^>[D]" seq.fa | SeqFilter/bin/SeqFilter --ids - seq.fa 
# [17:26:09] SeqFilter/bin/SeqFilter-2.1.8
# [17:26:09] Detected FASTA format
# [17:26:09] --ids: STDIN
# [17:26:09] --in: seq.fa
# #source   state   reads   bases   max min N50 N90
# seq.fa    RAW 3   9   3   3   3   3
# seq.fa    FIL 3   9   3   3   3   3
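
One way to sidestep this in a pipeline is to check the ID list before SeqFilter is called at all. The following is only a minimal sketch and not part of SeqFilter: the function name `filter_by_ids` and the `SEQFILTER` variable are hypothetical, and the only SeqFilter invocation used is the `--ids -` call from the example above. It assumes that skipping the call is an acceptable stand-in for "empty result"; creating an empty output file, if a later step needs one, is left to the caller.

```shell
# Path to the SeqFilter executable; the default is the path from the example (assumption).
SEQFILTER="${SEQFILTER:-SeqFilter/bin/SeqFilter}"

# Hypothetical wrapper: read IDs from STDIN and call SeqFilter only if there are any.
# Usage: <id-producing command> | filter_by_ids seq.fa
filter_by_ids() (
    seqs="$1"

    # Buffer STDIN in a temporary file so its size can be checked first.
    ids="$(mktemp)" || exit 1
    cat >"$ids"

    if [ -s "$ids" ]; then
        # Non-empty ID list: pass it on exactly as in the example above.
        "$SEQFILTER" --ids - "$seqs" <"$ids"
        status=$?
    else
        # Empty ID list: skip SeqFilter so that no sequences are emitted.
        echo "filter_by_ids: empty ID list, skipping SeqFilter" >&2
        status=0
    fi

    rm -f "$ids"
    exit "$status"
)
```

With this in place, `grep "^>[D]" seq.fa | filter_by_ids seq.fa` prints only the notice on STDERR, while `grep "^>[AB]" seq.fa | filter_by_ids seq.fa` behaves like the first call in the example. Note that `[ -s ... ]` only tests for a non-zero file size, so an ID list consisting solely of blank lines would still be passed through.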
thackl commented 5 years ago

What would be the best behaviour in case of empty --ids: