fhcrc / seqmagick

An imagemagick-like frontend to Biopython SeqIO
http://seqmagick.readthedocs.org
GNU General Public License v3.0

Mixed error messages and real output on stdout #74

Closed: fungs closed this issue 6 years ago

fungs commented 6 years ago

Hi, lovely program! I'm using it to convert entire nucleotide alignments with gaps to protein space; very few programs around handle gap symbols correctly here. One issue I'm having with v0.7.0 is that it writes error messages to stdout instead of stderr, for instance the notices about unknown codons that include gap symbols. Because those messages never reach stderr, they end up in the output file even with 2>/dev/null, as in the command below. This should be easy to fix; for now I have to filter the output through grep to clean it up.

seqmagick convert --upper --translate dna2protein - - < aln.fna > aln.faa 2>/dev/null

Example input in aln.fna:

>1
CTTTTTGCACGGCATGAAGAGCTCAAAGATGTCACAGATATCATACGAAGGTC
>2
CTTT---------------------------tcacagatatcatacgaaggtc
>3
CTTTTTGCACGGCATGAAGAGCTCAAAGATGTCACAGATATCATACGAAGGTC

Example output in aln.faa:

>1
LFARHEELKDVTDIIRR
Unknown Codon: T--
Unknown Codon: -TC
>2
LX--------XTDIIRR
>3
LFARHEELKDVTDIIRR

Thanks!

Best, Johannes

peterjc commented 6 years ago

It looks like it comes from this line:

https://github.com/fhcrc/seqmagick/blob/0.7.0/seqmagick/transform.py#L665

logging.warning("Unknown Codon: %s", codon)

Which in turn is probably configured here:

https://github.com/fhcrc/seqmagick/blob/0.7.0/seqmagick/scripts/cli.py#L27

    # set up logging
    logging.basicConfig(stream=sys.stdout, format=logformat, level=loglevel)

That should surely default to sys.stderr rather than sys.stdout, precisely because of the problem reported here: log messages pollute standard output, which should carry data only.
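A minimal sketch of the suggested change, assuming a cli.py-style entry point. setup_logging, LOGFORMAT and main are hypothetical names, and this is not the actual diff that closed the issue. Note that logging.basicConfig already defaults to sys.stderr when no stream or filename is given, so passing stream=sys.stderr explicitly mainly documents the intent.

```python
import logging
import sys

# Hypothetical stand-in for the format string cli.py builds.
LOGFORMAT = "%(message)s"


def setup_logging(loglevel=logging.WARNING):
    """Send log records to stderr so stdout carries only sequence data."""
    # force=True (Python 3.8+) replaces any handlers already on the root logger.
    logging.basicConfig(stream=sys.stderr, format=LOGFORMAT, level=loglevel, force=True)


def main():
    setup_logging()
    logging.warning("Unknown Codon: %s", "T--")  # diagnostic -> stderr
    sys.stdout.write(">2\nLX--------XTDIIRR\n")   # data -> stdout


if __name__ == "__main__":
    main()
```

Run with 2>/dev/null, only the FASTA record remains, which is the behaviour the original command line expected.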

metasoarous commented 6 years ago

Closed by #75. Thank you both, @fungs for reporting and @peterjc for submitting the fix!