EddyRivasLab / hmmer

HMMER: biological sequence analysis using profile HMMs
http://hmmer.org

hmmscan should just output tsv #314

Closed nhartwic closed 10 months ago

nhartwic commented 10 months ago

hmmscan output is needlessly annoying to parse. It is whitespace-delimited, but some whitespace, particularly within the "description of target" field, does not act as a delimiter. While proper escaping or quoting could resolve this, a practical solution would be to simply switch to tab as the separator.

For backwards compatibility, TSV output may need to be enabled via a flag.

cryptogenomicon commented 10 months ago

Noted.

nhartwic commented 10 months ago

To be clear, I'm referring to the output that is already tabular, available from "--tblout" or "--domtblout". Is there any reason why these tables use inconsistent field separators?

cryptogenomicon commented 10 months ago

Yes, there is. The reasons have been discussed at length already elsewhere. So has the ease of parsing these files with pandas, python, perl, awk, or whatever. I won't have anything else to say about it here, sorry; this is not the place for the discussion.
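For readers landing on this thread, here is a minimal sketch of the kind of parsing referred to above. It is not taken from HMMER or from the maintainers. It assumes the layout described in the HMMER user guide: comment lines start with `#`, a `--tblout` row has 18 whitespace-separated fields followed by a free-text description, and a `--domtblout` row has 22. The function names `parse_rows` and `to_tsv` are hypothetical, introduced only for illustration.

```python
import csv
import io

# Assumption: number of whitespace-separated fields that precede the
# free-text "description of target" column in each table type.
TBLOUT_FIXED = 18
DOMTBLOUT_FIXED = 22


def parse_rows(lines, n_fixed=TBLOUT_FIXED):
    """Yield one list of n_fixed + 1 fields per data row.

    Only the first n_fixed runs of whitespace are treated as delimiters,
    so spaces inside the trailing description are preserved.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/footer comments
        fields = line.split(None, n_fixed)
        if len(fields) < n_fixed:
            raise ValueError(f"expected at least {n_fixed} fields: {line!r}")
        if len(fields) == n_fixed:
            fields.append("")  # row without a description
        yield fields


def to_tsv(lines, n_fixed=TBLOUT_FIXED):
    """Return the data rows re-serialized as tab-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(parse_rows(lines, n_fixed))
    return out.getvalue()
```

For a `--domtblout` file, pass `n_fixed=DOMTBLOUT_FIXED`. The approach relies on the description being the last column, which is why a bounded split is enough and no quoting is needed on input.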

nhartwic commented 10 months ago

Fair enough. See you around.

RamRS commented 6 months ago

If you're space-delimiting output and claiming it's parser-friendly while neither quoting multi-word column names nor replacing their internal spaces with hyphens or underscores, I find it hard to believe that your stance ("I still stubbornly feel that space-delimited files are just as easily parsed as tab-delimited files.") is well thought out.

cryptogenomicon commented 6 months ago

Noted.