fungs closed this 5 years ago.
@pbelmann, I'll pass this to you for review since I am unfamiliar with the original binning and profiling formats.
A minor observation on the profiling multi-sample format: the requirement for an empty line between samples does not help parsing and could be dropped, because an empty line can occur anywhere in the file (according to the Profiling Output Format specs).
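Because blank lines carry no structural meaning, a parser can locate sample boundaries from the header lines instead. The following is a minimal Python sketch, assuming (for illustration, not quoting the spec) that each sample block opens with a `@SampleID:` header and that lines starting with `#` are comments. The function name `split_samples` is hypothetical.

```python
from typing import Dict, List, Optional


def split_samples(lines: List[str]) -> Dict[str, List[str]]:
    """Group the lines of a concatenated multi-sample profile by sample ID.

    Assumption: every sample block starts with a '@SampleID:' header line.
    Blank lines and '#' comments are skipped, so the result does not
    depend on empty-line separators between samples.
    """
    samples: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # empty lines may appear anywhere, so ignore them
        if line.startswith("@SampleID:"):
            current = line.split(":", 1)[1].strip()
            if current in samples:
                raise ValueError(f"duplicate sample ID: {current}")
            samples[current] = []
            continue
        if current is None:
            raise ValueError("content found before the first @SampleID header")
        samples[current].append(line)
    return samples
```

With this approach, an extra empty line inside a sample block or a missing one between blocks produces the same grouping; the separator remains purely a readability aid.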
@fernandomeyer: this is true, but it does not hurt either. The idea is to make the format as readable as possible for the human eye, and an empty line between what would otherwise be separate files is a good separator that makes things look much cleaner. In general, we start with strict definitions and loosen them, if necessary, in later minor versions. This direction is preferable because parsers that comply with the strict definition will continue to work with the updated minor versions (but not the other way around).
Please merge this PR.
As requested by @alicemchardy and some CAMI users, I have extended the specs to allow simple file concatenation. This is a backwards-incompatible change, so both specs get new major versions. The specs impose restrictions on the concatenated content (same taxonomy version, same column layout, and unambiguous sample IDs), which could be loosened in upcoming minor version bumps if reasonable. These restrictions make parsing and interpretation easier.
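To illustrate how the three restrictions simplify checking a concatenated file, here is a minimal Python sketch. It assumes (hypothetically, for illustration) that header lines look like `@Key:value`, that each sample block opens with `@SampleID:`, that the taxonomy version is given in a `@TaxonomyID:` header, and that the column layout is the line starting with `@@`. The function names `collect_headers` and `check_concatenation` are made up for this example.

```python
from typing import Dict, List, Tuple


def collect_headers(lines: List[str]) -> List[Tuple[str, Dict[str, str]]]:
    """Return (sample_id, headers) for each sample block (assumed header syntax)."""
    blocks: List[Tuple[str, Dict[str, str]]] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("@@"):
            # Column-layout line: store it under the key "@@".
            if blocks:
                blocks[-1][1]["@@"] = line[2:]
        elif line.startswith("@") and ":" in line:
            key, value = line[1:].split(":", 1)
            if key == "SampleID":
                blocks.append((value.strip(), {}))
            elif blocks:
                blocks[-1][1][key] = value.strip()
    return blocks


def check_concatenation(lines: List[str]) -> List[str]:
    """List violations of the concatenation restrictions (empty list = OK)."""
    blocks = collect_headers(lines)
    problems: List[str] = []

    # Restriction: sample IDs must be unambiguous across the whole file.
    ids = [sid for sid, _ in blocks]
    dupes = sorted({sid for sid in ids if ids.count(sid) > 1})
    if dupes:
        problems.append(f"ambiguous sample IDs: {dupes}")

    # Restrictions: one taxonomy version and one column layout for all samples.
    for key, label in (("TaxonomyID", "taxonomy version"), ("@@", "column layout")):
        values = {headers.get(key) for _, headers in blocks}
        if len(values) > 1:
            problems.append(f"inconsistent {label}: {sorted(map(str, values))}")
    return problems
```

Because every block must agree on taxonomy and columns, a consumer can validate the file once up front and then process all samples with a single interpretation, which is the simplification the restrictions are meant to buy.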