Closed schristley closed 7 years ago
Maybe, to make it clearer, we should use a different separator instead of "_", e.g. "[namespace]:column_name". That way parsers can easily split the two on ":".
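A minimal sketch of how a parser might split on ":", assuming the "[namespace]:column_name" form suggested above. The function name `split_column` and the column names used here are hypothetical, chosen only for illustration.

```python
def split_column(name, sep=":"):
    """Split 'namespace:column_name' into (namespace, column_name).

    Columns without the separator are treated as having no namespace
    and are returned as (None, name).
    """
    namespace, found, column = name.partition(sep)
    if not found:
        return None, name
    return namespace, column


# Hypothetical column names, for illustration only.
print(split_column("mytool:alignment_score"))  # namespaced custom column
print(split_column("junction_length"))         # plain column, no namespace
```

With "_" as the separator, the same split would break `junction_length` into `junction` and `length`, which is the ambiguity a distinct separator avoids.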
I'm not sure I understand the use case of running the same tool on the same data multiple times and outputting to the same file. If the point is to test different versions of a tool, that could potentially require unique namespaces even for core/required fields (tweak the algorithm, change the primary allele assignment). So maybe it's better to assume that multiple runs will always go into different files, with a time and version stamp in the metadata?
It doesn't have to be different versions; it could be different executions of the same tool with different parameters, maybe with the idea of comparing the results. I agree that they could be put into different output files.
I'm going to close this for now. I think we discussed a while ago allowing people to add columns totally freely, though best practice might be to prefix it. Feel free to reopen if you think we should endorse a particular way to add custom columns, and we'll add it to the agenda in 2 weeks.
A simple mechanism for tools to create additional columns can be specified. I'm thinking something like this.
Namespaces may conflict between tools. This can mostly be avoided informally, but we could consider adding a requirement that tools check the set of existing namespaces.
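One way such a check could look, as a rough sketch only. It assumes namespaces are a prefix joined to the column name by a ":" separator, which is just one of the options discussed in this thread. The helper names and column names are hypothetical.

```python
def used_namespaces(header, sep=":"):
    """Return the set of namespace prefixes found in a list of column names."""
    return {name.partition(sep)[0] for name in header if sep in name}


def check_namespace(header, namespace, sep=":"):
    """Raise ValueError if `namespace` is already used by a column in `header`."""
    if namespace in used_namespaces(header, sep):
        raise ValueError(f"namespace {namespace!r} is already in use")


# Hypothetical header, for illustration only.
header = ["sequence_id", "toola:score", "toolb:score"]
check_namespace(header, "toolc")      # passes silently
try:
    check_namespace(header, "toola")  # conflicts with an existing column
except ValueError as err:
    print(err)
```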
There is also the issue of running the same tool more than once on a file. In this case, it is up to the tool whether it replaces existing column values or creates new columns for each execution. If the latter, the namespace has to be unique across executions.
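If a tool chooses to create new columns per execution, one possible way to keep the namespace unique is to append a run counter. This is only a sketch: the ":" separator, the ".N" suffix convention, and the function name `unique_namespace` are all assumptions, not something agreed on here.

```python
def unique_namespace(header, base, sep=":"):
    """Return `base`, or `base.N` for the smallest N >= 2 not yet used in `header`."""
    taken = {name.partition(sep)[0] for name in header if sep in name}
    if base not in taken:
        return base
    n = 2
    while f"{base}.{n}" in taken:
        n += 1
    return f"{base}.{n}"


# Hypothetical header after one earlier run of "mytool".
header = ["sequence_id", "mytool:score"]
namespace = unique_namespace(header, "mytool")
header.append(f"{namespace}:score")  # second run gets its own columns
print(header)
```

A timestamp or parameter hash would work as the suffix too; a counter just keeps the column names short.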