spec for tools adding new columns

airr-community / airr-formats

PLEASE SEE airr-standards FOR FURTHER DEVELOPMENT: https://github.com/airr-community/airr-standards

MIT License

spec for tools adding new columns #15

Closed schristley closed 7 years ago

schristley commented 7 years ago

A simple mechanism for tools to create additional columns can be specified. I'm thinking something like this.

A tool specifies a namespace identifier in the metadata as a prefix for the columns it adds.
A tool (optionally) provides the list of columns added in the metadata.
The column names become "[namespace]_name".
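
The three points above can be sketched roughly as follows. This is only an illustration under assumptions: the `namespaces` metadata key, the function name `add_namespaced_columns`, and the example column names are hypothetical and are not defined anywhere in this thread or in the format.

```python
def add_namespaced_columns(metadata, rows, namespace, new_columns, sep="_"):
    """Return copies of (metadata, rows) with tool columns added under a prefix.

    metadata    -- dict of file-level metadata; the "namespaces" key is a
                   hypothetical layout, not part of any agreed spec
    rows        -- list of dicts, one per record
    new_columns -- dict mapping a short column name to a list of values,
                   one value per row
    """
    for name, values in new_columns.items():
        if len(values) != len(rows):
            raise ValueError(f"column {name!r} has {len(values)} values, "
                             f"expected {len(rows)}")

    # Point 1 and 2: record the namespace and (optionally) its column list.
    updated_meta = dict(metadata)
    namespaces = dict(updated_meta.get("namespaces", {}))
    namespaces[namespace] = sorted(new_columns)
    updated_meta["namespaces"] = namespaces

    # Point 3: column names become "[namespace]<sep>name".
    updated_rows = []
    for i, row in enumerate(rows):
        new_row = dict(row)
        for name, values in new_columns.items():
            new_row[f"{namespace}{sep}{name}"] = values[i]
        updated_rows.append(new_row)
    return updated_meta, updated_rows


# Example with made-up data: a tool called "mytool" adds a "score" column.
meta, rows = add_namespaced_columns(
    {}, [{"sequence_id": "s1"}], "mytool", {"score": [0.9]}
)
```

Listing the added columns in the metadata is optional in the proposal; the sketch always writes it only to keep the example short.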

Namespaces may conflict between tools. This can mostly be avoided informally, but we could consider adding a requirement that tools check the existing set of namespaces.

There is also the issue of running the same tool more than once on a file. In this case, it is up to the tool whether it replaces existing column values or creates new columns for each execution. If the latter, then the namespace has to be unique across executions.
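
One possible way to keep namespaces unique, both between tools and across repeated executions, is to check the set already present and pick an unused variant. This is a minimal sketch; the helper name `unique_namespace` and the numeric-suffix scheme are assumptions made for illustration, not something proposed in the thread.

```python
def unique_namespace(existing, base):
    """Return `base` if unused, otherwise `base` plus the first free number.

    existing -- collection of namespace identifiers already in the file
    base     -- the namespace the tool would like to use
    """
    if base not in existing:
        return base
    n = 2
    while f"{base}{n}" in existing:
        n += 1
    return f"{base}{n}"


# First run gets the plain name, later runs get a numbered variant.
first = unique_namespace(set(), "mytool")
second = unique_namespace({first}, "mytool")
```

A tool that prefers to overwrite its earlier output would skip this step and reuse its original namespace.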

schristley commented 7 years ago

Maybe, to make it clearer, we should use a different separator instead of "_", e.g. "[namespace]:column_name". That way parsers can easily split the two on ":".
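
To illustrate why a dedicated separator helps, here is a small sketch of the parsing side. The function name `split_column` is hypothetical, and `v_call` is used only as an example of an un-namespaced column that itself contains an underscore.

```python
def split_column(column, sep=":"):
    """Split a column name into (namespace, name) at the first separator.

    Returns (None, column) when the separator does not occur, i.e. the
    column is treated as not belonging to any tool namespace.
    """
    namespace, found, name = column.partition(sep)
    if not found:
        return None, column
    return namespace, name


# With ":" the split is unambiguous, even if names contain underscores.
namespaced = split_column("mytool:column_name")
plain = split_column("v_call")

# With "_" an ordinary column is misread as namespace "v", column "call".
ambiguous = split_column("v_call", sep="_")
```

The last call shows the ambiguity: with "_" as the separator, a parser cannot tell a namespaced column from a regular one without consulting the metadata.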

scharch commented 7 years ago

I'm not sure I understand the use case of running the same tool on the same data multiple times and outputting to the same file. If the point is to test different versions of a tool, that could potentially require unique namespaces even for core/required fields (tweak the algorithm, change the primary allele assignment). So maybe it's better to assume that multiple runs will always go into different files, with a time and version stamp in the metadata?

schristley commented 7 years ago

It doesn't have to be different versions; it could be different executions of the same tool with different parameters, maybe with the idea of comparing them. I agree that they could be put into different output files.

laserson commented 7 years ago

I'm going to close this for now. I think we discussed a while ago allowing people to add columns totally freely, though best practice might be to prefix them. Feel free to reopen if you think we should endorse a particular way to add custom columns, and we'll add it to the agenda in 2 weeks.