Open Llammissar opened 7 years ago
The specific example you listed can be handled pretty easily using awk. More generally though, none of the tsv-utilities tools provides a way to generate new values with mathematical expressions, or in the case of tsv-filter, the ability to filter rows based on a mathematical formula. The general reason I haven't introduced such facilities is that I haven't found a way to do this without essentially recreating significant parts of awk. awk is a great tool, there's no reason to rebuild it. I have a couple ideas on this front, but nothing I'm likely to get to soon.
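For reference, here is a minimal sketch of the awk approach, assuming a tab-separated file with a header line and numeric values in columns 2 and 3. The column names, the rows, and the output file name with_totals.tsv are made up for illustration; the real sample.tsv may look different.

```shell
# Hypothetical stand-in for sample.tsv (names and values are invented).
printf 'script\tactions\truns\n' > sample.tsv
printf 'build.sh\t4\t10\n' >> sample.tsv
printf 'deploy.sh\t7\t3\n' >> sample.tsv

# Read and write tab-separated fields.
# Header line: append the new column name.
# Data lines: append column 2 multiplied by column 3.
awk 'BEGIN { FS = OFS = "\t" }
     NR == 1 { print $0, "total_actions"; next }
     { print $0, $2 * $3 }' sample.tsv > with_totals.tsv

cat with_totals.tsv
```

Handling the header in its own rule (NR == 1) is what lets the new column get a proper name in the same pass.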
Embarrassingly, I've barely used awk for anything, and simply didn't think to reach for it. I spent a few minutes playing and the result was substantially better, so I can see your point. Didn't really help the column naming, but that's a smaller issue.
In the long term, I think there are things that would be nice to have in this general category, but agree that it's not an especially high priority.
One of the reasons I wrote these tools is that awk, powerful as it is, becomes error prone to write as soon as things get at all complicated. And it is possible to get a large portion of the power by providing a set of much simpler and less error prone primitives. That is an important part of the intent of these tools.
But, for general mathematical manipulation, it seems hard to create a language that is significantly simpler than awk, or at least that's my feeling for now.
Hey, been a while! Just hit another thing that'd be useful for my work, so I figured I'd put it here.
Consider the following file sample.tsv:
If we want to know the total number of actions for each script, the second column's value needs to be multiplied by the third on each line. Right now, I can get this far:
Which will give me:
Hm. Not quite. Better still, the rows match, but we can't join this with the current tools. We can sort of fake it with some awful circumlocutions involving number-lines:
gives me:
Well, it's better. The new column is named wrong, though. Dang.
More broadly, that's only multiplication. What I'd like to see is something like... this isn't really tsv-summarize, which is columnar reductions. Maybe it's an advanced selection?
tsv-select -H -f1,2,3 --multiply 2:3:total_actions
Or maybe it's a new tool; tsv-eval or something. Hm. :/