Open HerbCSO opened 8 years ago
I'd support an environment variable to set this.
One trick you may not be aware of (and it may not work for your use case), but if your csv files have a .tsv extension, then xsv should use a tab delimiter automatically.
Oh, that's cool, I was indeed not aware of that. Unfortunately a lot of files I work with don't have a .tsv extension for... reasons...
But an environment variable for that would be totally awesome!
Maybe environment variables based on awk variable names would be a good idea:
XSV_FS = input field separator (input field delimiter)
XSV_OFS = output field separator (output field delimiter)
An environment variable for specifying additional file extensions that should automatically be interpreted as TAB-separated would be useful too. A lot of bioinformatics-related file formats are TAB-separated.
XSV_TSV='bed|gtf|gff|tsv|vcf'
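As a rough illustration of the XSV_TSV idea, here is a minimal shell sketch that maps a file name to a delimiter. Nothing in it is part of xsv: delimiter_for is a hypothetical helper, XSV_TSV is only the variable name proposed above, and falling back to just "tsv" when the variable is unset is an assumption.

```shell
# Hypothetical helper, not part of xsv: choose a delimiter from a file's
# extension, using an XSV_TSV-style '|'-separated list of extensions.
delimiter_for() {
    local file="$1"
    local exts="${XSV_TSV:-tsv}"   # assumed fallback: only .tsv is TAB-separated
    local ext="${file##*.}"        # text after the last dot
    # A name without a dot leaves ext equal to the whole name: no extension.
    if [ "$ext" != "$file" ] && [ -n "$ext" ]; then
        case "|$exts|" in
            # Print a literal backslash-t, the form passed to -d in this thread.
            *"|$ext|"*) printf '%s\n' '\t'; return 0 ;;
        esac
    fi
    printf '%s\n' ','
}
```

With XSV_TSV='bed|gtf|gff|tsv|vcf' set, something like xsv count -d "$(delimiter_for genes.gtf)" genes.gtf would then pick the tab delimiter without typing it by hand.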
How about defaulting to automatic separator detection? Separator is either ',' or '\t', whichever comes first.
@ilabdsf Won't work because escaping/quoting permits either of those characters to be present before the first field separator.
@BurntSushi right, that is why I say "default to". If someone wants a robust script, they can write "-d,". But when just using xsv from the command line, I think it is acceptable. It is very unlikely to have tabs or commas in column names.
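To make the "whichever comes first" proposal concrete, here is a small shell sketch of that heuristic applied to the header line. It is only an illustration under stated assumptions: sniff_delimiter is a hypothetical name, xsv does not do this, and the sketch deliberately ignores quoting.

```shell
# Hypothetical sketch of "whichever comes first" detection; xsv does not do this.
sniff_delimiter() {
    local header tab before_comma before_tab
    tab=$(printf '\t')
    # Read only the first line; also accept a final line without a newline.
    IFS= read -r header < "$1" || [ -n "$header" ] || return 1
    before_comma=${header%%,*}       # header up to the first comma (or all of it)
    before_tab=${header%%"$tab"*}    # header up to the first tab (or all of it)
    if [ "${#before_tab}" -lt "${#before_comma}" ]; then
        printf '%s\n' '\t'           # a tab appears first
    else
        printf '%s\n' ','            # a comma appears first, or neither appears
    fi
}
```

Because quoting is ignored, a header whose first field is the quoted value "x,y" followed by a tab is guessed as comma-separated, which is the failure mode raised in the objection above.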
A bash function can be set up to always use a specific delimiter:
function xsvt() {
    local cmd=$1
    # Quote the expansions so arguments containing spaces survive intact.
    shift && command xsv "$cmd" -d '\t' "$@"
}
I vote for an environment variable that sets the default input/output delimiter for all of the pipes, maybe just something like XSV_DEFAULT_DELIMITER.
@iliekturtles The bash function should be a lot more complex than that.
Some things that don't work:
$ xsvt -h
Unknown flag: '-d'
Usage:
xsv <command> [<args>...]
xsv [options]
# No output (instead of help):
$ xsvt
# Overriding -d does not work:
$ printf '1\t2\n' | xsvt input -d '\t'
Invalid arguments.
Usage:
xsv input [options] [<input>]
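A wrapper that copes with the three cases listed above might look roughly like the following. This is a sketch, not a tested drop-in: it assumes every xsv subcommand you call accepts -d/--delimiter, and it only looks at the argument list textually, so a -d that is really the value of some other option would be misread as a delimiter flag.

```shell
# Sketch of a more defensive xsvt; the name and behaviour are illustrative only.
xsvt() {
    local cmd arg
    # No arguments, or a leading flag such as -h: nothing to inject, pass through.
    case "${1-}" in
        ''|-*) command xsv "$@"; return ;;
    esac
    cmd="$1"
    shift
    # If the caller already chose a delimiter, do not add a second one.
    for arg in "$@"; do
        case "$arg" in
            -d|-d?*|--delimiter|--delimiter=*)
                command xsv "$cmd" "$@"
                return
                ;;
        esac
    done
    command xsv "$cmd" -d '\t' "$@"
}
```

With this version, xsvt -h and a bare xsvt reach xsv unchanged, and xsvt input -d ';' file.csv keeps the caller's delimiter instead of failing with "Invalid arguments."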
Not sure if you take pull requests or not, but I created one for my proposal. I have not written much Rust code, but this seemed like an easy enough thing to dive into and learn some. I was thinking about documenting the variable somewhere, but was not really sure where. Do you want me to put it into all of the help strings?
@BurntSushi can you take a look at #94 ?
alias tsv='xsv -d "\t"'
The main issue for me is that the output delimiter cannot always be configured. If I get some TSV files (with embedded commas), and do something like xsv cat, then since there is no --out-delimiter flag, the files get corrupted.
So to make everything composable, it would be great if an env delimiter influences both input and output.
@nickray You can add xsv fmt -t '\t' to the end of your pipeline.
I didn't realize that xsv cat rows (and probably others) inserts double quotes; I would have expected both.csv to be broken in the following (my mental model of CSV is something like line.split(",")...):
printf "a\tb\nx,y\tz\n" > file1.tsv
printf "a\tb\nx\ty,z\n" > file2.tsv
echo ":: both.csv"
xsv cat rows -d '\t' file?.tsv > both.csv
cat both.csv
echo ":: both.tsv"
xsv fmt -t '\t' both.csv > both.tsv
cat both.tsv
Good to know it's not!
:: both.csv
a,b
"x,y",z
x,"y,z"
:: both.tsv
a b
x,y z
x y,z
Would still be helpful to avoid the quotation dance, and use tabs (or 0x1f) throughout by just doing something like export XSV_DELIMITER='\t'; xsv cat rows file?.tsv > both.tsv, particularly when chaining xsv calls (or, for instance, xsv partition would seem to need an xargs or parallel call to end up with TSV splits). Maybe my Rust skills will soon be good enough to contribute :seedling:
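Until something like that exists, the effect can be approximated in the shell. The sketch below is an assumption-laden stand-in: xsvd is a hypothetical name, XSV_DELIMITER is read by the function and not by xsv itself, and it only makes sense for subcommands that accept -d and write CSV to stdout (so not xsv partition, which writes files).

```shell
# Hypothetical wrapper: use one delimiter for both input and output.
# XSV_DELIMITER is interpreted here, by the shell function, not by xsv.
xsvd() {
    local delim="${XSV_DELIMITER:-,}"
    if [ "$#" -eq 0 ]; then
        echo "usage: xsvd <command> [<args>...]" >&2
        return 2
    fi
    local cmd="$1"
    shift
    # Read with the chosen delimiter, then let xsv fmt rewrite the CSV output
    # with the same delimiter, as suggested earlier in the thread.
    command xsv "$cmd" -d "$delim" "$@" | command xsv fmt -t "$delim"
}
```

For example, after export XSV_DELIMITER='\t', running xsvd cat rows file?.tsv > both.tsv would go through the same quote-then-reformat steps as the two-command version above, just in one call.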
Any updates?
OK, can I just say first of all that I'm IN LOVE with this toolkit!? It is an absolute joy and it fills a gaping void for me! ;]
Anyway, I work with a lot of TSV files (tab-separated) and it's kind of annoying to have to type -d "\t" for every single command I run (but thank you for providing the option!). I'd love to be able to change the default separator from ',' to '\t'.
Unfortunately it's not easy to alias since the -d must come AFTER the command. I suppose I could do something like xsvtab() { xsv $1 -d "\t" ${@:2}; } (and maybe even alias xsv=xsvtab if I'm feeling really lazy, but then I lose the ability to override the delimiter if I actually do have a different type of file, and I have to run it with \xsv to use the original command instead of the alias; specifying -d again with the above alias results in an Invalid arguments. error because now -d is specified twice). That all feels a little clunky; although it does kind of work for my use case, it would just be slicker to be able to override the default.