dkogan / vnlog

Process labelled tabular ASCII data using normal UNIX tools

Allow reading plain csv files #8

Closed: dkrikun closed this issue 2 months ago

dkrikun commented 3 months ago

Oftentimes I have existing CSV files with whitespace delimiters that are almost VNL, except that they lack a '#' character at the start of the first line. It would be nice if the vnlog tools accepted such files too (perhaps behind a flag, e.g. --csv).

dkogan commented 3 months ago

Hi. I don't have options like this because fully supporting csv takes a lot of code I don't want inside vnlog, and because adding this support in the shell is trivial. Look:

(echo -n '# '; cat whatever.csv) | vnl-filter ....

Does that solve your problem?

dkrikun commented 3 months ago

That is what I do, yeah. I just thought that if it is that simple to do on the outside, it can't be that hard to do from within vnlog itself and save the ceremony.

dkogan commented 3 months ago

OK. It would be nice to get some flavor of csv working, but I'd need to at the very least handle quoting. Do you have any personal experience doing that with perl? Text::CSV exists, but I've never used it.

Also, in your experience is there any semi-standard way to do field labels and/or comments? I'm guessing you don't have comments at all, and the field labels are the first line; yes? Is that just you, or have you seen that in other contexts too?

dkrikun commented 3 months ago

I personally have a ton of csv files with a header line for labels and only simple numerical data (no quoting), which I would say is what most "engineer-friendly" csv files look like. I don't have experience using Text::CSV in perl, but I guess I could have a look!
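For the "engineer-friendly" case described here (a header line of labels, purely numerical data, no quoting), a comma-delimited file could be turned into vnlog with standard tools alone. This is a minimal sketch, assuming no field is quoted, empty, or contains a comma or space; vnlog splits on whitespace, so an empty field would silently shift the columns. `csv_to_vnl` is a hypothetical name, not part of vnlog:

```shell
# Hypothetical helper (not part of vnlog): convert a simple comma-delimited
# CSV with a header line into vnlog. Assumes no quoting, no empty fields,
# and no spaces inside fields.
csv_to_vnl() {
    # Prefix the header line with '# ', then turn every comma into a space.
    { printf '# '; cat "${1:--}"; } | tr ',' ' '
}
```

It would be used like the one-liner earlier in the thread: csv_to_vnl data.csv | vnl-filter ...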

dkogan commented 3 months ago

I do see labelled csv sometimes, but the labels are meant for human consumption, so the ones I've seen often have spaces in the name and are usually really sloppy:

Still, being able to parse this would be good. I'm probably not going to work on this in the near future. If you want to, look at vnl-filter and lib/Vnlog/Util.pm (to make vnl-sort and such work), and see if you can make them work with Text::CSV in some way. Let me know if you need help.
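For sloppy human-readable headers with spaces in the labels, one possible stopgap short of a real CSV parser such as Text::CSV is to rewrite only the first line so that each label becomes a single whitespace-free token. This is a rough sketch with sed, assuming the labels contain no commas or quotes and the data rows are unquoted. `csv_to_vnl_labels` is a hypothetical name, and replacing spaces with underscores is just one arbitrary normalization:

```shell
# Hypothetical helper (not part of vnlog).
csv_to_vnl_labels() {
    # Line 1: trim leading/trailing spaces, drop spaces around commas,
    # turn the remaining spaces inside labels into '_', commas into
    # spaces, then prefix '# '.
    # Other lines: commas -> spaces.
    # ${1:+"$1"} passes the filename if one was given; otherwise sed reads stdin.
    sed -e '1{s/^ *//;s/ *$//;s/ *, */,/g;s/ /_/g;s/,/ /g;s/^/# /;}' \
        -e '1!s/,/ /g' ${1:+"$1"}
}
```

A header such as "Time (s), Temp C" would become the legend "# Time_(s) Temp_C".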

dkrikun commented 2 months ago

I've been busy lately, so instead of making a PR for this issue I followed your advice and ended up with a "shim" script, to_vnl:

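#!/bin/bash
# Prepend '# ' to the first (header) line so the input reads as vnlog.
echo -n '# '
cat "${1:--}"   # the file named in $1, or stdin if no argument is given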

This allows nice pipelines like to_vnl data.csv | vnl-filter ... or <data.csv to_vnl | vnl-filter ...
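One caveat with a shim like this: run on a file that already has a vnlog legend line, it would produce a "# # ..." header. A slightly more defensive variant of the same idea, sketched here as a bash function, assumes that any input whose first line already begins with '#' is vnlog-formatted and should pass through unchanged:

```shell
# Sketch of a variant of to_vnl (an assumption, not vnlog's API).
to_vnl() {
    local first
    {
        # Read the header line; keep it even if it lacks a trailing newline.
        IFS= read -r first || [ -n "$first" ] || return 0
        case $first in
            '#'*) printf '%s\n' "$first" ;;   # already starts with '#': pass through
            *)    printf '# %s\n' "$first" ;; # plain header: add the '# ' prefix
        esac
        cat   # copy the remaining lines unchanged
    } < "${1:-/dev/stdin}"
}
```

The first line is inspected with bash's read, and the rest of the stream is handed to cat untouched, so the data rows are never reparsed.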