gohugoio / hugo

The world’s fastest framework for building websites.
https://gohugo.io
Apache License 2.0

Add support for *.tsv and *.csv files inside /data #8853

Open JeremyRubin opened 2 years ago

JeremyRubin commented 2 years ago
$ hugo version
hugo v0.84.0+extended

I think this is a bit more of a "feature" than a bug, but the absence of this feature doesn't match (this) user's expectation that Hugo supports CSVs. CSVs are not currently supported in the /data folder, even though Hugo already has CSV support through getCSV. Currently only YAML, TOML, and JSON are supported, and it seems silly not to autoload a format we already have the code to parse.

To fix this, I think we should load CSV and TSV files from the data directory. Tabular data is particularly compelling for things like schedules or items where all attributes should be the same across entries.
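
For illustration, a rough sketch of how the two approaches compare in templates (data/schedule.csv and its contents are hypothetical):

{{/* Today: getCSV parses CSV from a local path or URL, with the separator as the first argument. */}}
{{ $rows := getCSV "," "other/schedule.csv" }}

{{/* Proposed: a file at data/schedule.csv would be loaded automatically, e.g. */}}
{{ range site.Data.schedule }}
  {{/* each entry available here, subject to how header rows are handled */}}
{{ end }}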

Does this issue reproduce with the latest release?

unsure

jmooring commented 2 years ago

An easy workaround for this limitation is a tool called Miller (mlr).

other/foo.csv

"field_1","field_2"
"foo","bar"
"wibble","wubble"
mlr --c2j --jlistwrap cat other/foo.csv > data/foo.json

data/foo.json

[
  {
    "field_1": "foo",
    "field_2": "bar"
  },
  {
    "field_1": "wibble",
    "field_2": "wubble"
  }
]

And you can do the same with TSV...

mlr --t2j --jlistwrap cat other/foo.tsv > data/foo.json
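
For completeness, once converted the file behaves like any other data file; a minimal sketch using the field names from the example above:

{{ range site.Data.foo }}
  {{ .field_1 }}: {{ .field_2 }}
{{ end }}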

JeremyRubin commented 2 years ago

@jmooring I'm OK with a workaround; it's easy enough to do with a number of tools. Thanks for the suggestion though :)

That said, it's convenient to keep files (e.g. for git) in the format that makes the most sense for editing them, and I think CSV should be natively supported in the data dir anyway.

divinerites commented 2 years ago

Another workaround, which keeps the CSV file format, can be found here: https://discourse.gohugo.io/t/error-message-failed-to-read-data-from-csv-even-it-is-working-fine/10763/7

jmooring commented 2 years ago

Use case:

{{ $author := where site.Data.authors "id" "foo" }}

With CSV as the data source, we either need to:

a) provide a mechanism to specify unmarshal options (delimiter, comment), OR
b) only support CSV files with comma delimiters and without comments.

In either case, we need to handle header rows. See #8859.
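
For context, transform.Unmarshal already accepts an options map (delimiter, comment) when unmarshaling CSV, so option (a) would presumably mirror that; a rough sketch reusing the example file from above:

{{ $csv := readFile "other/foo.csv" }}
{{ $rows := $csv | transform.Unmarshal (dict "delimiter" "," "comment" "#") }}
{{/* $rows is a slice of string slices; the first row contains the headers. */}}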

bep commented 2 years ago

I have put this issue back into the Proposal state, as I'm not sure it's worth it, considering the ambiguities.

I have a working branch (which I have since digressed from into another working branch) that more or less makes Hugo much more memory efficient, and it makes transform.Unmarshal the very best option for short-lived data. The problem today is that we cache entries in transform.Unmarshal and never throw them away, meaning you cannot effectively load a 100 MB JSON file for every page.

JeremyRubin commented 2 years ago

One "hotfix" option would be to simply ignore CSV/TSV files placed in the data directory. That is at least a bit better than throwing an error, and the files could then be accessed with getCSV; it would at least be idiomatic with respect to where to store the data.
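
If that route were taken, something like the following would presumably work, assuming getCSV resolves local paths relative to the project root (the data/schedule.csv path is hypothetical):

{{ $rows := getCSV "," "data/schedule.csv" }}
{{ range $rows }}
  {{/* each row is a slice of strings, including the header row */}}
{{ end }}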