Swirrl / csvw.org

Static site generator for csvw.org
https://csvw.org

Improve rationale for CSVW #2

Open RickMoynihan opened 2 years ago

RickMoynihan commented 2 years ago

One thing I think the site would benefit from is improving the rationale for CSVW a bit more.

The page here is a good start and provides a brief summary of some problems with CSV:

https://swirrl.github.io/csvw.org/guides/why-use-csvw.html

However, crudely summarising (to highlight the issue) the argument for CSVW as presented reads a bit like this:

  1. CSV has a bunch of problems (many dialects, only one datatype, parsing issues, etc.)...
  2. but it's open so yay can have 3 star data...
  3. but you really want 5 stars, so you need CSVW!

i.e. we jump straight into the 5 star model and don't describe how CSVW solves any of the stated problems with CSV. I think for most users this is the low hanging fruit. Linking with identifiers, and connecting over the web are definitely benefits, but it would be good to expand on fixing the problems EVERYONE has with CSV first :-)

I think this can largely be solved by riffing on the headings we have on the front page:

[Screenshot of the front-page headings, 2021-10-26]

before we get into the linked data story. Indeed it might be worth de-emphasising the linked data bits, or separating them out from the low hanging fruit.

Robsteranium commented 2 years ago

You're right it does jump into linked-data. The "solution" is only presented in the second guide on "how to make CSVW". We should add a paragraph to explain how CSVW dialect and datatypes help with parsing before going on to talk about linked data. Indeed it might make sense to defer that to another guide...

Robsteranium commented 2 years ago

I've had a pop at this. Wdyt?

RickMoynihan commented 2 years ago

Looks good :-)

I do wonder if we should structure the guides in a way inspired by this...

The first 3 steps are patching up CSV flaws:

  1. Add a minimal metadata file
  2. Add a dialect (if you need to; i.e. the CSV is not UTF-8 & RFC compliant). Note that, because of the defaults, even just step 1 is an improvement over just publishing an RFC-compliant UTF-8 CSV; otherwise there is no way to know how to interpret the CSV.
  3. Add basic datatypes
  4. Add identifiers...
  5. Validation...
  6. Vocabularies etc...

RickMoynihan commented 2 years ago

i.e. organise around the low hanging fruit improvements we can make first; that require low effort.
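
To make steps 1 and 3 concrete, here is a minimal sketch of what the low-effort end might look like: a small metadata document with basic datatypes, plus a toy reader that applies them. The file name, column names and sample rows are made up for illustration, and the `read_typed_rows` helper is hypothetical rather than part of any CSVW library. Only the `integer` and `string` datatypes are handled.

```python
import csv
import io
import json

# Hypothetical CSV content; the columns are invented for this example.
CSV_TEXT = "id,location,capacity_litres\n1,High Street,200\n2,Mill Lane,350\n"

# Steps 1 and 3: a minimal metadata document that names the CSV file
# and declares a basic datatype for each column.
METADATA = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "gritbins.csv",  # hypothetical file name
    "tableSchema": {
        "columns": [
            {"name": "id", "titles": "id", "datatype": "integer"},
            {"name": "location", "titles": "location", "datatype": "string"},
            {"name": "capacity_litres", "titles": "capacity_litres", "datatype": "integer"},
        ]
    },
}

# Toy mapping from datatype names to Python casts.
# Only the datatypes used above are covered (an assumption of this sketch).
CASTS = {"integer": int, "string": str}


def read_typed_rows(csv_text, metadata):
    """Parse csv_text and cast each cell using the declared column datatypes."""
    columns = metadata["tableSchema"]["columns"]
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {col["name"]: CASTS[col["datatype"]](raw[col["titles"]]) for col in columns}
        for raw in reader
    ]


rows = read_typed_rows(CSV_TEXT, METADATA)
print(json.dumps(METADATA, indent=2))
print(rows)
```

Without the metadata every cell would come back as a string; with it, `capacity_litres` arrives as a number without the consumer having to guess.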

Robsteranium commented 2 years ago

The how guide already covers 1-4.

I agree it'd be good to do one on validation and another on vocabularies. I was also wondering about one for structuring the CSV in the first place.

RickMoynihan commented 2 years ago

I do really like that example; but I don't think it makes the points I'm wanting to emphasise as low hanging fruit. I think it starts at 3 and 4, and picks off the mid-level fruit :-)

It says nothing of the dialect at all.

I think the dialect might be worthy of a separate example; explaining it; the problem, and perhaps resolution (discovering metadata file/csv).
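
As a rough sketch of what such a dialect example could show: the snippet below parses a non-standard CSV twice, once with default settings and once guided by a dialect description. The sample data and the `read_with_dialect` helper are hypothetical; `delimiter`, `quoteChar`, `header` and `skipRows` are dialect properties from the CSVW metadata vocabulary, but this toy reader only handles those four and is not a conforming implementation.

```python
import csv
import io

# Hypothetical export: one junk line first, semicolon-delimited, single-quoted.
CSV_TEXT = "# exported 2021-10-26\nname;note\n'Bin A';'salt; coarse'\n"

# Dialect description as it might appear under "dialect" in a metadata file.
DIALECT = {"delimiter": ";", "quoteChar": "'", "header": True, "skipRows": 1}


def read_with_dialect(text, dialect):
    """Parse text using a small subset of CSVW dialect properties."""
    # Drop the leading rows the dialect tells us to skip.
    lines = text.splitlines(keepends=True)[dialect.get("skipRows", 0):]
    reader = csv.reader(
        lines,
        delimiter=dialect.get("delimiter", ","),
        quotechar=dialect.get("quoteChar", '"'),
    )
    rows = list(reader)
    if dialect.get("header", True):
        header, body = rows[0], rows[1:]
        return [dict(zip(header, row)) for row in body]
    return rows


# Naive parse: default comma delimiter, no knowledge of the dialect.
naive = list(csv.reader(io.StringIO(CSV_TEXT)))

# Dialect-aware parse.
parsed = read_with_dialect(CSV_TEXT, DIALECT)

print(naive)
print(parsed)
```

The naive parse treats every line as a single field, junk line included, whereas the dialect-aware parse recovers the two columns and keeps the quoted semicolon inside the `note` value.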

I think it's an excellent introduction covering 3 and 4; but perhaps we could use another meta-article / page about how to leverage CSVW in practice?

That meta-article could essentially then describe the 6 things in my list that CSVW helps with, and link off to examples that discuss / demo unlocking that value. i.e. one on the dialect; one on datatypes (your gritbins example would be fine for that), another on identifiers / vocabularies / table groups / validation etc.

Anyway, if I can find time, I'd be happy to take a stab at this; particularly after having cleared up the bugs in the CSVW spec: https://github.com/w3c/csvw/issues/881.

Robsteranium commented 2 years ago

I like the grit bins example because it's short but still takes you from the simple level of labelling columns to the advanced level of adding identifiers. The brevity and simplicity makes it a good introduction for technical and strategic audiences alike. Skipping some low-hanging fruit is desirable as it helps us to cover more ground - quickly showing the extent of the orchard!

I think we risk putting people off if we wade straight into the technical details of dialect. I suspect this won't affect or motivate the 90% of people who aren't software engineers, and it's not particularly inspiring for those who are! It might be low-hanging fruit in terms of CSVW as a whole, and it's early in the sequence of execution, but I don't think this is a reason to prioritise it in the cognitive funnel. That said, it doesn't hurt to have a specific deep-dive guide into the topic that can be linked from the other guides.

I think we may find that the guides index page starts to look like the meta article you mention. I suggest we gather-up some more content and then come back to edit that overview.