cldf / csvw

CSV on the web
Apache License 2.0

CSVW conformance #60

Closed xrotwang closed 2 years ago

xrotwang commented 2 years ago

This PR is largely backwards compatible. The most notable change is that the default Dialect for the objects in csvw.metadata now has commentPrefix=None. The CSVW spec seems to be ambiguous here, mentioning both "#" and null as the default in different places. What pushed me to go for null was the large number of ToJson conformance tests for number formatting, all of which use number patterns such as #,##0.0# as column headers. With the old default, none of these tests would pass.
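
To illustrate why the default matters, here is a minimal sketch using only the Python standard library. It is not the csvw implementation: `read_rows` is a hypothetical helper that applies a simple line-based comment filter, and the sample data is made up. It only shows how a header row starting with a number pattern can disappear when "#" is treated as a comment prefix.

```python
import csv


def read_rows(lines, comment_prefix):
    """Parse CSV lines, dropping lines that start with comment_prefix.

    Hypothetical helper for illustration only; passing None disables
    comment handling entirely.
    """
    if comment_prefix is not None:
        lines = [ln for ln in lines if not ln.startswith(comment_prefix)]
    return list(csv.reader(lines))


# Made-up data: the first column header is a number pattern starting with "#".
lines = ["#0.0#,label", "1.5,a"]

with_hash = read_rows(lines, "#")   # header row is treated as a comment
with_none = read_rows(lines, None)  # header row survives

print(with_hash)
print(with_none)
```

With the "#" prefix the header row is silently dropped and the first data row would be taken for the header, which is the kind of failure the number-formatting tests would run into.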

codecov-commenter commented 2 years ago

Codecov Report

Merging #60 (0135a16) into master (03ec862) will not change coverage. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##            master       #60    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           16        22     +6     
  Lines         2540      3514   +974     
==========================================
+ Hits          2540      3514   +974     

Impacted Files               Coverage Δ
src/csvw/__init__.py         100.00% <ø> (ø)
src/csvw/__main__.py         100.00% <100.00%> (ø)
src/csvw/datatypes.py        100.00% <100.00%> (ø)
src/csvw/db.py               100.00% <100.00%> (ø)
src/csvw/dsv.py              100.00% <100.00%> (ø)
src/csvw/dsv_dialects.py     100.00% <100.00%> (ø)
src/csvw/jsonld.py           100.00% <100.00%> (ø)
src/csvw/metadata.py         100.00% <100.00%> (ø)
src/csvw/utils.py            100.00% <100.00%> (ø)
tests/conftest.py            100.00% <100.00%> (ø)
... and 10 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 03ec862...0135a16.

xrotwang commented 2 years ago

So while this is mostly backwards compatible, the sheer volume of additions would suggest calling this "csvw 3.0", which is a bit funny because we only had 2.0.0 in the 2.x line, and only for a rather short time. Anyway, I think "3.0" would be the right version here. Agreed?

xrotwang commented 2 years ago

The JSON created by csvw2json really vindicates the design of CLDF. E.g. this is the first example given in the Leipzig Glossing Rules:

    {
        "http://cldf.clld.org/v1.0/terms.rdf#id": "1",
        "http://cldf.clld.org/v1.0/terms.rdf#languageReference": "indo1316",
        "http://cldf.clld.org/v1.0/terms.rdf#primaryText": "Mereka di Jakarta sekarang.",
        "http://cldf.clld.org/v1.0/terms.rdf#analyzedWord": [
            "Mereka",
            "di",
            "Jakarta",
            "sekarang."
        ],
        "http://cldf.clld.org/v1.0/terms.rdf#gloss": [
            "They",
            "in",
            "Jakarta",
            "now"
        ],
        "http://cldf.clld.org/v1.0/terms.rdf#translatedText": "They are in Jakarta now.",
        "http://cldf.clld.org/v1.0/terms.rdf#metaLanguageReference": "stan1293",
        "http://cldf.clld.org/v1.0/terms.rdf#source": [
            "Sneddon1996[237]"
        ]
    }

It can be parsed without knowing anything about CSV dialects or the local table of column names.
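
As a rough sketch of what that means in practice, the snippet below reads a shortened copy of the record above with nothing but Python's json module and pairs each word with its gloss. The `CLDF` constant and the `prop` helper are names made up for this illustration; they are not part of csvw or pycldf.

```python
import json

# Shortened copy of the record shown above.
record = json.loads("""
{
    "http://cldf.clld.org/v1.0/terms.rdf#id": "1",
    "http://cldf.clld.org/v1.0/terms.rdf#analyzedWord": ["Mereka", "di", "Jakarta", "sekarang."],
    "http://cldf.clld.org/v1.0/terms.rdf#gloss": ["They", "in", "Jakarta", "now"],
    "http://cldf.clld.org/v1.0/terms.rdf#translatedText": "They are in Jakarta now."
}
""")

# Namespace prefix shared by all property URLs in the record.
CLDF = "http://cldf.clld.org/v1.0/terms.rdf#"


def prop(rec, name):
    """Look up a CLDF property by its local name (hypothetical helper)."""
    return rec[CLDF + name]


# Word-by-word alignment of the analyzed text with its glosses.
aligned = list(zip(prop(record, "analyzedWord"), prop(record, "gloss")))
for word, gloss in aligned:
    print(f"{word}\t{gloss}")
```

The property URLs carry the semantics, so no delimiter, quoting, or column-name information from the CSV side is needed.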

xrotwang commented 2 years ago

What's somewhat missing from the JSON are the sources. But the BibTeX file is linked:

{
    "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#Generic",
    "@type": "http://www.w3.org/ns/dcat#Distribution",
    "dc:source": "sources.bib",
    ...

and the references are easy to parse:

    "http://cldf.clld.org/v1.0/terms.rdf#source": [
       "Sneddon1996[237]"
    ]
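
One possible way to split such a reference into a BibTeX key and an optional page specification is sketched below. The regular expression and the `parse_reference` function are assumptions made for this example, based only on the "key[pages]" shape visible above; they are not taken from csvw.

```python
import re

# A key, optionally followed by a bracketed context such as pages.
REF_PATTERN = re.compile(r"^(?P<key>[^\[\]]+)(?:\[(?P<pages>[^\[\]]*)\])?$")


def parse_reference(ref):
    """Split 'Key[pages]' into (key, pages); pages is None if absent."""
    match = REF_PATTERN.match(ref.strip())
    if match is None:
        raise ValueError(f"unrecognized reference: {ref!r}")
    return match.group("key"), match.group("pages")


key, pages = parse_reference("Sneddon1996[237]")
print(key, pages)
```

The key can then be looked up in the linked sources.bib with whatever BibTeX parser is at hand.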

LinguList commented 2 years ago

Looking forward to testing this actively. So far, from reading your comments, this looks very nice.

chrzyki commented 2 years ago

Testing this now. Thanks for all the work! One small thing: setup.py needs requests-mock as a testing (?) dependency.