Harvester-friendly with use_default_schema=True

wardi commented 9 years ago

Store all alternate languages in a single extra so that datasets harvested with use_default_schema=True won't include JSON blobs in most fields. Try to align with the EC work, which is yet to be released. This might let us keep our nice, all-languages-are-equal API while still being compatible with the existing harvesting code that passes use_default_schema=True (although all "alternate" languages will be demoted into an extra).

wardi commented 9 years ago

@amercader how do you feel about this approach?

amercader commented 9 years ago

I think this goes in the right direction. What would this translations field look like?

Something like this (i.e. grouped by language):

{
  "en": {
    "title": "Title in English",
    "notes": "Desc in English"
  },
  "fr": {
    "title": "Titre en Français",
    "notes": "Desc en Français"
  }
}

Or like this (grouped by field):

{
  "title": {
    "en": "Title in English",
    "fr": "Titre en Français"
  },
  "notes": {
    "en": "Desc in English",
    "fr": "Desc en Français"
  }
}

The second seems easier to implement given how you store the translations. I'm not sure how non-root fields would work though (e.g. tag[x]['name'] or resource[x]['description']).
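For reference, the two layouts carry the same information, so one can be pivoted into the other. A minimal sketch for root-level fields only; the function name group_by_field is hypothetical and not part of ckanext-fluent:

```python
def group_by_field(by_language):
    """Pivot {lang: {field: text}} into {field: {lang: text}}.

    Hypothetical helper for illustration; handles root-level fields only.
    """
    by_field = {}
    for lang, fields in by_language.items():
        for field, text in fields.items():
            # Create the per-field dict on first use, then record this language.
            by_field.setdefault(field, {})[lang] = text
    return by_field


by_language = {
    "en": {"title": "Title in English", "notes": "Desc in English"},
    "fr": {"title": "Titre en Français", "notes": "Desc en Français"},
}
by_field = group_by_field(by_language)
```

Nested values such as tags or resources would need the same pivot applied per item, which is where the grouped-by-language layout gets awkward.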

I wonder if it also makes sense to support a language parameter on the package_show call to retrieve just a specific one.

wardi commented 9 years ago

Yes, I'd lean toward the latter and just mirror the structure of the dataset returned so it's always obvious what's intended, e.g. package_show with use_default_schema=False:


{
  "title": {
    "en": "Title in English",
    "fr": "Titre en Français"
  },
  "keywords": {
    "en": ["sample", "demo"],
    "fr": ["example", "demo"]
  },
  "resources": [
    {
      "name": {
        "en": "file one",
        "fr": "fiche un"
      },
      "...": "..."
    }
  ],
  "...": "..."
}

And with use_default_schema=True, the package returned (assuming the site default language is 'en') would be:

{
  "title": "Title in English",
  "resources": [
    {
      "name": "file one",
      "...": "..."
    }
  ],
  "extras": [
    {"name": "keywords", "value": "sample, demo"}
  ],
  "...": "..."
}

The extras would also include a "fluent_combined" extra: a JSON-encoded string that looks like the first example above, but excludes all the non-fluent fields. Note also how I converted the fluent tag list "keywords" into an extra holding a simple comma-separated string here.
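A rough sketch of that conversion, to make the idea concrete. It assumes the caller already knows which fields are fluent (in practice the schema would say), that "fluent_combined" mirrors the multilingual structure, and that extras use the same name/value shape as the example above. The function and parameter names are hypothetical, not existing ckanext-fluent code:

```python
import json


def flatten_for_default_schema(dataset, default_lang="en",
                               text_fields=("title",),
                               tag_fields=("keywords",),
                               resource_text_fields=("name",)):
    """Demote alternate languages into a single 'fluent_combined' extra.

    Hypothetical sketch: the field lists are passed in by hand here.
    """
    flat = dict(dataset)
    combined = {}
    extras = list(dataset.get("extras", []))

    # Fluent text fields: keep only the default language at the top level.
    for field in text_fields:
        if field in dataset:
            combined[field] = dataset[field]
            flat[field] = dataset[field].get(default_lang, "")

    # Fluent tag lists: move to an extra as a comma-separated string.
    for field in tag_fields:
        if field in dataset:
            combined[field] = dataset[field]
            del flat[field]
            tags = dataset[field].get(default_lang, [])
            extras.append({"name": field, "value": ", ".join(tags)})

    # Resources: same treatment, preserving list order so indexes line up.
    if dataset.get("resources"):
        flat["resources"] = []
        combined["resources"] = []
        for res in dataset["resources"]:
            flat_res = dict(res)
            combined_res = {}
            for field in resource_text_fields:
                if field in res:
                    combined_res[field] = res[field]
                    flat_res[field] = res[field].get(default_lang, "")
            flat["resources"].append(flat_res)
            combined["resources"].append(combined_res)

    extras.append({
        "name": "fluent_combined",
        "value": json.dumps(combined, ensure_ascii=False),
    })
    flat["extras"] = extras
    return flat


fluent = {
    "title": {"en": "Title in English", "fr": "Titre en Français"},
    "keywords": {"en": ["sample", "demo"], "fr": ["example", "demo"]},
    "resources": [{"name": {"en": "file one", "fr": "fiche un"}}],
}
flat = flatten_for_default_schema(fluent)
```

A harvester that ignores the extra still gets plain strings everywhere, while a fluent-aware consumer can decode "fluent_combined" to recover every language.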

Adding the ability to request a single language would be great. That would mean building some kind of multilingual text support into core, and making sure it doesn't break existing sites.
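As a sketch of what single-language selection could look like outside of core: walk the multilingual dataset and collapse every language dict to one language. The helper name pick_language and the rule "a dict whose keys are all known language codes is a fluent value" are assumptions for illustration only; a real implementation would rely on the schema instead of guessing.

```python
def pick_language(value, lang, languages=("en", "fr")):
    """Collapse fluent values in a dataset dict to a single language.

    Hypothetical helper: a non-empty dict whose keys are all in `languages`
    is treated as a fluent value; anything else is walked recursively.
    """
    if isinstance(value, dict):
        if value and set(value) <= set(languages):
            # Fluent value: return the requested language (None if missing).
            return value.get(lang)
        return {key: pick_language(item, lang, languages)
                for key, item in value.items()}
    if isinstance(value, list):
        return [pick_language(item, lang, languages) for item in value]
    return value


multilingual = {
    "title": {"en": "Title in English", "fr": "Titre en Français"},
    "keywords": {"en": ["sample", "demo"], "fr": ["example", "demo"]},
    "resources": [{"name": {"en": "file one", "fr": "fiche un"}, "format": "CSV"}],
}
french = pick_language(multilingual, "fr")
```

The key-based guess would misfire on an ordinary dict that happens to use language codes as keys, which is one reason proper support would need to live alongside the schema.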

wardi commented 9 years ago

Did #20 instead

ckan / ckanext-fluent

Harvester-friendly with use_default_schema=True #9