UNMigration / HTCDS

Human Trafficking Case Data Standard

About the Human Trafficking Case Data Standard (HTCDS) with HXL (The Humanitarian Exchange Language) hashtags #7

Open fititnt opened 3 years ago

fititnt commented 3 years ago

Hi.

I'm Emerson Rocha, here as one of the members of @HXL-CPLP, a community user group of HXL with a special focus on CPLP. (I'm also a member of some local Amnesty groups.)

While looking for APIs and schemas for the project (https://github.com/HXL-CPLP/Auxilium-Humanitarium-API) I just found this standard, so my interest here is to consider promoting it, both within HXL in general and at least in our community!

Existing HXL hashtags (and the existing need for conventions on areas like the HTCDS)

In fact, there are well-documented HXL hashtags already used in production on the HDX site, https://data.humdata.org/ (so the HXL standard is already in use in the humanitarian area). But while the HXL standard is flexible, it lacks documentation for more sensitive data (or at least disaggregated data, such as per-individual records). The TL;DR is that most HXLated datasets are (not surprisingly) data that is already public.

Also, in general, discussion of sensitive data and its tools is taboo. Even in English. And people are dying because of it.

So, in this very first hello from me, I believe that while this standard does not (as expected) document the broader aspects of sensitive data, it is definitely worth the effort to also offer an HXLated version of the toolkit! The result could then be used by CPLP.

I'm interested in helping with this (and, if needed, I can ask other users from the international HXL community for a review)!

New HXL hashtags

In very simplified terms, an HXLated addition to the current kit would be a spreadsheet with base columns, each with some base #hashtag to represent a human plus +attributes (which could be similar to the existing English names).

Then, at least one spreadsheet that explains what each one is.

Then examples with fake data, so tools could be used.

Then tools, example dashboards, etc. On this point, a lot of HXL tools already exist, such as https://HXLDash.com for data visualization.

My contact email is Rocha(at)ieee.org. If necessary, we could talk more by email, Slack, or other channels!
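To make the "base #hashtag plus +attributes" idea concrete, here is a minimal Python sketch that splits such a cell into its parts. The pattern is a simplified assumption about how a hashtag is written, not the full HXL syntax, and `#x_person` with its attributes is only a hypothetical tag, not an agreed convention.

```python
import re

# Simplified assumption: "#tag" followed by zero or more "+attribute" parts.
# This is NOT the complete HXL grammar, only enough to illustrate the idea.
HASHTAG_PATTERN = re.compile(r"^#([a-z][a-z0-9_]*)((?:\s*\+[a-z][a-z0-9_]*)*)$")


def parse_hashtag(cell):
    """Split a cell such as '#x_person+first+name' into (tag, [attributes])."""
    match = HASHTAG_PATTERN.match(cell.strip().lower())
    if match is None:
        raise ValueError(f"not an HXL-style hashtag: {cell!r}")
    tag = "#" + match.group(1)
    attributes = re.findall(r"\+([a-z][a-z0-9_]*)", match.group(2))
    return tag, attributes


# Hypothetical hashtag, used only for illustration.
print(parse_hashtag("#x_person+first+name"))
```

A tool built this way could treat every column whose base tag is `#x_person` as describing an individual, and use the attributes to tell the fields apart.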


PS 1: the current project, HTCDS, does not have a typical, well-known software/database-style license (like those at https://spdx.org/licenses/), which means it would take a lawyer to understand whether it can be used or not. Also, the "Terms of use" (which seem to be based on those of some website, not on a license for a standard) may be perceived as conflicting with the also-mentioned https://open-stand.org/about-us/principles/. These points are very pertinent to getting help with HXL, because HXL is about terms that express meaning in tabular data. If really enforced, the clause "Users must not: use HTCDS or the information therein contained for any purpose different to the purpose of HTCDS as defined in Section 1" means that, because HXL lacks conventions to express data at the individual level (that is, one human, not a group of humans), a reference to the HTCDS in HXL could prevent other UN agencies, the Red Cross, Amnesty, etc. from using HXL and the tools that are aware of this data, because they would use them for things not related to human trafficking. If at least the column names and short English descriptions were released into the public domain (that is, no attempt to enforce a patent on '#x_person+first+name'), this would make things simpler.

PS 2: we at HXL-CPLP, for example, would likely release a version with linked data to express the concepts of the spreadsheet, for example linking what in English is "Gender" to Wikidata Q48277, so this could assist automated processing. Even for tools that only need to understand how to process (or how to export to) the HTCDS, we would kindly ask for some other type of coding that is patent free and does not use "Gender", "Nationality", "Title", etc., because "13. Termination, Denying Access" could break software or tools at any time. Like I said, the people who consume data do not have lawyers; if we have to attach some license, they will not use the HTCDS. But if UNMigration is already using some convention, then from our side, even if we need to create conversion tools just to deal with license issues, we will do that.
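As a rough idea of what that linked-data version could look like for tools, here is a small Python sketch. Only the "Gender" to Q48277 link comes from the text above; the table layout and the `concept_uri` helper are hypothetical, and the other fields are deliberately left unlinked rather than guessed.

```python
# Prefix used by Wikidata for entity URIs.
WIKIDATA_ENTITY_PREFIX = "http://www.wikidata.org/entity/"

# Hypothetical lookup table: English field label -> Wikidata item.
# None means "no link chosen yet"; these would need community review.
CONCEPT_LINKS = {
    "Gender": "Q48277",
    "Nationality": None,
    "Title": None,
}


def concept_uri(label):
    """Return the Wikidata URI for a field label, or None if it is unlinked."""
    qid = CONCEPT_LINKS.get(label)
    return WIKIDATA_ENTITY_PREFIX + qid if qid else None


print(concept_uri("Gender"))
```

With a table like this, software can key on the language-neutral identifier instead of the English label, which is the point of the request above.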

fititnt commented 3 years ago

Hi, can the license of The Human Trafficking Case Data Standard (HTCDS) be clarified as soon as possible?

English, even when well written, tends to be too ambiguous to translate. Even the best translation workflows in the world face challenges, and it was worse in the past, when far less document control was applied before texts reached translators.

All I'm saying is that HTCDS v0.2-alpha could use at least basic awareness of, just for example, how source terms like "First Name" are very hard for local implementers. And I'm not even talking about Eastern cultures (which would swap Given Name with Surname): even Portuguese and Spanish localizers are afraid that "First Name/Second Name" (the only references to a person's name in v0.2-alpha), even when checked against reference terminology in their own languages, would mean that people's names would not include the full name. I could (in the name of several others) go on for hours here, trust me.

Why license clarification is important

Some early volunteers willing to localize to other languages are waiting to do so until at least one initial non-English version exists (even if it is in a language they are not proficient in at all). They (especially if they are professional translators) expect a very fast response (hours, not months), or they get demotivated very quickly and give very brutal feedback. The minimum we would hear is: if even an expert translator has trouble understanding a data field, how will an average person understand what the field means?

If the HTCDS license at least becomes very clear (public domain plus "don't use our logos or claim to be us" would be the perfect one), pro bono experts in translation (or even terminologists) who are really anxious for this type of thing to be released to the public would be willing to help, but they cannot do it from English. And the work needed to resolve ambiguity in any non-English language is likely to be hours more complex per term than copying and pasting Salesforce fields was.

The HTCDS team should decide whether you want HTCDS to be useful or not. The current restrictive license does not allow localization, and even when the final version of HTCDS is released, it is likely that even if you ask UN translators for "official" translations, they would have such a hard time understanding what each field means that HTCDS would forever remain just an untranslatable English reference standard for something that is very necessary.

VerenaSattler commented 3 years ago

Hi Emerson,

Many thanks for your comments and feedback. This is very useful!

With regard to the license, we are currently working with our legal colleagues on a better one.

We are definitely interested in your points raised above of having a HXLAted version of the toolkit! What would we need to do to take this forward? Thank you.

fititnt commented 3 years ago

> We are definitely interested in your points raised above of having a HXLAted version of the toolkit! What would we need to do to take this forward? Thank you.

To put it very briefly, the final result would tend to be an Excel spreadsheet (or even a Google Sheets template) with a text header row (which can be in English) and a second row of HXL tags. One example can be found at https://vocabulary.unocha.org/.
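As an illustration of that two-row layout (this is not the vocabulary.unocha.org example itself), here is a minimal Python sketch that writes a tiny sheet as CSV and reads it back keyed by hashtag. The column names, the `#x_person` hashtags, and the `read_hxlated` helper are all hypothetical, and the data rows are fake.

```python
import csv
import io

rows = [
    ["First name", "Gender"],                        # row 1: human-readable header
    ["#x_person+first+name", "#x_person+gender"],    # row 2: hypothetical HXL tags
    ["Fake Name A", "female"],                       # fake data from here on
    ["Fake Name B", "male"],
]

# Write the sheet to an in-memory CSV, as a stand-in for a spreadsheet export.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)


def read_hxlated(text):
    """Read a two-row-header CSV and return records keyed by the hashtag row."""
    reader = csv.reader(io.StringIO(text))
    next(reader)              # skip the human-readable header row
    hashtags = next(reader)   # second row carries the hashtags
    if not all(cell.startswith("#") for cell in hashtags):
        raise ValueError("second row does not look like a hashtag row")
    return [dict(zip(hashtags, record)) for record in reader]


records = read_hxlated(buffer.getvalue())
print(records[0]["#x_person+first+name"])
```

Because tools key on the second row, the first row can be translated freely without breaking processing.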

About the HXLated version

But the decision on HXL attributes, since these would be reused by other organizations, would need a lot of thinking from more people already in the HXL community. Since the HTCDS would very likely be the first standard to deal with personal data that would be published, the mere risk of making such a huge effort to create and document an HXLated version of HTCDS, only for the generic fields to end up unusable for non-HTCDS projects, actually goes against the interest even of other UN agencies, the Red Cross, Amnesty, etc.

The "13. Termination, Denying Access" and translations

The "13. Termination, Denying Access" clause mentioned 109 days ago, in particular in the face of recent events that have badly damaged trust among the interpreters and translators working in humanitarian areas, is the type of thing that makes the perception of the English-speaking community (or of anyone who uses English as a working language) even worse among volunteers trying to translate into local languages. Most decision makers don't speak English, and those who do may not have time to respond.