OpenDevelopmentMekong / wp-odm_solr

WordPress plugin for automatically indexing created/updated content into a Solr index
GNU General Public License v3.0

Why did we present the published CKAN records based on the geographical area/country instead of CKAN organizations? #131

Open ChandaraKhun opened 7 years ago

ChandaraKhun commented 7 years ago

Currently, the published CKAN records presented on each OD country website, for example ODC, are based on the geographical area of the records (laws, agreements, datasets/maps, and library publications). All CKAN resources with a spatial range of Cambodia show up on the ODC website, meaning that those records were not necessarily published by ODC editors (who currently have limited permissions, restricted to their own organization). While CKAN admins with super access level may set the published records under any OD country organization/website (through the organization setting of the records), other OD country platforms can still publish records of cross-boundary issues/inter-national coverage for its partner's website; meaning that OD country partner team can define spatial range of CKAN resource metadata freely, leading its presentation on other OD partner’s website.

Example: Datasets on Vegetation Health Index, Deltas at Risk & Regional land cover (uploaded and published by OD Mekong). Those datasets were set under the OD Mekong organization with a spatial range of all the Mekong countries, so they show up in the search results of all OD websites/platforms, including ODC.

As a consequence, neither ODC nor any other OD country is responsible in such a case. In other words, the editorial issues (metadata & bi-lingual content) for thousands or hundreds of thousands of records need to be fixed again and again.
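
To make the difference concrete, here is a minimal sketch of the two selection rules being contrasted. It is not the plugin's actual code: the record fields (`organization`, `spatial_range`), the organization ids, the country list and the ODC-only record are all hypothetical, chosen only to mirror the example above.

```python
# Hypothetical records; field names and values are illustrative, not the real CKAN schema.
MEKONG = ["cambodia", "laos", "myanmar", "thailand", "vietnam"]

RECORDS = [
    {"title": "Vegetation Health Index", "organization": "odm", "spatial_range": MEKONG},
    {"title": "Regional land cover", "organization": "odm", "spatial_range": MEKONG},
    {"title": "Example ODC-only record", "organization": "odc", "spatial_range": ["cambodia"]},
]


def by_geography(records, country):
    """Behaviour described in this issue: any record whose spatial range covers the country."""
    return [r["title"] for r in records if country in r["spatial_range"]]


def by_organization(records, organization):
    """Alternative raised in this issue: only records owned by the site's own organization."""
    return [r["title"] for r in records if r["organization"] == organization]


print(by_geography(RECORDS, "cambodia"))
print(by_organization(RECORDS, "odc"))
```

Under the geography rule all three records reach the Cambodian site, including the two owned by the regional organization; under the organization rule only the ODC-owned record does.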

acorbi commented 7 years ago

Hi @ChandaraKhun, thanks for sharing your thoughts with the rest of us in this GitHub issue. I already knew about your concerns around this topic, but I believe others don't. In any case, this is a good way of letting others know and providing a place to express their opinion and participate in the discussion.

I already shared my point of view with you, but will elaborate again here, for the record.

In my understanding, ODI is a collaborative initiative where local organizations/groups (one or more) contribute to gathering and exposing relevant data to the users of a certain country site. The regional team ODM is supposed to support on the process and coordinate the collaborative efforts so other country partners can re-use the work effectively, among other tasks.

Since the country sites are providing information (data + editorial content) related to the issues from a certain country, it makes complete sense to me to attribute the contents exposed on its corresponding site by considering the geographical range metadata field (country).

However, I understand that this can lead to those scenarios that you are raising:

other OD country platforms can still publish records of cross-boundary issues/inter-national coverage for its partner's website; meaning that OD country partner team can define spatial range of CKAN resource metadata freely, leading its presentation on other OD partner’s website.

Correct.

As a consequence, neither ODC nor any other OD country is responsible in such a case. In other words, the editorial issues (metadata & bi-lingual content) for thousands or hundreds of thousands of records need to be fixed again and again.

And you are actually right while raising these possibilities, which are already happening (the example that you indicated is good). My understanding is that the editorial/data entry processes should be approached in a collaborative manner where regional team works together with local team to publish some new information (or country team with country team) which benefits both platforms.

If this is not the case currently, I believe the first step is to find agreement on the operational level, bringing everyone to the same page and looking for a solution together. In case the agreement is to change the implementation and processes to what you are proposing (which could involve a big effort but should be technically possible), I would be happy to discuss it.

In any case, I think we would first need the others' input here to move the conversation forward. @Huyeng @gimmemochi @thetaung2 @DBishton @prustar @jpeizer1 please provide your thoughts; this is something that should be raised at the SCM, in my opinion.

jpeizer1 commented 7 years ago

@acorbi @dbishton I think it's rather easy to get caught in the weeds of all the implications of this, so perhaps laying out the real problems this causes and that we are trying to address would be the best way to come to a solution. I see a few issues:

1) Who is responsible for local translation of English language records entered by either a country site or ODM but geotagged to another and what is done with the English language record appearing on that geotagged site in the interim?

2) What are the real liabilities / downsides of one country or ODM adding a record geotagged for another we want to address?

3) Can the issues be addressed by:
-- Allowing a geo tagged country the ability to review the record entered by another entity before publication? (is that feasible with records entered by ODM? By other countries?)
-- Can the problems be partially resolved by allowing filtering out of records geotagged by one country for another in the geotagged country's search functions.

4) Should an English language record geotagged for multiple sites require/have special consideration across all sites, as English is the only lingua franca on all sites?

Huyeng commented 7 years ago

@acorbi, please correct me if I'm wrong. Regarding the local translation: on CKAN, each organization's team can't edit datasets of another organization because of privilege restrictions. This means that the ODC team can't translate any records published under the ODM organization or other organizations unless it is granted access. However, I don't think it is a good idea to allow everyone to access every organization in order to edit something for translation.

I am also concerned about inconsistency between the English and the localized content for records that were translated by ODC but not published by it. For example, the ODC team helps to translate a record that ODM published; later, the record is updated without the translators being informed, and the English and Khmer information then become inconsistent.
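
One way such an inconsistency could be detected is sketched below. It assumes each record keeps a last-updated timestamp per language; that `updated` field is hypothetical and not something this thread says CKAN stores.

```python
from datetime import datetime


def stale_languages(record, source_lang="en"):
    """Return the languages whose translation is older than the source-language text."""
    source_time = record["updated"][source_lang]
    return sorted(
        lang
        for lang, timestamp in record["updated"].items()
        if lang != source_lang and timestamp < source_time
    )


# Hypothetical record: the English text was updated after the Khmer translation.
record = {
    "title": "Example ODM record",
    "updated": {
        "en": datetime(2017, 7, 10),
        "km": datetime(2017, 6, 1),
    },
}

print(stale_languages(record))  # ['km']
```

A check like this would only flag the record for follow-up; who then performs the translation remains the process question discussed in this thread.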

jpeizer1 commented 7 years ago

@acorbi @huyeng @dbishton It appears we have one set of issues regarding input and another set regarding search and exposure of the records.

On input: we have a somewhat stark choice between coordinating content extremely closely between all sites for any English translation (something I wonder is really possible with competing deadlines and priorities to publish content between sites, as well as dealing with after-the-fact updates to that content in each language) --- OR --- acknowledging that English is the only lingua franca between sites and that local language versions of particular data don't necessarily have to be direct translations but rather objective local representations of the data.

As it relates to search: A more rigid delineation of data between sites where relevant data on one site does not appear in search results of the other unless requested - So a country site search by default is really just that -- a search of records entered by that country - and you can request regional data be included... While ODM which picks up relevant English language data for all countries and the region in its searches is the only true regional search of the data across all sites in English and by default.
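
The search behaviour proposed here could look roughly like the following sketch: a country search defaults to that country's own records, and regional data is only added on request. The function name, the `include_regional` flag and the record fields are assumptions made for illustration, not existing plugin options.

```python
def country_search(records, organization, country, include_regional=False):
    """Default: only the site's own records. Opt-in: add others' records geotagged to the country."""
    results = [r for r in records if r["organization"] == organization]
    if include_regional:
        results += [
            r
            for r in records
            if r["organization"] != organization and country in r["spatial_range"]
        ]
    return [r["title"] for r in results]


# Hypothetical records for illustration only.
records = [
    {"title": "Example ODC record", "organization": "odc", "spatial_range": ["cambodia"]},
    {"title": "Example ODM regional record", "organization": "odm",
     "spatial_range": ["cambodia", "thailand"]},
]

print(country_search(records, "odc", "cambodia"))
print(country_search(records, "odc", "cambodia", include_regional=True))
```

The first call returns only the ODC-owned record; the second also includes the regional record geotagged to Cambodia.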

acorbi commented 7 years ago

@jpeizer1 @Huyeng Let me answer the questions that I have an answer for. I will leave the others unanswered, which does not mean that they should not be addressed; there are really good questions in this conversation that I believe need a discussion between the ODM team and country partners.

Who is responsible for local translation of English language records entered by either a country site or ODM but geotagged to another and what is done with the English language record appearing on that geotagged site in the interim?

Not sure myself about this. @gimmemochi ?

What are the real liabilities / downsides of one country or ODM adding a record geotagged for another we want to address?

As I understand, the main downside is that the record will appear on the country site it has been tagged for. If that record is not translated, then there is no translation available when the user is viewing the site in the local language.

-- Allowing a geo tagged country the ability to review the record entered by another entity before publication? (is that feasible with records entered by ODM? By other countries?)

Possible approaches for this:

  1. Both involved teams work together on preparing the translations and let the final owner of the record (the organization on CKAN) publish it. Quoting @Huyeng:

Regarding the local translation: on CKAN, each organization's team can't edit datasets of another organization because of privilege restrictions. This means that the ODC team can't translate any records published under the ODM organization or other organizations unless it is granted access. However, I don't think it is a good idea to allow everyone to access every organization in order to edit something for translation.

This is correct. And I also agree that we should keep each user's permissions limited to the records of their organization.

Can the problems be partially resolved by allowing filtering out of records geotagged by one country for another in the geotagged country's search functions.

I created https://github.com/OpenDevelopmentMekong/wp-odm_solr/issues/132 to address this question.

I am also concerned about inconsistency between the English and the localized content for records that were translated by ODC but not published by it. For example, the ODC team helps to translate a record that ODM published; later, the record is updated without the translators being informed, and the English and Khmer information then become inconsistent.

The process of updating a record on CKAN should be, in my opinion, approached as a holistic update, meaning that all its metadata should be updated, and not only one language.

jpeizer1 commented 7 years ago

"The process of updating a record on CKAN should be, in my opinion, approached as a holistic update, meaning that all its metadata should be updated, and not only one language"

But is this practical? Will we really be updating every record entered in every language for the metadata and in fact hold up publishing until it's done? I think we have to separate what is practical from what is ideal. I'd also propose looking at English as the lingua franca of the system. Not better or worse but the only language that is used in every country and the regional site. If you look at it that way you open yourself to new possibilities because the rules of that language can function a bit differently than the other languages specific to a country site.

I think there is also an underlying assumption that the languages are used equally in the system, while Google Analytics indicates that 85%+ of users are using it in English, even in Cambodia. It appears Khmer use is about 2%. Now I appreciate how important local language is for the country sites, but the use of English at that level on all sites should have implications for the way the rule base is ultimately decided.

acorbi commented 7 years ago

But is this practical? Will we really be updating every record entered in every language for the metadata and in fact hold up publishing until it's done?

I believe it is practical, if the right process is there and communication works. An update of a dataset implies that the record is already published. The team undertaking the update would coordinate with the local teams, which would then provide the translation of the updated chunk of information. Once everything is available, the change is made without needing to un/re-publish the record.

And in case of an update, it is a situation that you cannot avoid, since you do not want to update the English content but not the content in other languages, leaving the record in an inconsistent state, do you?

I'd also propose looking at English as the lingua franca of the system. Not better or worse but the only language that is used in every country and the regional site. If you look at it that way you open yourself to new possibilities because the rules of that language can function a bit differently than the other languages specific to a country site.

This would be the alternative to the process I am describing. I think the guidelines could be changed to require an English version as a minimum for publication. But the case of updating a record which has already been translated (see above) would still apply.

gimmemochi commented 7 years ago

Who is responsible for local translation of English language records entered by either a country site or ODM but geotagged to another and what is done with the English language record appearing on that geotagged site in the interim?

@acorbi, my understanding is that if a record is geo-tagged "Cambodia," ODC should be responsible for translating the metadata into Khmer, reviewing it, and then publishing the record. For consistency (in search and presentation on the country site), the record should only be published once the metadata has been translated. @prustar will have to tell you the current protocol for such a task.

prustar commented 7 years ago

@acorbi @Huyeng @gimmemochi @jpeizer1

Interesting discussions here. I just wanted to reiterate the purpose of having a platform in dual languages. The whole objective of the platform is to reach the maximum number of citizens in the country of interest, and as English language speaking skills are generally isolated to a more privileged urban demographic, we are missing the majority of our citizenry as a target. The Google Analytics statistics prove this.

Of course we are limited to the availability of connectivity amongst the rural communities however smart phones are more and more common so it is not impossible. Therefore if we remove the option to make resources, data and content available in the national language then we will likely never progress our objective to truly increase the reach and scope of our user base beyond the English speaking urban population. It's counter intuitive for this type of initiative.

The second issue raised by @Huyeng regarding data integrity is valid and as a network we have yet to determine the workflows and resources needed to ensure that it is maintained to standards and best practices.

Currently, the datasets produced that have a broad geographic scope (i.e. more than one LMC) are uploaded onto ODM; we do not yet have the resources to translate all of them into the appropriate languages. ODC has expressed that translation of external resources would be a drain on its resources and that it cannot currently undertake this for external partners.

Moving towards isolating content on national sites to an organisation level and removing the option to include search results with geographic relevance for that country from other national or regional sites would, to me, be a disservice to our users and decrease the usability of our platform. Not to mention being contrary to our mission of allowing a regional and trans-boundary perspective on relevant development issues, such as trans-boundary protected areas, trade, river issues and landscape ecosystems.

To sum up - yes, I think it looks strange to have inconsistencies in the presentation of language, but the removal of access to these records and limiting the views because of this is not appropriate in my view. Perhaps we simply need to use a tag that identifies that a resource is currently only available in ENGLISH when it appears on the national site.
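
The tagging idea could be sketched as below. The `title_translated` field and the notice text are assumptions for illustration; the real metadata schema and wording would need to be agreed by the teams.

```python
def language_notice(record, site_language):
    """Return a notice when the record has no translation for the site's language, else None."""
    translations = record.get("title_translated", {})
    if site_language != "en" and not translations.get(site_language):
        return "Currently only available in English"
    return None


# Hypothetical records: one with a Khmer title, one with English only.
translated = {"title_translated": {"en": "Example record", "km": "ឧទាហរណ៍"}}
english_only = {"title_translated": {"en": "Example record"}}

print(language_notice(translated, "km"))    # None
print(language_notice(english_only, "km"))  # Currently only available in English
```

This keeps the record visible on the national site, as argued above, while making the missing translation explicit to the user.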

jpeizer1 commented 7 years ago

@acorbi @Huyeng @gimmemochi @jpeizer1

There is the ideology of the dual language site and then there is the practical reality of usage that must also be dealt with. Here is an alternative perspective:

You can design the site with the thought that it will increasingly directly serve less educated less literate folks with less infrastructure available to use it (electricity, connectivity, smart phones) in the rural areas or you can design it for the civil society audience you know actually use it in support of those communities today (and a projected 2-3 years out). You can always make changes later if the user base changes but designing for a user base of the future, waiting for them to come, and hoping you don't alienate the current user base by limiting options they prefer is a risky proposition.

I supported/coordinated about 250 connectivity projects at Soros in 45 countries. There was a basic reality we dealt with particularly outside of urban areas. What we saw on the ground was not that most of the citizenry would use this or that site, but that the necessary civil society players, many of them more educated folks from either the urban or rural areas would use the sites to serve those communities... If the demand base changed the sites would be modified accordingly to satisfy them.

Below is the current reality of the ODC user base. Perhaps the question should be what can best be done to serve the current user population over the next 2-3 years -- What the potential future holds and how that might be addressed if and when it comes to fruition.

The following analytic shows 192,989 users of the site in Cambodia coming from Phnom Penh versus 19,454 "not set", meaning it's not clear where they are coming from, since the start of this year. That's roughly a 90% to 10% ratio.

https://analytics.google.com/analytics/web/?authuser=0#report/visitors-geo/a64869623w100960531p104869840/%3F_u.date00%3D20170101%26_u.date01%3D20170717%26tabControl.tabId%3Dgeo%26geo-segmentExplorer.segmentId%3Danalytics.city%26geo-table.secSegmentId%3Danalytics.country%26geo-table.plotKeys%3D%5B%5D%26geo-table.rowCount%3D500%26geo-table-dataTable.sortColumnName%3Danalytics.country%26geo-table-dataTable.sortDescending%3Dfalse/

This next statistic shows that of the 361,000-odd sessions during that same period, about 3,300 of them (1%) were in Khmer. There is another 0.27% (879 sessions showing up as km-kh): https://analytics.google.com/analytics/web/?authuser=0#report/visitors-language/a64869623w100960531p104869840/%3F_u.date00%3D20170101%26_u.date01%3D20170717/
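
For reference, the shares quoted above can be recomputed from the figures given. The session total ("361,000 odd") and the Khmer session count ("about 3,300") are approximations in the text, so the results are approximate too.

```python
def share(part, total):
    """Percentage of total represented by part, rounded to one decimal place."""
    return round(100 * part / total, 1)


phnom_penh_users = 192_989
not_set_users = 19_454
users_total = phnom_penh_users + not_set_users

print(share(phnom_penh_users, users_total))  # 90.8
print(share(not_set_users, users_total))     # 9.2
print(share(3_300, 361_000))                 # 0.9
```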

Finally there is mobile usage which counts for just under 30% of site users for the same period (noting that 39% of Cambodian users currently have smart phones, less in rural areas): https://analytics.google.com/analytics/web/?authuser=0#report/visitors-mobile-overview/a64869623w100960531p104869840/%3F_u.date00%3D20170101%26_u.date01%3D20170717/

That's up a little less than 5% from the 2016 mobile stats of the Old and New Cambodian Sites: https://analytics.google.com/analytics/web/?authuser=0#report/visitors-mobile-overview/a25100215w48848496p49304667/%3F_u.date00%3D20160101%26_u.date01%3D20161231/ https://analytics.google.com/analytics/web/?authuser=0#report/visitors-mobile-overview/a64869623w100960531p104869840/%3F_u.date00%3D20160101%26_u.date01%3D20161231/

And up 7% from the 2015 mobile stats of the old ODC site: https://analytics.google.com/analytics/web/?authuser=0#report/visitors-mobile-overview/a25100215w48848496p49304667/%3F_u.date00%3D20150101%26_u.date01%3D20151231/

Assuming that cell phone saturation and the slowing rate of increases noted in reports and articles below were irrelevant and that more people actually did access the site 1) from rural areas 2) by smart cell phone and 3) in Khmer -- the most optimistic projection would be that it would take about 4-5 years for just the level of cell phone access to be 50%. But that would be highly optimistic if Khmer was necessary because you'd need to be able to pay for that smartphone with the Khmer capabilities in these rural areas, and of course that the infrastructure existed. Things will eventually come - the question is when and how you configure the site now to increase demand from your current user base.
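
The 4-5 year figure can be reproduced with a simple linear extrapolation. This is only a sketch of the arithmetic: it assumes the "up a little less than 5%" figure means percentage points per year, and it uses the rounded values 30 and 5 from the text above.

```python
def years_to_reach(current_share, target_share, points_per_year):
    """Years needed to go from current_share to target_share at a constant yearly increase."""
    if points_per_year <= 0:
        raise ValueError("points_per_year must be positive")
    return (target_share - current_share) / points_per_year


# Mobile share just under 30% now, growing just under 5 points a year, target 50%.
print(years_to_reach(30, 50, 5))  # 4.0
```

Because both the current share and the yearly increase are slightly below the rounded inputs, the real figure under this assumption would be a little over four years.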

Suggest reading the following reports on phone usage. If you're going to access the site via phone and in Khmer, it will need to be a smartphone with Cambodian script capability. That's a limited percentage of Cambodians and a higher percentage of urban Cambodians. According to the second article, related to costs and usage, there will be a five-year window for the move to smartphones: “I believe that in the next five years, 80 or 90 per cent of feature phone users will become smartphone users,” Liu said.

https://asiafoundation.org/resources/pdfs/MobilePhonesinCB2015.pdf
http://www.phnompenhpost.com/business/smartphone-sales-slowing

Just a few things to keep in mind if making decisions to satisfy the users of the site over the next 2-3 years. ODC is not a general interest site as much as it is a site for people looking for targeted data with the critical thinking skills and education to use that data. It's probably more practical and realistic to make decisions based on where that user population exists now and over the next 2-3 years, and how it uses the platform. Obviously thinking long term is important, but a high priority short term / mid-term is to influence donors and partners by having a higher user population using the system now versus 5 years from now.