XLSForm / pyxform

A Python package to create XForms for ODK Collect.
BSD 2-Clause "Simplified" License
77 stars 134 forks source link

itext identifiers prevent duplicate choices from displaying distinct labels #585

Closed tedrick closed 5 months ago

tedrick commented 2 years ago

Software and hardware versions

pyxform v1.7.0, Python v3.7.4

Problem description

When a form has multiple languages, the labels are stored whithin the <itext> section and referred to by ID. The ID pattern uses the name of the choice as part of the ID string. If choices have duplicate values, only one of the duplicate choice's label is stored as all duplicate choices would have the same ID.

Steps to reproduce the problem

Example form: multilang_duplicatechoice.xlsx

type name label::English (en) label::Korean (ko)
select_one dupelist q1 Duplicate choices koDuplicate Choicesko
list_name name label::English (en) label::Korean (ko)
dupelist 1 Pass 패스
dupelist 0 Fail 실패
dupelist -1 Skipped 건너뛴
dupelist -1 Not Applicable 해당 없음

The resulting XML only has 3 choice entries in the itext section XML:

<?xml version="1.0" encoding="utf-8"?>
<h:html xmlns="http://www.w3.org/2002/xforms" xmlns:esri="https://esri.com/xforms" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:jr="http://openrosa.org/javarosa" xmlns:odk="http://www.opendatakit.org/xforms" xmlns:orx="http://openrosa.org/xforms" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <h:head>
        <h:title>Survey title not set</h:title>
        <model odk:xforms-version="1.0.0">
            <itext>
                <translation lang="Korean (ko)">
                    <text id="/data/q1:label">
                        <value>ko__Duplicate choices__ko</value>
                    </text>
                    <text id="/data/q1/1:label">
                        <value>패스</value>
                    </text>
                    <text id="/data/q1/0:label">
                        <value>실패</value>
                    </text>
                    <text id="/data/q1/-1:label">
                        <value>해당 없음</value>
                    </text>
                </translation>
                <translation default="true()" lang="English (en)">
                    <text id="/data/q1:label">
                        <value>Duplicate choices</value>
                    </text>
                    <text id="/data/q1/1:label">
                        <value>Pass</value>
                    </text>
                    <text id="/data/q1/0:label">
                        <value>Fail</value>
                    </text>
                    <text id="/data/q1/-1:label">
                        <value>Not Applicable</value>
                    </text>
                </translation>
            </itext>
            <instance>
                <data id="multilang_duplicatechoice">
                    <q1/>
                    <meta>
                        <instanceID/>
                    </meta>
                </data>
            </instance>
            <bind nodeset="/data/q1" type="string"/>
            <bind jr:preload="uid" nodeset="/data/meta/instanceID" readonly="true()" type="string"/>
        </model>
    </h:head>
    <h:body>
        <select1 ref="/data/q1">
            <label ref="jr:itext('/data/q1:label')"/>
            <item>
                <label ref="jr:itext('/data/q1/1:label')"/>
                <value>1</value>
            </item>
            <item>
                <label ref="jr:itext('/data/q1/0:label')"/>
                <value>0</value>
            </item>
            <item>
                <label ref="jr:itext('/data/q1/-1:label')"/>
                <value>-1</value>
            </item>
            <item>
                <label ref="jr:itext('/data/q1/-1:label')"/>
                <value>-1</value>
            </item>
        </select1>
    </h:body>
</h:html>

Expected behavior

Ideally, each choice will display as written in the XLSX file.

Other information

Given that the existing ID pattern is attempting to create a unique ID per string (as 2 questions with the same string will have different IDs since the question properties are used to generate the ID), perhaps the ID could be switched to a GUID?

lindsay-stevens commented 2 years ago

Thanks @tedrick for this detailed report.

To flesh out the problem further - generating a XForm with the duplicate names as-is does not pass ODK Validate. I created a copy of the provided XForm, and manually inserted the duplicate itext node as shown below.

                                <translation lang="Korean (ko)">
                                        <text id="/data/q1:label">
                                                <value>ko__Duplicate choices__ko</value>
                                        </text>
                                        <text id="/data/q1/1:label">
                                                <value>패스</value>
                                        </text>
                                        <text id="/data/q1/0:label">
                                                <value>실패</value>
                                        </text>
                                        <text id="/data/q1/-1:label">
                                                <value>해당 없음</value>
                                        </text>
                                        <text id="/data/q1/-1:label">
                                                <value>건너뛴</value>
                                        </text>
                                </translation>

Using ODK Validate v1.16.0, this resulted in the following error message. Without reading too far into the underlying jr:itext function implementation, I guess the issue is that this function expects to be able to receive / lookup one text node only when it queries a language for a given text id.

>> XForm is invalid. See above for the errors.
org.javarosa.xform.parse.XFormParseException: duplicate definition for text ID "/data/q1/-1:label" and form "null". Can only have one definition for each text form.
    Problem found at nodeset: /html/head/model[@xforms-version=1.0.0]/itext/translation[@lang=Korean (ko)]/text
    With element <text id="/data/q1/-1:label"><value>

There is a XLSForm setting allow_choice_duplicates (docs), and it is used in the provided XLSForm. The existing pyxform tests around this feature in test_fields.py only shows this working for non-itext choices. In the tests, the duplicate choice values are in the XForm body under select1/item/value.

@lognaturel is duplicate choices with itext a use case you'd want to support? I think it's inadvisable form design, and preferable to instead either use distinct choice values and collapse them in analysis, or use one choice with a label that reflects the duplicates e.g. "Skipped or Not Applicable". I don't fully understand the cascading select use case mentioned in the docs but it seems odd to need this setting for that.

An approach where the choice names are used as-is seems like it'd need a javarosa + validate + pyxform changes to support (and maybe collect / enketo as well). An alternative could be that pyxform appends something to the itext id (e.g. a dash and then a number such that it is predictable) to make sure the ids are unique if they aren't already, and allow_choice_duplicates = yes.

tedrick commented 2 years ago

@lindsay-stevens - yes, this is in the context of allow_choice_duplicates being enabled.

We generally agree on the inadvisability of having duplicate choice values in a choice list, though as has been commented elsewhere, there are a few cases where it makes sense (#373 has a good discussion on this). If it's decided to mark as a known limitation, I believe we're fine with that - this was generated though internal testing around some issues related to choice lists and we don't have a customer use case at this point time.

lognaturel commented 2 years ago

is duplicate choices with itext a use case you'd want to support?

Yes but low priority. As @tedrick notes, this is intended to work for the reasons described in #373.

lindsay-stevens commented 7 months ago

Should be resolved by #603 / #614 because the choice itext identifiers in that case use the choice position (enumerated choice index) for the key instead of the choice value. But a test should be added to verify that the XLSForm in the description works as expected, +/- allow_duplicates.

lognaturel commented 7 months ago

User report and confirmation that it's fixed with the new version: https://forum.getodk.org/t/pyxform-duplicating-choices-labels/44037