XLSForm / pyxform

A Python package to create XForms for ODK Collect.
BSD 2-Clause "Simplified" License

Incorrect parsing of pulldata() with a newline #509

Closed jkpr closed 2 years ago

jkpr commented 3 years ago

Software and hardware versions

pyxform v1.3.3 (and tip of repo as of Dec 29, 2020), Python v3.8

Problem description

We noticed that a pulldata() call is not parsed correctly when the calculate expression contains a newline. See the attached HQ-bfp2-v102.xlsx. The calculate statement is:

if(${calc_p1_member_here_yes}, 
pulldata('hq_data', 'num_HH_members', 'metainstanceID', ${p1_hh}),
(count-selected(${p1_fq_id})))

Notice the newline (\n) right before pulldata.

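This led to the following incorrect instance in the output XML, with the id and src taken from the first line of the expression rather than from the first argument of pulldata: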

<instance id="if(${calc_p1_member_here_yes}" src="jr://file-csv/if(${calc_p1_member_here_yes}.csv"/>

The bind was correct:

<bind calculate="if( /HHQ/calc_p1_member_here_yes , 
pulldata('hq_data', 'num_HH_members', 'metainstanceID',  /HHQ/p1_hh ),
(count-selected( /HHQ/p1_fq_id )))" nodeset="/HHQ/hq_total_mem" type="string"/>

Steps to reproduce the problem

Convert the attached form HQ-bfp2-v102.xlsx.

Expected behavior

The XML should have

<instance id="hq_data" src="jr://file-csv/hq_data.csv"/>

since the calculate statement is not malformed.

Other information

The problem is in the regular expression that extracts the pulldata arguments from the calculate statement:

https://github.com/XLSForm/pyxform/blob/v1.3.3/pyxform/survey.py#L379

pulldata_arguments = re.sub(".*pulldata\s*\(\s*", "", pulldata_call)

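The regex match does not span newlines: without the re.DOTALL flag, "." does not match \n, so ".*" cannot consume the first line of the expression. Only the "pulldata(" on the second line is removed, and the text before the first comma, if(${calc_p1_member_here_yes}, is left at the front of the argument string. That matches the instance id seen in the output above.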
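The behaviour can be reproduced outside pyxform with a minimal sketch. The pattern is the one quoted above; the first_argument helper is hypothetical and only assumes that the file id is the text before the first comma. Passing re.DOTALL is one possible fix, not necessarily the one pyxform adopted.

```python
import re

# The calculate expression from the report, with a newline before pulldata.
pulldata_call = (
    "if(${calc_p1_member_here_yes}, \n"
    "pulldata('hq_data', 'num_HH_members', 'metainstanceID', ${p1_hh}),\n"
    "(count-selected(${p1_fq_id})))"
)

# Same pattern as in survey.py, written as a raw string.
PATTERN = r".*pulldata\s*\(\s*"

# Without DOTALL, "." stops at "\n": only "pulldata(" on the second line
# is removed and the first line stays in front of the arguments.
without_dotall = re.sub(PATTERN, "", pulldata_call)

# With DOTALL, ".*" also consumes the first line and its newline.
with_dotall = re.sub(PATTERN, "", pulldata_call, flags=re.DOTALL)


def first_argument(arguments: str) -> str:
    """Hypothetical helper: text before the first comma, quotes stripped."""
    return arguments.split(",")[0].strip().strip("'\"")


print(repr(first_argument(without_dotall)))  # the bogus instance id
print(repr(first_argument(with_dotall)))     # the intended file id
```

Note that with re.DOTALL the greedy ".*" skips to the last pulldata( in the expression, so an expression with several pulldata calls would need each call handled separately.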

If we change our calculate statement to put everything on one line (see HQ-bfp2-v103.xlsx):

if(${calc_p1_member_here_yes}, pulldata('hq_data', 'num_HH_members', 'metainstanceID', ${p1_hh}), (count-selected(${p1_fq_id})))

then we get what we want, i.e. <instance id="hq_data" src="jr://file-csv/hq_data.csv"/>
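Until the regex is fixed, the same workaround could be applied programmatically before conversion. The sketch below uses a hypothetical collapse_newlines helper (not part of pyxform) that replaces every whitespace run containing a line break with a single space. It assumes that no string literal in the expression contains a meaningful newline.

```python
import re


def collapse_newlines(expression: str) -> str:
    """Hypothetical helper: turn each whitespace run containing a line
    break into one space, so the expression sits on a single line."""
    return re.sub(r"\s*[\r\n]+\s*", " ", expression).strip()


multi_line = (
    "if(${calc_p1_member_here_yes}, \n"
    "pulldata('hq_data', 'num_HH_members', 'metainstanceID', ${p1_hh}),\n"
    "(count-selected(${p1_fq_id})))"
)

single_line = collapse_newlines(multi_line)
print(single_line)
```

For the expression in this report, the result is identical to the one-line statement used in HQ-bfp2-v103.xlsx.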

HQ-bfp2-v102.xlsx HQ-bfp2-v103.xlsx