OpenClinica / enketo-oc

OpenClinica's fork of the Enketo web forms monorepo
Apache License 2.0
0 stars 1 forks source link

Check performance of form with repeat calculations referring to first repeat #169

Open MartijnR opened 5 years ago

MartijnR commented 5 years ago

See email with test form on 2/1/2019.

Figure out how to run in Enketo Core with external data file.

MartijnR commented 5 years ago
MartijnR commented 5 years ago

Some of these can be ignored as they likely refer to using comment-status(). Oddly, validating this form with the oc flag does not show 'refers to itself' errors.

screen shot 2019-02-01 at 4 45 32 pm
MartijnR commented 5 years ago
enketo-validate ~/Enketo/Code/enketo-core/test/temp/perf.xml
enketo-validate ~/Enketo/Code/enketo-core/test/temp/perf.xml --oc
MartijnR commented 5 years ago

The form is definitely invalid. All those calculation refer to themselves, creating infinite loops, that Chrome chooses to terminate eventually. That would explain the performance (without Chrome's termination, the page would be unresponsive)/ Should be errors in console as well, I think.

Still to be figured out, why Enketo Validate swallows these errors in OC mode.. There is something odd about the OC mode of Enketo Validate.

pbowen-oc commented 5 years ago

@MartijnR - As defined in the XLS, the calculation logic for one of these (FORMAT_PREV) is: if(indexed-repeat(${FORMAT_EDIT}, ${MANIFEST}, ${POS}-1)=1 or ${POS}=2, indexed-repeat(${FORMAT_MAN}, ${MANIFEST}, ${POS}-1), indexed-repeat(${FORMAT_PREV}, ${MANIFEST}, ${POS}-1))

So from what I can see, it is setting the value of this item to the a value from the previous row (possibly the same item in the previous row). That doesn't seem self-referential to me since it's pulling from different rows. In any case, is there a different way to write this that would be faster?

MartijnR commented 5 years ago

I think any calculation for a node (say FORMAT_PREV) that includes itself in the calculation is self-referential.

MartijnR commented 5 years ago

Ah sorry, you're right. Hmm.

MartijnR commented 5 years ago

Findings:

MartijnR commented 5 years ago

Opportunities to speed up:

  1. Do not use repeats. Let repeat be entire form. Serve previous record as external instance. Implement setvalue feature in Enketo for proper defaults (or use approach you are using in demo form), or once(). Potential issue: To get the proper syntax, this will either require a setvalue feature to be added to pyxform, or to be done dynamically by OC in the XForm.
  2. Do not use relevant for 'editing', but use dynamic readonly (still to be implemented in Enketo). It will reduce the size of the form to a third (MAN, PREV, EDIT -> 1).
  3. If using repeats, do not use indexed-repeat but use the preceding-sibling axis instead. Maybe something like ../preceding-sibling::MANIFEST/FORMAT_PREV. We can also eliminate the if statement by adding an XPath predicate. Keeping this calculation to a simple path, will definitely help performance.
pbowen-oc commented 5 years ago

@MartijnR - Is OpenClinica/enketo-express-oc#3 something we can do using the XLS form definition.

MartijnR commented 5 years ago

Yes, I think so (since you're not using ODK Validate).

pbowen-oc commented 5 years ago

We were able to upload it. However, we found that it always seemed to be pulling in the value of FORMAT_MAN from row 1 (not the prior row as expected).

Here is the calculation we used. We tried the if conditional logic a couple different way but got the same results.

if(../preceding-sibling::MANIFEST/FORMAT_EDIT = ‘1’ or ${POS}=2, ../preceding-sibling::MANIFEST/FORMAT_MAN, ../preceding-sibling::MANIFEST/FORMAT_PREV)

Do you see anything wrong with this? I can't figure out how it can actually be failing and coming up with the results we're seeing.

MartijnR commented 5 years ago

First of all, start much simpler. Initially don't use the if() statement and don't use the edit button.

Simply populate from previous repeat with ../preceding-sibling::MANIFEST/FORMAT_PREV. If it doesn't exist the value will be set to '' automatically.

Let me know if you want me to create a one-question test form.

MartijnR commented 5 years ago

I believe the XPath would be ../preceding-sibling::MANIFEST[1]/FORMAT_PREV

The reason you're alway getting the value of the first row, is that the expression actually returns all preceding-siblings (of MANIFEST). The string value is the value of the first one using XPath rules.

In the corrected path:

MartijnR commented 5 years ago

profile by creating 20 repeats.

http://localhost:8005/?xform=perf.xml

screen shot 2019-02-27 at 2 49 03 pm

Around 99% of the time is taken up by the evaluations triggered by the 'addrepeat' event.

Of that evaluation time about 99% is taken up by calc.update (75%) and relevant.update (25%)

Within this calc update a dataupdate is created with a relevantPath for each question in the repeat.

MartijnR commented 5 years ago

about 50% of the degradation is because of the direct update function calls in relevant.enable

When removing those update calls in relevant.enable, the remaining degradation is caused by calc.update and relevant.update. However, the number of nodes found in those update functions is staying static. So somewhere inside those functions must be another update call that does increase the number of evaluations linearly.

MartijnR commented 5 years ago

most of the calc.update degradation is caused by this updated object:

{repeatPath: "/Manifest_1.31.19_TEST/MANIFEST", repeatIndex: 2, cloned: true}

This is added by the form.js event handler for the addrepeat event

Eliminated as culprits:

I think it may just be a natural degradation of performance due to DOM increase, but need to check if selectors can be optimized, or if results can be cached.