Closed lognaturel closed 5 years ago
Thank you!
Perhaps I should consolidate the templates for sub specs now that we have two. Or maybe we wait for one more? 😬
Unless, you are really eager, I the 3rd sub-spec would be a good time! :)
I came to an unfortunate realization at https://github.com/opendatakit/collect/issues/3126. The way the XPath paths are written out by Collect is idiosyncratic: group names are always followed by a position predicate ([1]
) except in the case of terminal groups with the field-list
appearance. That is, /audit-group-multiplicity/g1[1]/g2[1]/g3[1]/q3
instead of the more expected /audit-group-multiplicity/g1/g2/g3/q3
for a question in three levels of groups.
Those paths are not incorrect so it doesn't matter for analysis tools that actually speak XPath. We do know that folks are writing their own analysis tools or doing quick visual checks, though, and for them it's important that the exact node names be consistent, I think. And ideally, that would include consistency between clients.
I see a couple of options:
group
s and repeat
s) except for terminal groups with the field-list
appearance@MartijnR, do you have a strong preference one way or the other? A different option I'm missing?
except for terminal groups with the field-list appearance
That sounds intriguing and surprising. Maybe I don't understand.
But in any case, my preference would be to not use position predicates except for repeated groups. For repeated groups it might be helpful to always include positions (even if only 1 instance)
Actually, this brings up another point that we had to resolve in OpenClinica's fork (though not related to logs for them). You can remove a repeat instance (e.g. the second, and then later create a new second instance, or the already existing third instance becomes the second). To deal with this (in audit log data) we added an immutable ordinal
attribute to repeats, as well as a last-used-ordinal
(to keep track of this in case the last repeat instance is removed). Both are in http://enketo.org/xforms namespace.
That sounds intriguing and surprising. Maybe I don't understand.
Indeed. Imagine that you have a method that outputs paths with position predicates for every node and then that you have a client that does navigation with page flips. It looks like the position predicates were stripped off nodes that correspond to pages -- questions or field lists. I'm kicking myself for somehow not noticing.
my preference would be to not use position predicates except for repeated groups.
Yes, I agree.
For repeated groups it might be helpful to always include positions.
Agreed.
You can remove a repeat instance (e.g. the second, and then later create a new second instance, or the already existing third instance becomes the second).
I think we had considered the shifting repeat positions not worth a special case but it's true that it's now much more difficult to do things like calculate how much time was spent in a specific repeat instance or track answer changes in a specific repeat instance.
To deal with this (in audit log data) we added an ordinal attribute to repeats.
Can you say a bit more about this? Do you mean Enketo core keeps counters of repeat instances over the lifetime of a record and includes an ID from that count in the log?
It's like this:
<repeat enk:last-used-ordinal="4" enk:ordinal="1">
...
</repeat>
<repeat enk:ordinal="3">
...
</repeat>
The ordinals are 1-based and are stored in the model/submission, so loading a submitted record for editing maintains the state.
In the above example 4 <repeat>
elements were created and the 2nd and 4th were deleted. A new repeat would get ordinal 5.
P.S. OC has a higher-than-normal need for this, because each new value or new repeat is immediately submitted to the server as a fragment (and not the final complete record). I wonder if that is also meant to be trackable in the audit logs (all the intermediate states before submission).
For repeated groups it might be helpful to always include positions (even if only 1 instance)
@MartijnR and I discussed this briefly in Slack and agreed to always include positions for repeats. This avoids the case where the first repeat instance is initially logged as /data/repeat
but then if edits are made to it after more repeat instances are added, it's logged as /data/repeat[1]
.
Filed #248
This spec intends to provide sufficient information for downstream tools to process client audit logs and for additional clients to create compatible logs.
I considered linking to https://tools.ietf.org/html/rfc4180 but it prescribes CRLF linebreaks and is more verbose than I think we need. I also ended up omitting "each line should contain the same number of fields throughout the file" because it seems easy enough for downstream tools to end a row on linebreak. Should it be included?
I verified this by running
bundle exec jekyll serve
and looking at the generated site in Chrome.Perhaps I should consolidate the templates for sub specs now that we have two. Or maybe we wait for one more? 😬
CC @grzesiek2010