Closed gushil closed 2 months ago
What is the actual bug (how is the filename with # treated?), and what causes it (in Enketo's code)?
The issue is existed when we are editing the form.
Uploading the file (Submitted successfully)
Close the form and load the form again
The problem is because the file with # char is not saved into instanceAttachments correctly.
We can see here the part after # is cutted off
We can see here the part after # is cutted off
Where in Enketo is it cut off?
We can see here the part after # is cutted off
Where in Enketo is it cut off?
instanceAttachments keys should be full file name with postfix. Because it is cut off, when the form loaded and trying to get the right key, we can't get it and we can't get the right file from instanceAttachments.
Thanks.
Because it is cut off, when the form loaded
So data-loaded-file-name
is set incorrectly somewhere? Where? Is it here or somewhere else? Why is it cut off? Is value
not correct? If so, why?
Hi @MartijnR
I finally can attach the file attachment through curl and found the problem is caused by enketo-transformer escapeURLPath
method that is called by enketo-express media createMediaURL
and escapeFileName
methods in getMediaMap
method.
Should we modify escapeURLPath
or is it better to modify either createMediaURL
and escapeFileName
to not use escapeURLPath
?
Thanks.
Great find! That made me suspect this issue may also be present in enketo/enketo, and I quickly confirmed that using kobotoolbox.org online. I filed the issue here: https://github.com/enketo/enketo/issues/1324
It would be great if you make your PR there and then we merge it here (or if we cannot wait for approval create PRs for both repos).
Should we modify escapeURLPath or is it better to modify either createMediaURL and escapeFileName to not use escapeURLPath
I'm not sure. What is the escapeURLPath method turning the URL with#
into?
Great find! That made me suspect this issue may also be present in enketo/enketo, and I quickly confirmed that using kobotoolbox.org online. I filed the issue here: enketo#1324
It would be great if you make your PR there and then we merge it here (or if we cannot wait for approval create PRs for both repos).
Should we modify escapeURLPath or is it better to modify either createMediaURL and escapeFileName to not use escapeURLPath
I'm not sure. What is the escapeURLPath method turning the URL with
#
into?
@MartijnR
With mediaPath = /media/get/1/a/ECG#1.png
, transformer.escapeURLPath(mediaPath)
turned mediaPath
into /media/get/1/a/ECG
Thanks.
Probably best to figure out what the intention of the escapeURLPath function is (maybe initially intended for form media?), and decide based on that.
@gushil, are you sure we are passing a correct URL in the API call without a fragment identifier/hash component?
I'm suddenly wondering if the bug is not in Enketo but in OpenClinica and KoBoToolbox due to not encoding special URL characters in the filenames when making the API call to Enketo.....
Hi @MartijnR
I've tested the issue with enketo-oc and centro in my local setup, and the behaviour is similar with the one deployed in the server.
Thanks.
@gushil, so if I understand you correctly, you are already url-encoding the filename part of the URL you are sending to Enketo in the instance_attachments
part of your curl
request when reproducing this with Centro?
No. I just send the file name as it is like what happened when it is manually uploaded with the enketo ui because when I debug uploading the file in enketo ui, no url-encoding happened.
Am I wrong doing that?
Really appreciated your suggestion.
Thanks.
On Wed, Jun 19, 2024, 18:19 Martijn van de Rijdt @.***> wrote:
@gushil https://github.com/gushil, so if I understand you correctly, you are already url-encoding the filename part of the URL you are sending to Enketo in the instance_attachments part of your curl request when reproducing this with Centro?
— Reply to this email directly, view it on GitHub https://github.com/OpenClinica/enketo-oc/pull/205#issuecomment-2178427616, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAOX627R6BSWNJKGBMB2JTZIFSLPAVCNFSM6AAAAABIWXG7WGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNZYGQZDONRRGY . You are receiving this because you were mentioned.Message ID: @.***>
The ‘#’ is a reserved character (for a fragment identifier ) in a URI, so it needs to be encoded when it is part of a URL (e.g. in the API request as the value for an instance_attachment item). Enketo doesn’t send it as a URL during submission so doesn’t have to encode it. Did you look closely at the 2 screenshots I posted?
If this is not clear please send the complete curl snippet you are using and I will suggest a change.
Hi @MartijnR
Yes, I saw your screenshots. but I wonder if we upload the file to the form, is it uploading with full path (like https://openclinica.com/media/get/1/a/ECG#1.png) ?
Thanks.
but I wonder if we upload the file to the form, is it uploading with full path (like https://openclinica.com/media/get/1/a/ECG#1.png) ?
That URL is not crafted by Enketo. The URL is provided by OpenClinica when it makes the API call (using instance_attachments
) Enketo just proxies that URL via the Enketo server (adding the media/get
etc).
Enketo only submits the filename. So it's up to OpenClinica to provide a correct URL (which we're now realizing should have a url-encoded filename). OpenClinica decides to make the filename part of the URL. Enketo doesn't require that. It could be any URL (but it makes sense to make the filename part of that URL).
I recommend you first spent 1 minute testing this in your cURL snippet (replacing #
with %23
in the instance_attachments
URL), before continuing this discussion.... Then at least we know we are on the right track!! (I don't have a properly crafted cURL snippet handy myself that reproduces the issue).
Oh @gushil, I think I finally see what you mean! That code has completely changed, and that media/get URL is actually created by Enketo! Is the file stored in the redis database correctly if the URL provided in the API request is wrong (which it still is)?
Hi @MartijnR
With this form (centro/storage/forms/comment_repeat.xml)
<?xml version="1.0"?><h:html xmlns="http://www.w3.org/2002/xforms" xmlns:OpenClinica="http://openclinica.com/odm" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:jr="http://openrosa.org/javarosa" xmlns:oc="http://openclinica.org/xforms" xmlns:odk="http://www.opendatakit.org/xforms" xmlns:orx="http://openrosa.org/xforms" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><h:head><h:title>Comment repeat</h:title><model odk:xforms-version="1.0.0"><instance><data id="comment_repeat" version="version-1"><group_yt6qb39 jr:template=""><comments/><comments_001/></group_yt6qb39><group_yt6qb39><comments/><comments_001/></group_yt6qb39><upload_image/><meta><instanceID/></meta></data></instance><bind nodeset="/data/group_yt6qb39/comments" oc:itemgroup="RG1" type="string"/><bind nodeset="/data/group_yt6qb39/comments_001" oc:itemgroup="RG1" type="string"/><bind nodeset="/data/upload_image" oc:itemgroup="RG2" type="binary"/><bind jr:preload="uid" nodeset="/data/meta/instanceID" readonly="true()" type="string"/></model></h:head><h:body><group ref="/data/group_yt6qb39"><label></label><repeat nodeset="/data/group_yt6qb39"><input ref="/data/group_yt6qb39/comments"><label>Comments 1:</label></input><input ref="/data/group_yt6qb39/comments_001"><label>Comments 2:</label></input></repeat></group><upload mediatype="image/*" ref="/data/upload_image"><label>Upload image</label></upload></h:body></h:html>
and this image url:
I'm using this cURL snippet:
curl --user enketorules: -d "server_url=http://localhost:3000&form_id=comment_repeat&ecid=1&instance_id=a&instance=\
<data xmlns:OpenClinica=\"http://openclinica.com/odm\" xmlns:enk=\"http://enketo.org/xforms\" xmlns:jr=\"http://openrosa.org/javarosa\" xmlns:oc=\"http://openclinica.org/xforms\" xmlns:orx=\"http://openrosa.org/xforms\" id=\"edit_comment_repeat\" version=\"test\">\
<group_zt8fu31_002>\
<Patient_Signature_002_001>ECG.png</Patient_Signature_002_001>\
</group_zt8fu31_002>\
<meta>\
<instanceID>uuid:376611b3-1506-4c11-b21a-3a246f6574f2</instanceID>\
</meta>\
</data>&instance_attachments[ECG#1.png]=http://localhost:8000/ECG%231.png" http://localhost:8005/oc/api/v1/instance/edit/c
that returns
"url": "http://localhost:8005/edit/fs/c/i/05244e567c9854acdd6a22819a3fa573?ecid=1&instance_id=a"
When loading the form edit url http://localhost:8005/edit/fs/c/i/05244e567c9854acdd6a22819a3fa573?ecid=1&instance_id=a
I got this:
@MartijnR
You are right. Let me look into media lib now.
Thanks for the suggestion.
Hi @MartijnR
After a couple of trial, testing and contemplating. I think this is the simplest solution that works and doesn't need a lot of changes.
These are the changes I made:
enketo-core/utils
getFilename
always returns fileName that are URL-safe.file.name
in filepicker
if getFilename
doesn't return the original file.name
in it (already encoded) to make sure the widget displays the right filename.It doesn't need us to handle encoding of instanceAttachments
because the filename in it is always URL-safe.
Please share your thought about it.
Thanks.
Hi @MartijnR
Thanks for your detailed answer.
Could you please help me what steps should I do to thouroghly make sure that all the cases are resolved? (I'm sorry for assumming that if all unit tests are passed, those mean OK)
Something like:
New records
Old records
Other case 1
Other case 2
Thanks!
I think something like these at least:
/oc/api/v1/survey/collect
API endpoint).#
in the filename into the form. With respect to loading old records that contain images with unsafe characters in their filenames, best to check with your colleagues. I think it would be great if your fix worked for them keeping the original filenames but since this issue has so far only been reported by OpenClinica and is not absolutely critical, it may be acceptable to not cover that case.
The instance and the instance_attachment URLs provided in the /instance API request are stored in the redis cache for just 30 seconds. They have the prefix in:
. What I do is immediately after making the API request, I use keys in:*
to find the key in redis-cli
. There will normally just be one result when testing locally. You can then inspect the entry with e.g. hgetall in:b
(if in:b
was the key you found, i.e. b
was the instanceID)). You could of course also change to a longer expiration temporarily during development (here).
I don't think there are any tests for this behavior unfortunately. If you end up tweaking the media library or the transformer, I am hoping there will be helpful tests that will fail and show you the intended behavior that ends up stripping the hash parts of the keys and URLs. If you end up tweaking the urlencoded app middleware, perhaps some helpful tests may flag issues too, but am not sure.
P.S. 1 If you decide to go back to fixing the various decoding issues in Enketo, I'm wondering if you concluded that passing a custom
qs
(decoded) function to urlencoded is not feasible? It seems like it might be good if any URL passed in the API is not decoded at all (this would not be a solution for media.js/transformer issues, but just for the redis URL storage issue)@MartijnR
I read the documentation and the source code of urlencoded (https://github.com/expressjs/body-parser/blob/master/lib/types/urlencoded.js) and found no way to pass a custom qs function.
Anything I missed?
I think something like these at least:
- open empty form with a regular file upload question and an annotate question (that URL is always the same you can request it for the first time with the
/oc/api/v1/survey/collect
API endpoint).- upload files with a
#
in the filename into the form.- check the submission to see that the filename of the file that is sent is the same as the filename in the XML node.
- load the record that was submitted using a curl API request to generate the temporary edit URL
- Make sure the frontend/page displays the previously uploaded images, and that the filenames are the ones that were submitted in the 3rd step.
With respect to loading old records that contain images with unsafe characters in their filenames, best to check with your colleagues. I think it would be great if your fix worked for them keeping the original filenames but since this issue has so far only been reported by OpenClinica and is not absolutely critical, it may be acceptable to not cover that case.
The instance and the instance_attachment URLs provided in the /instance API request are stored in the redis cache for just 30 seconds. They have the prefix
in:
. What I do is immediately after making the API request, I usekeys in:*
to find the key inredis-cli
. There will normally just be one result when testing locally. You can then inspect the entry with e.g.hgetall in:b
(ifin:b
was the key you found, i.e.b
was the instanceID)). You could of course also change to a longer expiration temporarily during development (here).I don't think there are any tests for this behavior unfortunately. If you end up tweaking the media library or the transformer, I am hoping there will be helpful tests that will fail and show you the intended behavior that ends up stripping the hash parts of the keys and URLs. If you end up tweaking the urlencoded app middleware, perhaps some helpful tests may flag issues too, but am not sure.
I finally can understand and replicate what you mean to test the submission and editing form. I found that my solution still not fixing all the problems. Still working on it.
Thanks.
I read the documentation and the source code of urlencoded (https://github.com/expressjs/body-parser/blob/master/lib/types/urlencoded.js) and found no way to pass a custom qs function.
Anything I missed?
Oh, maybe I misread... I thought passing extended: true
would allow you to do that. Sorry!
Hi @MartijnR
I've tried both ways:
I choose the second way because it involve editing less files and all tests are passed.
Currently testing annotate widget. Meanwhile, could you please review it?
Thanks.
Still got this error on annotate widget, eventhough instanceAttacnhments and instance is correct.
Update. Annotate widget is working.
With this curl command for editing
curl --user enketorules: -d "server_url=http://localhost:3000&form_id=annotation_widget&ecid=11&instance_id=a1&instance=\
<data xmlns:OpenClinica=\"http://openclinica.com/odm\" xmlns:jr=\"http://openrosa.org/javarosa\" xmlns:oc=\"http://openclinica.org/xforms\" xmlns:orx=\"http://openrosa.org/xforms\" id=\"annotation_widget\" version=\"1\" xmlns:enk=\"http://enketo.org/xforms\">\
<image_annotation type=\"file\">annotation-20_43_14.png</image_annotation>\
</data>&instance_attachments[annotation-20_43_14.png]=http://localhost:8000/annotation-20_43_14.png" http://localhost:8005/oc/api/v1/instance/edit/c
I got this
Closed this PR because new work has been created on PR #212
Closes #204
I have verified this PR works with
What else has been done to verify that this works as intended?
Why is this the best possible solution? Were any other approaches considered?
How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?
Do we need any specific form for testing your changes? If so, please attach one.