OpenClinica / enketo-oc

OpenClinica's fork of the Enketo web forms monorepo
Apache License 2.0
0 stars 1 forks source link

Fixed upload file/image with character # in filename #205

Open gushil opened 4 weeks ago

gushil commented 4 weeks ago

Closes #204

I have verified this PR works with

What else has been done to verify that this works as intended?

Why is this the best possible solution? Were any other approaches considered?

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

Do we need any specific form for testing your changes? If so, please attach one.

gushil commented 3 weeks ago

What is the actual bug (how is the filename with # treated?), and what causes it (in Enketo's code)?

The issue is existed when we are editing the form.

Uploading the file (Submitted successfully) image

Close the form and load the form again image

The problem is because the file with # char is not saved into instanceAttachments correctly.

We can see here the part after # is cutted off image

MartijnR commented 3 weeks ago

We can see here the part after # is cutted off

Where in Enketo is it cut off?

gushil commented 3 weeks ago

We can see here the part after # is cutted off

Where in Enketo is it cut off?

instanceAttachments keys should be full file name with postfix. Because it is cut off, when the form loaded and trying to get the right key, we can't get it and we can't get the right file from instanceAttachments.

Thanks.

MartijnR commented 3 weeks ago

Because it is cut off, when the form loaded

So data-loaded-file-name is set incorrectly somewhere? Where? Is it here or somewhere else? Why is it cut off? Is value not correct? If so, why?

gushil commented 2 weeks ago

Hi @MartijnR

I finally can attach the file attachment through curl and found the problem is caused by enketo-transformer escapeURLPath method that is called by enketo-express media createMediaURL and escapeFileName methods in getMediaMapmethod.

Should we modify escapeURLPath or is it better to modify either createMediaURL and escapeFileName to not use escapeURLPath ?

Thanks.

MartijnR commented 2 weeks ago

Great find! That made me suspect this issue may also be present in enketo/enketo, and I quickly confirmed that using kobotoolbox.org online. I filed the issue here: https://github.com/enketo/enketo/issues/1324

It would be great if you make your PR there and then we merge it here (or if we cannot wait for approval create PRs for both repos).

Should we modify escapeURLPath or is it better to modify either createMediaURL and escapeFileName to not use escapeURLPath

I'm not sure. What is the escapeURLPath method turning the URL with# into?

gushil commented 2 weeks ago

Great find! That made me suspect this issue may also be present in enketo/enketo, and I quickly confirmed that using kobotoolbox.org online. I filed the issue here: enketo#1324

It would be great if you make your PR there and then we merge it here (or if we cannot wait for approval create PRs for both repos).

Should we modify escapeURLPath or is it better to modify either createMediaURL and escapeFileName to not use escapeURLPath

I'm not sure. What is the escapeURLPath method turning the URL with# into?

@MartijnR

With mediaPath = /media/get/1/a/ECG#1.png, transformer.escapeURLPath(mediaPath) turned mediaPath into /media/get/1/a/ECG

Thanks.

MartijnR commented 2 weeks ago

Probably best to figure out what the intention of the escapeURLPath function is (maybe initially intended for form media?), and decide based on that.

MartijnR commented 2 weeks ago

@gushil, are you sure we are passing a correct URL in the API call without a fragment identifier/hash component?

I'm suddenly wondering if the bug is not in Enketo but in OpenClinica and KoBoToolbox due to not encoding special URL characters in the filenames when making the API call to Enketo.....

incorrect:

Screenshot 2024-06-17 at 5 38 20 PM

correct:

Screenshot 2024-06-17 at 5 39 19 PM
gushil commented 2 weeks ago

Hi @MartijnR

I've tested the issue with enketo-oc and centro in my local setup, and the behaviour is similar with the one deployed in the server.

Thanks.

MartijnR commented 2 weeks ago

@gushil, so if I understand you correctly, you are already url-encoding the filename part of the URL you are sending to Enketo in the instance_attachments part of your curl request when reproducing this with Centro?

gushil commented 1 week ago

No. I just send the file name as it is like what happened when it is manually uploaded with the enketo ui because when I debug uploading the file in enketo ui, no url-encoding happened.

Am I wrong doing that?

Really appreciated your suggestion.

Thanks.

On Wed, Jun 19, 2024, 18:19 Martijn van de Rijdt @.***> wrote:

@gushil https://github.com/gushil, so if I understand you correctly, you are already url-encoding the filename part of the URL you are sending to Enketo in the instance_attachments part of your curl request when reproducing this with Centro?

— Reply to this email directly, view it on GitHub https://github.com/OpenClinica/enketo-oc/pull/205#issuecomment-2178427616, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAOX627R6BSWNJKGBMB2JTZIFSLPAVCNFSM6AAAAABIWXG7WGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNZYGQZDONRRGY . You are receiving this because you were mentioned.Message ID: @.***>

MartijnR commented 1 week ago

The ‘#’ is a reserved character (for a fragment identifier ) in a URI, so it needs to be encoded when it is part of a URL (e.g. in the API request as the value for an instance_attachment item). Enketo doesn’t send it as a URL during submission so doesn’t have to encode it. Did you look closely at the 2 screenshots I posted?

If this is not clear please send the complete curl snippet you are using and I will suggest a change.

gushil commented 1 week ago

Hi @MartijnR

Yes, I saw your screenshots. but I wonder if we upload the file to the form, is it uploading with full path (like https://openclinica.com/media/get/1/a/ECG#1.png) ?

Thanks.

MartijnR commented 1 week ago

but I wonder if we upload the file to the form, is it uploading with full path (like https://openclinica.com/media/get/1/a/ECG#1.png) ?

That URL is not crafted by Enketo. The URL is provided by OpenClinica when it makes the API call (using instance_attachments) Enketo just proxies that URL via the Enketo server (adding the media/get etc).

Enketo only submits the filename. So it's up to OpenClinica to provide a correct URL (which we're now realizing should have a url-encoded filename). OpenClinica decides to make the filename part of the URL. Enketo doesn't require that. It could be any URL (but it makes sense to make the filename part of that URL).

I recommend you first spent 1 minute testing this in your cURL snippet (replacing # with %23 in the instance_attachments URL), before continuing this discussion.... Then at least we know we are on the right track!! (I don't have a properly crafted cURL snippet handy myself that reproduces the issue).

MartijnR commented 1 week ago

Oh @gushil, I think I finally see what you mean! That code has completely changed, and that media/get URL is actually created by Enketo! Is the file stored in the redis database correctly if the URL provided in the API request is wrong (which it still is)?

gushil commented 6 days ago

Hi @MartijnR

With this form (centro/storage/forms/comment_repeat.xml) <?xml version="1.0"?><h:html xmlns="http://www.w3.org/2002/xforms" xmlns:OpenClinica="http://openclinica.com/odm" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:jr="http://openrosa.org/javarosa" xmlns:oc="http://openclinica.org/xforms" xmlns:odk="http://www.opendatakit.org/xforms" xmlns:orx="http://openrosa.org/xforms" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><h:head><h:title>Comment repeat</h:title><model odk:xforms-version="1.0.0"><instance><data id="comment_repeat" version="version-1"><group_yt6qb39 jr:template=""><comments/><comments_001/></group_yt6qb39><group_yt6qb39><comments/><comments_001/></group_yt6qb39><upload_image/><meta><instanceID/></meta></data></instance><bind nodeset="/data/group_yt6qb39/comments" oc:itemgroup="RG1" type="string"/><bind nodeset="/data/group_yt6qb39/comments_001" oc:itemgroup="RG1" type="string"/><bind nodeset="/data/upload_image" oc:itemgroup="RG2" type="binary"/><bind jr:preload="uid" nodeset="/data/meta/instanceID" readonly="true()" type="string"/></model></h:head><h:body><group ref="/data/group_yt6qb39"><label></label><repeat nodeset="/data/group_yt6qb39"><input ref="/data/group_yt6qb39/comments"><label>Comments 1:</label></input><input ref="/data/group_yt6qb39/comments_001"><label>Comments 2:</label></input></repeat></group><upload mediatype="image/*" ref="/data/upload_image"><label>Upload image</label></upload></h:body></h:html>

and this image url: image

I'm using this cURL snippet:

curl --user enketorules: -d "server_url=http://localhost:3000&form_id=comment_repeat&ecid=1&instance_id=a&instance=\
    <data xmlns:OpenClinica=\"http://openclinica.com/odm\" xmlns:enk=\"http://enketo.org/xforms\" xmlns:jr=\"http://openrosa.org/javarosa\" xmlns:oc=\"http://openclinica.org/xforms\" xmlns:orx=\"http://openrosa.org/xforms\" id=\"edit_comment_repeat\" version=\"test\">\
    <group_zt8fu31_002>\
        <Patient_Signature_002_001>ECG.png</Patient_Signature_002_001>\
    </group_zt8fu31_002>\
    <meta>\
        <instanceID>uuid:376611b3-1506-4c11-b21a-3a246f6574f2</instanceID>\
    </meta>\
    </data>&instance_attachments[ECG#1.png]=http://localhost:8000/ECG%231.png" http://localhost:8005/oc/api/v1/instance/edit/c

that returns "url": "http://localhost:8005/edit/fs/c/i/05244e567c9854acdd6a22819a3fa573?ecid=1&instance_id=a"

When loading the form edit url http://localhost:8005/edit/fs/c/i/05244e567c9854acdd6a22819a3fa573?ecid=1&instance_id=a

I got this:

image