geopython / pywps

PyWPS is an implementation of the Web Processing Service standard from the Open Geospatial Consortium. PyWPS is written in Python.
https://pywps.org
MIT License
178 stars 117 forks source link

Fix erronous encode of bytes array #598

Closed gschwind closed 3 years ago

gschwind commented 3 years ago

Overview

The encode function does not exist for bytes type, thus I guessed it was wrong.

Related Issue / Discussion

Additional Information

Contribution Agreement

(as per https://github.com/geopython/pywps/blob/master/CONTRIBUTING.rst#contributions-and-licensing)

coveralls commented 3 years ago

Coverage Status

Coverage remained the same at 0.0% when pulling 3734f3cfc0f67ca90782952d2c1dab413ee4d2f0 on gschwind:pywps-4.4-002 into 559f13b960b2ba17599818e05c047ce3f1b2e54d on geopython:pywps-4.4.

gschwind commented 3 years ago

Hello,

I pushed a new logic here d69f1ad5457bee4b81a191959967fd647ed688f0.

The logic here is do our best to inline data, i.e. do not encode data to base64, but if it's unsafe to do so then encode it in base64.

gschwind commented 3 years ago

How do you want use regexp ?

huard commented 3 years ago

The detection of existing CDATA tags.

gschwind commented 3 years ago

Updated with regexpr

huard commented 3 years ago

See https://github.com/geopython/pywps/pull/555/files

gschwind commented 3 years ago

Hello huart,

Maybe I'm wrong but after digging into the code, I think the code that you are providing is unrelated to the code that we are looking at.

The current code looks miss leading in my opinion, the json code that we are looking here is used for 2 different purposes. It is used to (1) serialize requests into the database and (2) for converting outputs to xml response. In the context of serialization of requests (1), we do not need to store the data of the outputs because they do not exist at all when we serialize those outputs. This is why there is no code to de-serialize the data in the outputs code while it is present in the inputs code. Never the less because of the second purpose of this json code, which is converting the outputs to the xml (2), the code need to serialize actual data of the outputs. This is the code modified in the patch above. This code belong the code used to serialize request (1) to the database but it is not useful for that purpose (1).

I currently thinking of better approach that will make the code more obvious, but I do not have any concrete idea at the moment. If you have any suggestion :)

huard commented 3 years ago

To be clear, my point was to specify the full regexp pattern when looking for an existing CDATA tag. So replace

if not re.search('\\]\\]>', out):

by something like

CDATA_PATTERN = re.compile(r'<!\[CDATA\[(.*?)\]\]>')
if not CDATA_PATTERN.search(out):

But to be fair, it's not clear to me why we need check for an existing CDATA tag in the data in the first place.

Another thing to note is that there is another PR looking at OGC API, where the WPS response is a json file, which would not include a CDATA tag. I'll have some time to look into this next week.

gschwind commented 3 years ago

Hello,

Oh, Ok ! :) To answer why I do not use the full pattern, this is because the aim of the check is to check if it is safe to add the CDATA pattern around the string. This is not safe to to this if the string contain ']]>'. For instance we can't use CDATA pattern for:

"Hello ]]> World"

It is not intend to check if the string already have the CDATA pattern. We may do this check too, in case the user added it already.

Best regards

huard commented 3 years ago

Got it ! Thanks for the clarification, makes sense.

cehbrecht commented 3 years ago

@gschwind @huard good to merge?

When the patches are merged I can make a final 4.4 release ... and move to the new main branch.

huard commented 3 years ago

Let me run this with emu+birdy. Should be able to do that today.

huard commented 3 years ago

@cehbrecht Ok to merge.

cehbrecht commented 3 years ago

@gschwind @huard done. Thanks :)