Closed gschwind closed 3 years ago
Hello,
I pushed a new logic here d69f1ad5457bee4b81a191959967fd647ed688f0.
The logic here is do our best to inline data, i.e. do not encode data to base64, but if it's unsafe to do so then encode it in base64.
How do you want use regexp ?
The detection of existing CDATA tags.
Updated with regexpr
Hello huart,
Maybe I'm wrong but after digging into the code, I think the code that you are providing is unrelated to the code that we are looking at.
The current code looks miss leading in my opinion, the json code that we are looking here is used for 2 different purposes. It is used to (1) serialize requests into the database and (2) for converting outputs to xml response. In the context of serialization of requests (1), we do not need to store the data of the outputs because they do not exist at all when we serialize those outputs. This is why there is no code to de-serialize the data in the outputs code while it is present in the inputs code. Never the less because of the second purpose of this json code, which is converting the outputs to the xml (2), the code need to serialize actual data of the outputs. This is the code modified in the patch above. This code belong the code used to serialize request (1) to the database but it is not useful for that purpose (1).
I currently thinking of better approach that will make the code more obvious, but I do not have any concrete idea at the moment. If you have any suggestion :)
To be clear, my point was to specify the full regexp pattern when looking for an existing CDATA tag. So replace
if not re.search('\\]\\]>', out):
by something like
CDATA_PATTERN = re.compile(r'<!\[CDATA\[(.*?)\]\]>')
if not CDATA_PATTERN.search(out):
But to be fair, it's not clear to me why we need check for an existing CDATA tag in the data in the first place.
Another thing to note is that there is another PR looking at OGC API, where the WPS response is a json file, which would not include a CDATA tag. I'll have some time to look into this next week.
Hello,
Oh, Ok ! :) To answer why I do not use the full pattern, this is because the aim of the check is to check if it is safe to add the CDATA pattern around the string. This is not safe to to this if the string contain ']]>'. For instance we can't use CDATA pattern for:
"Hello ]]> World"
It is not intend to check if the string already have the CDATA pattern. We may do this check too, in case the user added it already.
Best regards
Got it ! Thanks for the clarification, makes sense.
@gschwind @huard good to merge?
When the patches are merged I can make a final 4.4 release ... and move to the new main
branch.
Let me run this with emu+birdy. Should be able to do that today.
@cehbrecht Ok to merge.
@gschwind @huard done. Thanks :)
Overview
The encode function does not exist for bytes type, thus I guessed it was wrong.
Related Issue / Discussion
Additional Information
Contribution Agreement
(as per https://github.com/geopython/pywps/blob/master/CONTRIBUTING.rst#contributions-and-licensing)