crs-tools / crs-scripts

CRS Worker scripts
Apache License 2.0

Broken unicode workaround #18

Open lukas2511 opened 1 year ago

lukas2511 commented 1 year ago

At FrOSCon we had some issues with tickets containing Unicode characters such as German quotation marks and emojis. The postencoding worker simply exited inside the XMLin call, and the affected tickets got stuck in the postencoding state.

AFAIK this was the first FrOSCon using a Frab version with real Unicode support, so we had never run into this exact case before. It's not entirely clear whether that is the cause or whether something else is going on.

While trying to figure out what was happening, I couldn't reproduce the issue whenever I extracted the XML and parsed it separately, so parsing from a written file evidently worked fine. As a quick and dirty workaround, I simply wrote the incoming XML to a temporary file and passed that file to XMLin instead, which worked perfectly. This is not a good solution, but it works, and I'm posting the patch here in case anybody else runs into the same problem and needs a quick fix.

diff --git a/lib/CRS/Executor.pm b/lib/CRS/Executor.pm
index 7a699a8..05c9438 100644
--- a/lib/CRS/Executor.pm
+++ b/lib/CRS/Executor.pm
@@ -127,8 +127,16 @@ sub load_job {
     my $jobfile = shift;
     die 'You need to supply a job!' unless $jobfile;

+    my @cset = ('0' ..'9', 'A' .. 'F');
+    my $tstr = join '' => map $cset[rand @cset], 1 .. 8;
+    my $tmpfile = "/tmp/fnord-" . $tstr . ".xml";
+
+    open(my $fh, '>:utf8', $tmpfile);
+    print $fh $jobfile;
+    close $fh;
+
     my $job = XMLin(
-        $jobfile,
+        $tmpfile,
         ForceArray => [
             'option',
             'task',
@@ -137,6 +145,9 @@ sub load_job {
         ],
         KeyAttr => ['id'],
     );
+
+    unlink($tmpfile);
+
     return $job;
 }
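The patch above builds its temporary file name from rand() under /tmp, doesn't check whether open() succeeded, and leaves the file behind if XMLin dies. A minimal sketch of a sturdier way to write the file, assuming the core File::Temp module (it ships with Perl); write_job_tmpfile is a hypothetical helper name, not part of crs-scripts:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not part of crs-scripts): write the job XML to a
# securely created temporary file and return its path. tempfile() opens the
# file exclusively under an unpredictable name instead of a rand()-built one.
sub write_job_tmpfile {
    my ($jobxml) = @_;
    my ($fh, $tmpfile) = tempfile(
        'crs-job-XXXXXXXX',
        SUFFIX => '.xml',
        TMPDIR => 1,    # create it in the system temp directory, e.g. /tmp
        UNLINK => 0,    # the caller removes it after parsing
    );
    # Encode Perl characters as UTF-8 bytes, like the '>:utf8' layer in the patch.
    binmode($fh, ':encoding(UTF-8)') or die "binmode on $tmpfile failed: $!";
    print {$fh} $jobxml or die "Cannot write $tmpfile: $!";
    close($fh) or die "Cannot close $tmpfile: $!";
    return $tmpfile;
}
```

In load_job, the returned path would take the place of the hand-built $tmpfile. Wrapping the XMLin call in eval { ... } and running unlink before re-throwing $@ would also keep temporary files from piling up when the parser dies, which is the failure mode described in this issue.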
a-tze commented 1 year ago

@lukas2511 Do you know which Perl and library versions were used, or the Linux distribution/release? Or do you have a ticket number? The job file XML is retrieved from the tracker as-is and written to a file. This could be some problem with Unicode normalization/c14n. German quotes have been present before, and I think we also tested a pile of poo in a title at some point.

lukas2511 commented 1 year ago

The system was an up-to-date Debian bullseye with Perl v5.32.1. The problem happened with many tickets, e.g. 2916.

The same system was used last year without any issues, and there were also some talks with German-style quotation marks in their descriptions. I'm not sure whether anything in the tracker or the CRS scripts changed in the meantime; the only real difference I know of is the encoding change on the frab database.