GothenburgBitFactory / bugwarrior

Pull github, bitbucket, and trac issues into taskwarrior
http://pypi.python.org/pypi/bugwarrior
GNU General Public License v3.0

jira: certain characters seem to be causing an endless stream of annotations #798

Open srl295 opened 3 years ago

srl295 commented 3 years ago

Every time I sync my tasks, this comment: https://unicode-org.atlassian.net/browse/CLDR-14306?focusedCommentId=159283 gets re-added as an annotation.

The comment is as follows:

These may be variants of: 
U+26911 𦤑
U+2690E 𦤎
U+76A1 皡
U+76A5 皥
U+76B7 皷

so my annotation log looks like this:

                   2020-11-25 15:48:43 @Steven R. Loomis - These may be variants of: ⚑1 ������⚐
                   2020-11-27 13:31:23 @Steven R. Loomis - These may be variants of: ⚑1 ������⚐
                   2020-12-01 09:31:06 @Steven R. Loomis - These may be variants of: ⚑1 ������⚐
                   2020-12-02 14:28:24 @Steven R. Loomis - These may be variants of: ⚑1 ������⚐
                   2020-12-04 10:00:01 @Steven R. Loomis - These may be variants of: ⚑1 ������⚐

Clearly scrambled. Worse, this task gets 'updated' every time bugwarrior-pull runs.

srl295 commented 3 years ago

Additionally, the descriptions for these tasks update every time also:

srl295 commented 3 years ago

yes, unicode on unicode on Unicode!

djmitche commented 3 years ago

There was an encoding bug in Taskwarrior (it implements its own parsing.. yay!) that caused issues like this. Have you tried with the most recent version of TW (2.5.3)?

srl295 commented 3 years ago

> There was an encoding bug in Taskwarrior (it implements its own parsing.. yay!) that caused issues like this. Have you tried with the most recent version of TW (2.5.3)?

I've been running 2.5.3 for a couple of days… didn't notice whether this still pops up (I've probably stopped noticing!) but will try again

djmitche commented 3 years ago

#828 might be useful, too -- it avoids a lot of the places these kinds of issues crop up.

srl295 commented 3 years ago

@djmitche Still seeing it with Taskwarrior 2.5.3 and bugwarrior at 86a0d15da6d1bdbb02d661c7b5bb378ccd46685a (develop).

INFO:bugwarrior.db:Updating task 2cae7f3a-939f-455c-bec8-f430307baa71, (bw)Is#14306 - PUA characters exist in the collati .. https://unicode-org.atlassian.net/browse/CLDR-14306; annotations: ['@Peter Edberg - There are some index markers per [https://www...', '@Mark Davis - Need to add a ticket that checks the validity...', '@Peter Edberg - OK, I see these in the gb2312han mapping tabl...', '@Steven R. Loomis - Should we should document the usage (non-usag...', '@Steven R. Loomis - per Ken Lunde, these are a part of _old_ vers...', '@Steven R. Loomis - These may be variants of: ⚑1 ������⚐E ������U...', '@Steven R. Loomis - These may be variants of: ⚑1 ������⚐E ������U...', '@Steven R. Loomis - These may be variants of: ⚑1 ������⚐E ������U...', '@Steven R. Loomis - These may be variants of: ⚑1 ������⚐E ������U...', '@Steven R. Loomis - These may be variants of: ⚑1 ������⚐E

Now… would it make any sense to delete these tasks from Taskwarrior and see if they are fixed when re-importing? (Would I just do `task delete`?)

djmitche commented 3 years ago

It's worth a try..

Underneath, Bugwarrior is encoding these values into command-lines, and invoking task with that information, which is then parsing it out of the command line. So tracking down the error requires careful analysis of all of those encoding and decoding steps, with a hex dump. It's not a lot of fun.
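A hex dump of this kind can be produced with a few lines of Python. The following is a minimal sketch, not part of bugwarrior; the helper name `dump` is made up for illustration. It lists each character's code point next to its UTF-8 bytes, so a value can be compared before and after a sync.

```python
def dump(text):
    """Return one line per character: its code point and its UTF-8 bytes in hex."""
    lines = []
    for ch in text:
        # surrogatepass lets lone surrogates be dumped instead of raising
        utf8 = ch.encode("utf-8", errors="surrogatepass").hex(" ")
        lines.append(f"U+{ord(ch):04X}  {utf8}")
    return "\n".join(lines)

# The first variant character from the Jira comment, then U+FFFD as seen in the annotations.
print(dump("\U00026911"))
print(dump("\ufffd"))
```

Running the same helper on the string sent to `task` and on the string read back from it would show at which step the bytes change.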

srl295 commented 3 years ago

It's worth a try..

Underneath, Bugwarrior is encoding these values into command-lines, and invoking task with that information, which is then parsing it out of the command line. So tracking down the error requires careful analysis of all of those encoding and decoding steps, with a hex dump. It's not a lot of fun.

very familiar with this sort of fun.

srl295 commented 3 years ago

Recurred.

Actually, what we need to figure out is why bugwarrior made the decision to update the task in the first place.

INFO:bugwarrior.db:Updating task bd7838ec-7297-4cf4-9d5b-0fd7ed169b86, (bw)Is#14306 - PUA characters exist in the collati .. https://unicode-org.atlassian.net/browse/CLDR-14306; annotation…be variants of: ⚑1 ������⚐E ������U...',…

srl295 commented 3 years ago

Here's the JSON from task export:

[
  {
    "id": 477,
    "description": "(bw)Is#14306 - PUA characters exist in the collati .. https://unicode-org.atlassian.net/browse/CLDR-14306",
    "entry": "20201116T232135Z",
    "jiradescription": "The PUA characters U+E2D8, U+E2D9, U+E2DA, U+E2DB, and U+E2DC occur in the collation tailoring for each of:\r\nzh.xml, zh_Hant_HK.xml, and zh_Hant_TW.xml. \r\n\r\nIs this intentional? necessary for backward compatibility?",
    "jirafixversion": "40",
    "jiraid": "CLDR-14306",
    "jiraissuetype": "Bug",
    "jirastatus": "Investigate",
    "jirasummary": "PUA characters exist in the collation for zh.xml, zh_Hant_HK.xml, and zh_Hant_TW.xml",
    "jiraurl": "https://unicode-org.atlassian.net/browse/CLDR-14306",
    "modified": "20210803T215802Z",
    "priority": "M",
    "project": "CLDR",
    "status": "pending",
    "tags": [
      "Unicode"
    ],
    "uuid": "bd7838ec-7297-4cf4-9d5b-0fd7ed169b86",
    "annotations": [
      {
        "entry": "20210803T215618Z",
        "description": "@Peter Edberg - There are some index markers per [https://www..."
      },
      {
        "entry": "20210803T215619Z",
        "description": "@Mark Davis - Need to add a ticket that checks the validity..."
      },
      {
        "entry": "20210803T215620Z",
        "description": "@Peter Edberg - OK, I see these in the gb2312han mapping tabl..."
      },
      {
        "entry": "20210803T215621Z",
        "description": "@Steven R. Loomis - Should we should document the usage (non-usag..."
      },
      {
        "entry": "20210803T215622Z",
        "description": "@Steven R. Loomis - These may be variants of: ⚑1 ������⚐E ������U..."
      },
      {
        "entry": "20210803T215623Z",
        "description": "@Steven R. Loomis - per Ken Lunde, these are a part of _old_ vers..."
      },
      {
        "entry": "20210803T215802Z",
        "description": "@Steven R. Loomis - These may be variants of: ⚑1 ��⚐E ��U..."
      }
    ],
    "urgency": 8.11918
  }
]
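To find every affected task rather than inspecting one export by hand, the exported JSON can be scanned for U+FFFD. This is a rough sketch; `corrupt_fields` is a hypothetical helper, and the sample data is a shortened stand-in for the export above.

```python
import json

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

def corrupt_fields(export_json):
    """Yield (uuid, field_path) for every string value that contains U+FFFD."""
    def walk(value, path):
        if isinstance(value, str):
            if REPLACEMENT in value:
                yield path
        elif isinstance(value, dict):
            for key, item in value.items():
                yield from walk(item, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for index, item in enumerate(value):
                yield from walk(item, f"{path}[{index}]")

    for task in json.loads(export_json):
        for path in walk(task, ""):
            yield task.get("uuid"), path

# Shortened stand-in for the export shown above.
sample = json.dumps([{
    "uuid": "bd7838ec-7297-4cf4-9d5b-0fd7ed169b86",
    "description": "(bw)Is#14306 - PUA characters exist in the collati ..",
    "annotations": [
        {"entry": "20210803T215621Z", "description": "clean text"},
        {"entry": "20210803T215622Z", "description": "variants of: \ufffd\ufffd"},
    ],
}])

for uuid, path in corrupt_fields(sample):
    print(uuid, path)
```

The same function could be fed the text produced by `task export` instead of the sample.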

srl295 commented 3 years ago

So, it's already bad by this point. U+FFFD means it's corrupt. What doesn't make sense is where the letter U went, U+…

This is the Jira comment:

https://unicode-org.atlassian.net/browse/CLDR-14306?focusedCommentId=159283
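One possible explanation for the missing letter U, offered as a hypothesis that nothing in this thread confirms: if some layer treats `U+` followed by four hex digits as a code point escape, then the `U+2691` inside `U+26911` becomes U+2691 (⚑) with a trailing `1`, and the `U+2690` inside `U+2690E` becomes U+2690 (⚐) with a trailing `E`. That would match the `⚑1` and `⚐E` in the corrupted annotations. The sketch below imitates such an expansion with a regular expression; `expand_u_plus` is a made-up name, not a Taskwarrior or bugwarrior function.

```python
import re

def expand_u_plus(text):
    """Replace each 'U+' followed by four hex digits with that code point.

    Hypothetical stand-in for a parser that treats U+XXXX as an escape.
    """
    return re.sub(
        r"U\+([0-9A-Fa-f]{4})",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )

result = expand_u_plus("U+26911 U+2690E")
# ascii() keeps the output printable on any terminal encoding
print(ascii(result))
```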

srl295 commented 3 years ago

here is the raw comment content per jira

{"body":{"version":1,"type":"doc","content":[{"type":"paragraph","content":[
 {"type":"text","text":"These may be variants of: "},
 {"type":"hardBreak"},
 {"type":"text","text":"U+26911 \uD85A\uDD11"},
 {"type":"hardBreak"},
 {"type":"text","text":"U+2690E \uD85A\uDD0E"},
 {"type":"hardBreak"},{"type":"text","text":"U+76A1 皡"},
 {"type":"hardBreak"},{"type":"text","text":"U+76A5 皥"},
 {"type":"hardBreak"},{"type":"text","text":"U+76B7 皷"}]}
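The `\uD85A\uDD11` in this payload is the JSON surrogate pair for U+26911. As a sketch of one way six U+FFFD characters per ideograph could arise (an assumption, not something established in this thread), the Python below first decodes the pair correctly, then encodes each surrogate half separately and shows that a strict UTF-8 decoder replaces the resulting six bytes with six U+FFFD.

```python
import json

# Correct handling: the JSON parser joins the surrogate pair into one code point.
char = json.loads('"\\uD85A\\uDD11"')
assert char == "\U00026911"
good_bytes = char.encode("utf-8")  # 4 bytes: f0 a6 a4 91

# Faulty handling (assumed for illustration): each half is encoded on its own.
halves = "\ud85a\udd11"
bad_bytes = halves.encode("utf-8", errors="surrogatepass")  # 6 bytes

# A strict UTF-8 decoder rejects encoded surrogates and substitutes U+FFFD.
decoded = bad_bytes.decode("utf-8", errors="replace")

print(good_bytes.hex(" "))
print(bad_bytes.hex(" "))
print(len(decoded), set(decoded) == {"\ufffd"})
```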

djmitche commented 3 years ago

I don't have a chance to look at the pastes, but updating on every run indicates that the data is not round-tripping correctly. So either the data is getting corrupted on the way from Jira -> Taskwarrior, or on the way back.
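The round-trip point can be illustrated with a toy model. This is a sketch under stated assumptions, not bugwarrior's actual comparison logic: `lossy_store` is a hypothetical stand-in for whatever step damages the text, and `needs_update` simply compares the upstream value with what was stored.

```python
def lossy_store(value):
    """Hypothetical storage step that cannot represent non-BMP characters."""
    return "".join(ch if ord(ch) <= 0xFFFF else "\ufffd" for ch in value)

def needs_update(upstream, stored):
    """A task counts as changed whenever the two values differ."""
    return upstream != stored

def count_updates(upstream, syncs=3):
    """Run several syncs and count how many of them rewrite the task."""
    stored = None
    updates = 0
    for _ in range(syncs):
        if needs_update(upstream, stored):
            stored = lossy_store(upstream)  # the write itself is lossy
            updates += 1
    return updates

print(count_updates("These may be variants of: U+76A1"))
print(count_updates("These may be variants of: U+26911 \U00026911"))
```

With plain text the task is written once and then left alone; with a character the store cannot hold, the stored value never equals the upstream value, so every sync rewrites it.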

srl295 commented 3 years ago

It is definitely corrupt in taskwarrior

srl295 commented 2 years ago

Still present with taskwarrior v2.6.2

srl295 commented 2 years ago

I'm now seeing the same for Github as well

INFO:bugwarrior.db:Updating task bc317b67-5a4d-432d-adc2-0690a3ddfbb6, (bw)Is#7042 - feat: LDML Keyboard Support 🙀 .. https://github.com/keymanapp/keyman/issues/7042; githubtitle: 'feat: LDML Keyboard Support ������' -> 'feat: LDML Keyboard Support 🙀'; description: '(bw)Is#7042 - feat: LDML Keyboard Support ������ .. https://github.com/keymanapp/keyman/issues/7042' -> '(bw)Is#7042 - feat: LDML Keyboard Support 🙀 .. https://github.com/keymanapp/keyman/issues/7042'