ViennaRSS / vienna-rss

Vienna is a free and open-source RSS/Atom newsreader for macOS.
https://www.vienna-rss.com
Apache License 2.0

GitLab Activity Feed: Double space line break in issue comment breaks sync #1545

Status: Open. sbondorf opened this issue 2 years ago.

sbondorf commented 2 years ago

Describe the bug

GitLab has an activity overview (Project Information > Activity) RSS feed that includes, among other news items, comments on issues. The comments are written in Markdown, like comments here on GitHub. To add a line break
(like this one),
two trailing spaces are added at the end of the line. However, this breaks Vienna's synchronization. Other RSS readers (tested with NetNewsWire) keep syncing, though.

To Reproduce

Screenshots

I've set up this public repository to demonstrate the issue: https://gitlab.com/sbondorf/rsstestsvienna2/-/issues/1

[Screenshot 2022-02-14 at 15:56:40]

[Screenshot 2022-02-14 at 15:57:00]


Additional information: This was present in previous versions, too; sorry for not reporting it earlier.

sbondorf commented 2 years ago

Same additional information as in #1544. Attached are my two validation results ("working": single spaces only; "broken": a double space in a comment).

barijaona commented 2 years ago

GitLab newbie question: what is the address of the RSS feed?

sbondorf commented 2 years ago

https://gitlab.com/sbondorf/rsstestsvienna2/-/issues.atom

[Screenshot 2022-05-02 at 07:44:15]

barijaona commented 2 years ago

I think you are referring to https://gitlab.com/sbondorf/rsstestsvienna2.atom ?

sbondorf commented 2 years ago

You are right!

I posted the Atom feed for issues only. I picked it "by accident", didn't check my issue description, and didn't know it would work around the issue.

The feed you posted matches the issue description, as it is the feed of the entire activity view, containing all project changes (wiki, members, etc.). That one does not work with Vienna.

The sample project has only a single issue, and it breaks Vienna's ability to subscribe to the Activity feed but not to the Issues feed. Interesting find; I hope that helps debugging.

The issues feed does not validate either, but with different problems (a subset, I think).

barijaona commented 2 years ago

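This issue, as well as issue #1544, is caused by a mix of XHTML and HTML tags…

Inside a <summary type="xhtml">...</summary> construct, GitLab inserts some <img… > or <br> tags, but the correct syntax would be <img... /> and <br /> respectively (the XHTML versions). Since XHTML content must be well-formed XML, a single unclosed <br> makes the whole feed not well-formed, so a strict XML parser rejects it.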
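To illustrate the point, here is a minimal Python sketch (not Vienna's code, which relies on NSXMLDocument) using the standard library's xml.etree.ElementTree as the strict parser. The two Atom entries are hypothetical stand-ins for GitLab's output and differ only in how the void elements <br> and <img> are written.

```python
import xml.etree.ElementTree as ET

# Hypothetical Atom entries modelled on the GitLab output described above.
# They differ only in how the void elements <br> and <img> are written.
BROKEN = (
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<summary type="xhtml">'
    '<div xmlns="http://www.w3.org/1999/xhtml">'
    'first line<br>second line <img src="avatar.png" alt="">'
    '</div></summary></entry>'
)
FIXED = (
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<summary type="xhtml">'
    '<div xmlns="http://www.w3.org/1999/xhtml">'
    'first line<br />second line <img src="avatar.png" alt="" />'
    '</div></summary></entry>'
)


def is_well_formed(xml_text: str) -> bool:
    """Return True if a strict XML parser accepts the document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(BROKEN))  # False: <br> and <img> are never closed
print(is_well_formed(FIXED))   # True
```

One unclosed tag is enough for the parser to give up on the entire document, which is consistent with the whole feed failing to sync rather than just one item.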

If I revert commit 97480f3, I am able to parse the above-mentioned GitLab feeds, but, alas, this would reintroduce the #1073 crash…

I will try other NSXMLDocument options to work around this.

sbondorf commented 2 years ago

Thank you for getting to the bottom of this!

Maybe it is worthwhile to upstream the issue by filing a bug with GitLab, providing them with all the details?

I checked the open issues here, filtering by combinations of "feed", "atom", "html" and "xhtml", but that didn't turn up a known (and open) issue.

barijaona commented 2 years ago

Yes, you can report the issue to GitLab and suggest inserting the HTML into CDATA instead of claiming it's XHTML.
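As a hedged illustration of that suggestion: in Atom, a summary declared as type="html" carries escaped HTML, and wrapping the markup in a CDATA section is one way to escape it. The Python sketch below uses a hypothetical entry (not actual GitLab output) to show that the same unclosed <br> then no longer breaks XML parsing; the reader receives the markup as plain text and can hand it to a lenient HTML renderer.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical entry: HTML (with an unclosed <br>) wrapped in CDATA
# and labelled type="html" instead of type="xhtml".
ENTRY = (
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<summary type="html"><![CDATA[<p>first line<br>second line</p>]]></summary>'
    '</entry>'
)

entry = ET.fromstring(ENTRY)  # parses without error
summary = entry.find(ATOM + "summary")
print(summary.get("type"))  # html
print(summary.text)         # <p>first line<br>second line</p>
```

Escaping the markup with entities (&lt;br&gt;) instead of CDATA would be equivalent for the reader.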

However, I will try to make Vienna able to handle malformed feeds like this.

barijaona commented 2 years ago

I notice that you have spotted https://gitlab.com/gitlab-org/gitlab/-/issues/345797 in the meantime

sbondorf commented 2 years ago

> Yes, you can report the issue to GitLab and suggest inserting the HTML into CDATA instead of claiming it's XHTML

Is that sufficient as bug information? Will do, then.

> I notice that you have spotted https://gitlab.com/gitlab-org/gitlab/-/issues/345797 in the meantime

Exactly. That would be nice, too, but only if it doesn't cause more problems of the current kind...

barijaona commented 2 years ago

You can comment on the current GitLab issue, providing https://gitlab.com/sbondorf/rsstestsvienna.atom and https://gitlab.com/sbondorf/rsstestsvienna2.atom as supplemental examples, and copy my explanations there.

My suggestion to use CDATA sections is probably the easiest solution, but I don't know if and how the upcoming GitLab 15 will change the situation.

sbondorf commented 2 years ago

By "current GitLab issue", do you mean https://gitlab.com/gitlab-org/gitlab/-/issues/345797 ?

I fail to see the connection; I would have opened a new issue.

barijaona commented 2 years ago

I have added a comment to the current GitLab issue: https://gitlab.com/gitlab-org/gitlab/-/issues/345797#note_936145356

barijaona commented 2 years ago

@josh64x2 Do you have any suggestion regarding this issue ?

I think we are reaching the limits of NSXMLDocument's "closed box" approach. Catching errors and recovering from them would not be easy, as it would require parsing NSError's localizedDescription.
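One way to recover without inspecting error messages at all is to retry after a targeted repair. The Python sketch below is purely hypothetical and is not what Vienna, NSXMLDocument or RSParser do: it attempts a strict parse and, only if that fails, self-closes the <br>, <hr> and <img> void elements with a regular expression before parsing once more. The regex is a heuristic (it would mishandle a ">" inside an attribute value, for instance), which also shows why robust recovery is hard.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical repair rule: self-close HTML void elements that are not already
# self-closed. Atom's own <link/> is deliberately not in the list.
VOID_TAG = re.compile(r"<(br|hr|img)\b([^>]*?)(?<!/)>", re.IGNORECASE)


def parse_leniently(xml_text: str) -> ET.Element:
    """Strict parse first; on failure, self-close void tags and retry once."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        repaired = VOID_TAG.sub(r"<\1\2 />", xml_text)
        return ET.fromstring(repaired)  # still raises if other errors remain


# Hypothetical GitLab-style entry with unclosed <br> and <img> tags.
BROKEN = (
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<summary type="xhtml">'
    '<div xmlns="http://www.w3.org/1999/xhtml">'
    'first line<br>second line <img src="avatar.png" alt="">'
    '</div></summary></entry>'
)

XHTML = "{http://www.w3.org/1999/xhtml}"
entry = parse_leniently(BROKEN)
print(len(entry.findall(f".//{XHTML}br")))  # 1
```

Well-formed feeds never reach the repair step, so the heuristic only affects feeds that would otherwise fail completely.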

I am tempted to replace most of RichXMLParser with RSParser, a NetNewsWire framework. The package claims to require macOS 10.15, but my guess is that it can easily be backported to 10.12.

It would also be a source of synergies between Mac open source projects… 😄