getnikola / nikola

A static website and blog generator
https://getnikola.com/
MIT License
2.6k stars 444 forks source link

Add a test for gallery RSS content #3676

Closed tartley closed 1 year ago

tartley commented 1 year ago

Pull Request Checklist

Description

I added a new test, which asserts about the content of the samplesite gallery RSS feed.

This PR advances issue #3671, by completing the checklist item "Add a test of gallery RSS content".

I believe this PR is complete and mergeable.

tartley commented 1 year ago

Update on the test's failure to parse the enclosures in the generated RSS: This seems to be caused by a problem in the RSS parser I chose, "rss_parser". I do not understand how its parsing of enclosures can ever work. I filed yet another verbose issue there, but I think perhaps the situation there is nontrivial, so I don't have much hope of seeing that resolved soon. I think maybe I made a mistake by selecting someone's hobby RSS parser, rather than something that's actually currently usable. (I searched for "python rss parser", it was late and I clicked the top result.) Current plan is to switch to another RSS parser, maybe "feedparser".

tartley commented 1 year ago

Update on the CI failure in Python 3.11 on Windows latest. The image timestamp is off by one hour, and GMT clocks sprang forward an hour on Sunday night. Is it a timezone/daylight savings bug?

tartley commented 1 year ago

I fixed the failure of the new test to parse RSS enclosures by switching RSS parsing library, from parse-rss to feedparser. Seems to work now.

The only wrinkle is that it doesn't seem to automatically return the gallery 'description' field (which is empty in our data, but i'd like to assert about it so that maybe later we could fill it with something, like the gallery description text.)

Kwpolska commented 1 year ago

If the issue is with dates and times only, it might be worth it to ignore it somehow (e.g. by removing the date/time field altogether).

tartley commented 1 year ago

I pushed some test changes which give better error messages. Turns out, the test failure due to an off-by-one-hour timestamp isn't a daylight savings problem at all. It is caused by the gallery's exclude.meta file not working properly on Windows. Hence, the excluded image tesla2_lg.jpg is still included in the gallery RSS. (Incidentally, this image has an EXIF 'created' timestamp which is one hour less than the expected first RSS item, which is why we were seeing a one hour time difference exception earlier.) It's late, so I won't make any decisions now, but I think the options are: (a) This PR could include a fix for this, if it's trivial, or (b) This PR could change the test to exercise the current erroneous behavior and pass.

tartley commented 1 year ago

A pure guess: Might it be nikola/plugins/task/galleries.py:547, in get_excluded_images():

excluded_image_list = ["{0}/{1}".format(gallery_path, i) for i in excluded_image_name_list]

The / in the format template perhaps causes problems on Windows when the image removal task runs.

tartley commented 1 year ago

OK, so, this is interesting. The 'tentative fix' commit, above, did fix the previous test failure, ie:

The test failure was that the 'id' entry of each item in the RSS feed was unexpected. On linux/mac, it is:

'galleries/demo/tesla4_lg.jpg'

but on Windows, it was:

'galleries\\demo\\tesla4_lg.jpg'

i.e. with single backslashes (the above line shows them escaped, as the failing test output does).

The 'tentative fix' commit is for galleries.py to generate the same 'id' on Windows as it does elsewhere, i.e. use os.path.join instead of {0}/{1}".format(...), as seems common elsewhere in galleries.py. I'm not at all sure this is the right thing to do. Maybe the 'id' really should be different on Windows, to match the filename. Advice welcome. I guess my current thinking is that this PR should be conservative, and not change behavior unless I'm absolutely sure it is a bug, so I'll probably modify this to restore the original behavior, and make the test demonstrate whatever that is.

Also, now with that fix in place, there is a new, similar failure:

FAILED tests/integration/test_demo_build.py::test_gallery_rss - AssertionError: Item 0 mismatch: link: [https://example.com/galleries\demo\tesla4_lg.jpg](https://example.com/galleries/demo/tesla4_lg.jpg) != https://example.com/galleries/demo/tesla4_lg.jpg
assert 'https://exam...tesla4_lg.jpg/' == 'https://exam...tesla4_lg.jpg/'
  - https://example.com/galleries/demo/tesla4_lg.jpg
  ?                              ^    ^
  + [https://example.com/galleries\demo\tesla4_lg.jpg](https://example.com/galleries/demo/tesla4_lg.jpg)
  ?                              ^    ^

The same chain of reasoning applies: I'm pretty certain the windows version is not a valid URL, but it works because browsers will helpfully massage the backslashes into forward slashes and resolve the resulting URL. So probably the ultimately right thing to do is fix the URL generation on Windows? But in the absence of strong advice from the project owners, this PR should probably be conservative, preserve current behavior, and tweak the test to demonstrate that (with a "TODO" comment in the test that I think this might be a bug.)

Thoughts welcome. More as it happens. Thanks people!

Kwpolska commented 1 year ago

The URL generation on Windows is probably buggy and should be fixed, even if that would mean some minor regressions in places depending on the buggy behavior (e.g. RSS feed GUIDs). We usually don’t pay much attention to pull requests being small or very self-contained.

tartley commented 1 year ago

@Kwpolska: The URL generation on Windows is probably buggy and should be fixed, even if that would mean some minor regressions in places depending on the buggy behavior (e.g. RSS feed GUIDs).

Thank you for the advice! This is now fixed, and I believe this PR is complete and AFAIK can be merged.