flathub-infra / flatpak-external-data-checker

A tool for checking if the external data used in Flatpak manifests is still up to date
GNU General Public License v2.0
115 stars 34 forks source link

Parse datetimes with region/city format in HTTP headers #371

Closed bbhtt closed 11 months ago

bbhtt commented 1 year ago

Fixes https://github.com/flathub/flatpak-external-data-checker/issues/370 Fixes https://github.com/flathub/flatpak-external-data-checker/issues/205

→ flatpak run --branch=master org.flathub.flatpak-external-data-checker --verbose com.netease.CloudMusic.json
DEBUG   src.manifest: Data is not external: no "url" property
DEBUG   src.manifest: Data is not external: no "url" property
DEBUG   src.manifest: Data is not external: no "url" property
DEBUG   src.manifest: Data is not external: no "url" property
DEBUG   src.manifest: Data is not external: no "url" property
INFO    src.manifest: Checking 2 external data items
INFO    src.manifest: Skipped check [1/2] archive keyutils/keyutils-1.6.3.tar.gz (from com.netease.CloudMusic.json)
INFO    src.manifest: Started check [2/2] extra-data CloudMusic/netease-cloud-music.deb (from com.netease.CloudMusic.json)
DEBUG   src.manifest: Source extra-data CloudMusic/netease-cloud-music.deb: applying URLChecker
DEBUG   src.lib.externaldata: Source netease-cloud-music.deb: no remote data change
DEBUG   src.manifest: Source extra-data CloudMusic/netease-cloud-music.deb: got new 2 from URLChecker, skipping remaining checkers
INFO    src.manifest: Finished check [2/2] extra-data CloudMusic/netease-cloud-music.deb (from com.netease.CloudMusic.json)
INFO    src.main: Check finished with 0 error(s)

→ flatpak run --branch=stable org.flathub.flatpak-external-data-checker --verbose com.netease.CloudMusic.json
DEBUG   src.manifest: Data is not external: no "url" property
DEBUG   src.manifest: Data is not external: no "url" property
DEBUG   src.manifest: Data is not external: no "url" property
DEBUG   src.manifest: Data is not external: no "url" property
DEBUG   src.manifest: Data is not external: no "url" property
INFO    src.manifest: Checking 2 external data items
INFO    src.manifest: Skipped check [1/2] archive keyutils/keyutils-1.6.3.tar.gz (from com.netease.CloudMusic.json)
INFO    src.manifest: Started check [2/2] extra-data CloudMusic/netease-cloud-music.deb (from com.netease.CloudMusic.json)
DEBUG   src.manifest: Source extra-data CloudMusic/netease-cloud-music.deb: applying URLChecker
ERROR   src.manifest: Failed to check extra-data CloudMusic/netease-cloud-music.deb with URLChecker: Cannot parse date/time: Mon, 29 Apr 2019 10:43:55 Asia/Shanghai
WARNING src.main: Check finished with 1 error(s)

I'm not sure if this is an acceptable approach but at least it gets rid of the failure and I think I didn't break anything.

Thanks!

bbhtt commented 1 year ago

I fixed the formatting but I don't think rest of the faliures are related to this PR

ERROR: test_get_text (tests.test_htmlchecker.TestHTMLTools)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/flatpak-external-data-checker/flatpak-external-data-checker/src/checkers/htmlchecker.py", line 121, in _get_text
    async with self.session.get(url) as response:
  File "/usr/lib/python3/dist-packages/aiohttp/client.py", line 1117, in __aenter__
    self._resp = await self._coro
  File "/usr/lib/python3/dist-packages/aiohttp/client.py", line 625, in _request
    resp.raise_for_status()
  File "/usr/lib/python3/dist-packages/aiohttp/client_reqrep.py", line 1000, in raise_for_status
    raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 400, message='Bad Request', url=URL('https://httpbingo.org/base64/decode/8J+Ziywg8J+MjSEK4oCm')
FAIL: test_check (tests.test_jsonchecker.TestJSONChecker)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.9/unittest/async_case.py", line 65, in _callTestMethod
    self._callMaybeAsync(method)
  File "/usr/lib/python3.9/unittest/async_case.py", line 88, in _callMaybeAsync
    return self._asyncioTestLoop.run_until_complete(fut)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/lib/python3.9/unittest/async_case.py", line 102, in _asyncioLoopRunner
    ret = await awaitable
  File "/home/runner/work/flatpak-external-data-checker/flatpak-external-data-checker/tests/test_jsonchecker.py", line 27, in test_check
    self.assertRegex(
AssertionError: Regex didn't match: '^[https://github.com/stedolan/jq/releases/download/jq-[0-9\\.\\w]+/jq-[0-9\\.\\w]+\\.tar.gz$](https://github.com/stedolan/jq/releases/download/jq-[0-9///w]+/jq-[0-9///w]+//.tar.gz$)' not found in 'https://github.com/jqlang/jq/releases/download/jq-1.6/jq-1.6.tar.gz'
bbhtt commented 11 months ago

I should have said this before but some test cases would be great. I can see if I can add some if you don't have the time or inclination.

Adding the test is should be easy, but the cloudmusic site is wildly unreliable for me due to being hosted in China, and I can't find another site that sends invalid http headers.

Creating a dummy test with the header modified seems complicated because fedc has to know a working url.

bbhtt commented 11 months ago
Test without patch ``` INFO src.manifest: Finished check [1/2] extra-data unityhub/UnityHubSetup.AppImage (from fedc.test.urlchecker.yaml) ERROR src.manifest: Failed to check extra-data cloudmusic/netease-cloud-music.deb with URLChecker: Cannot parse date/time: Fri, 17 Nov 2017 13:46:28 Asia/Shanghai DEBUG asyncio: <_SelectorSocketTransport fd=7 read=polling write=> received EOF EDEBUG asyncio: Close <_UnixSelectorEventLoop running=False closed=False debug=True> ====================================================================== ERROR: test_check (tests.test_urlchecker.TestURLChecker.test_check) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.12/unittest/async_case.py", line 90, in _callTestMethod if self._callMaybeAsync(method) is not None: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib64/python3.12/unittest/async_case.py", line 112, in _callMaybeAsync return self._asyncioRunner.run( ^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib64/python3.12/asyncio/runners.py", line 118, in run return self._loop.run_until_complete(task) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib64/python3.12/asyncio/base_events.py", line 664, in run_until_complete return future.result() ^^^^^^^^^^^^^^^ File "/home/wirt/Git/github/flatpak-external-data-checker/tests/test_urlchecker.py", line 22, in test_check self.assertIsInstance(data.new_version.size, int) ^^^^^^^^^^^^^^^^^^^^^ AttributeError: 'NoneType' object has no attribute 'size' ---------------------------------------------------------------------- Ran 1 test in 20.071s FAILED (errors=1) ```
Test with patch ``` WARNING src.lib.externaldata: Source netease-cloud-music.deb: remote data changed DEBUG src.manifest: Source extra-data cloudmusic/netease-cloud-music.deb: got new 4 from URLChecker, skipping remaining checkers INFO src.manifest: Finished check [2/2] extra-data cloudmusic/netease-cloud-music.deb (from fedc.test.urlchecker.yaml) DEBUG asyncio: <_SelectorSocketTransport fd=7 read=polling write=> received EOF .DEBUG asyncio: Close <_UnixSelectorEventLoop running=False closed=False debug=True> ---------------------------------------------------------------------- Ran 1 test in 20.482s OK ```
wjt commented 11 months ago

We don't need to make HTTP requests to test this. There's already a test case that just feeds strings in various formats through this function. I'll push a patch in a moment...

bbhtt commented 11 months ago

There's already a test case that just feeds strings in various formats through this function

Ah I did not notice that, thanks!

wjt commented 11 months ago

What do you think of those changes?

The other possible change would be to try parsing the standard formats first, then fall back to this logic, but, this works I think.

bbhtt commented 11 months ago

What do you think of those changes?

The logic seems to be ok. date_str_notz was the date_str without the region/city part and match_tz was only the region/city part in my case. You need to change those a bit.

And date_str_notz is leaving a leading space infront.

wjt commented 11 months ago

I swear I tested my changes before I pushed... Thanks!

bbhtt commented 11 months ago

I swear I tested my changes before I pushed... Thanks!

Seems fine now.