TheEaterr closed this 2 months ago
I made a new commit that uses HEAD requests instead. I had to tap into the specific semantics of WebResourceFetcherImpl to change the type of the request; maybe a broader setter method should be added to the WebResourceFetcher interface?
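As a minimal sketch of the HEAD-request idea, using plain `java.net.HttpURLConnection` rather than the actual `WebResourceFetcherImpl` internals (`urlExists` and `isSuccess` are hypothetical helper names, not methods from the patch):

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class HeadCheck {

    /** Returns true for 2xx HTTP status codes. */
    static boolean isSuccess(int statusCode) {
        return statusCode >= 200 && statusCode < 300;
    }

    /**
     * Checks whether a resource exists by issuing a HEAD request, so that
     * no (potentially very large) dump file body is ever transferred.
     */
    static boolean urlExists(String urlString) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(urlString).openConnection();
        connection.setRequestMethod("HEAD"); // only headers are returned
        try {
            return isSuccess(connection.getResponseCode());
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(
                urlExists("https://dumps.wikimedia.org/wikidatawiki/entities/"));
    }
}
```

The same response-code check applies whether the fetcher issues the request through `HttpURLConnection` or some other HTTP client.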
I also updated the failing test by using the real WebResourceFetcher (rather than the mock) and adjusting the expectations. I imagine using MockWebResourceFetcher was intentional, but for this specific use case it feels odd, since the part that broke is the online part and I feel the test should catch that. Otherwise, it is probably possible to restructure the code a bit so it works with mocks. If checking against the online service is desirable, it would also be logical to update the other tests.
Thanks a lot!
It looks like the new version of the test does not pass. In general, I would say it is better to avoid making any requests to online services in the test suite, to keep it reliable. It is true that this prevents us from checking compatibility with the online service as it evolves, but in my opinion that is not the role of such a Java unit test suite. I don't know if it would have helped in this case, but it is also possible to change the mocking strategy and use a MockWebServer to test against a real but local web server serving pre-determined responses.
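MockWebServer here presumably refers to the one shipped with OkHttp; to illustrate the same idea without an extra dependency, here is a sketch using only the JDK's built-in `com.sun.net.httpserver.HttpServer` to serve a pre-determined `status.txt` locally (class and method names are hypothetical):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class LocalDumpServerSketch {

    /**
     * Starts a throwaway local server that serves a canned status.txt,
     * fetches it over HTTP, and returns the body. In a test, this stands
     * in for dumps.wikimedia.org without touching the network.
     */
    static String fetchStatus() throws IOException {
        // Port 0 asks the OS for any free port.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/status.txt", exchange -> {
            byte[] body = "done:all".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        try {
            String url = "http://localhost:" + server.getAddress().getPort()
                    + "/status.txt";
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(url).openConnection();
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetchStatus()); // prints "done:all"
    }
}
```

The code under test would simply be pointed at the local server's URL instead of the production one.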
Hey, actually the test that failed wasn't the online test I fixed, but another one that I missed when developing locally. The online test does pass (as far as I understood the CI output), and I feel it makes sense: the getAllJsonDumps online functionality is a core part of it, and the test should be expected to fail if upstream changes. So if all tests pass (hopefully) I think it's better this way, but if you disagree I'll switch it back to an offline test.
Attention: Patch coverage is 87.50000% with 2 lines in your changes missing coverage. Please review.
Project coverage is 83.11%. Comparing base (637b457) to head (b660991). Report is 233 commits behind head on master.
Files | Patch % | Lines |
---|---|---|
...ikidata/wdtk/dumpfiles/wmf/JsonOnlineDumpFile.java | 83.33% | 2 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
@TheEaterr thanks a lot! I see Codecov is complaining about missing diff coverage. If you think this can be addressed, we can do that; otherwise I'm happy to merge it as is, I don't want to stand in the way.
Code coverage was right, the function should have been tested. I added a test; it should be good now!
It seems that there were changes in the way the Wikidata dumps are structured, which broke the automatic download of dump files.
Here are the fixes implemented, and the reasoning behind them:

- **DAILY**: updated the check for an available dump, as status.txt now indicates "done:all". See: https://dumps.wikimedia.org/other/incr/wikidatawiki/20240419/status.txt. I used startsWith, but it might be preferable to check for equality with "done:all"? I don't know the various possible values, but this version should work at least most of the time (compared to none right now).
- **JSON**: the JSON dumps are currently downloaded from https://dumps.wikimedia.org/other/wikidata/, which is a bit of a mess, mixing full and incremental (?) dumps. I switched it to use https://dumps.wikimedia.org/wikidatawiki/entities/ (which seems to be equivalent to https://dumps.wikimedia.org/other/wikibase/wikidatawiki/; I don't know if anyone has any insight into which link should be preferred). Moreover, not all listed dates feature the full JSON dump, so I added a check that the file is actually there (I have not found a status.txt or equivalent for the entity dumps).
- **SITES**: nothing done; I haven't tested it, but the file it looks for is still there, so it probably still works.
- **CURRENT / FULL**: both of these look for a -pages-meta-current.xml.bz2 / -pages-meta-history.xml.bz2 file inside a dump (like https://dumps.wikimedia.org/wikidatawiki/20240401/). However, these files no longer seem to be there. Parts of them ARE present (of the form https://dumps.wikimedia.org/wikidatawiki/20240401/wikidatawiki-20240401-pages-meta-current1.xml-p1p441397.bz2, for example), but not the united file. If anyone knows why the full file isn't there, I could try to fix the issue.
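The DAILY status check described above can be sketched as follows (`dumpIsReady` is a hypothetical helper name, not the actual method in the patch):

```java
public class DumpStatusCheck {

    /**
     * The daily dump is considered ready once status.txt reports
     * "done:all". Trimming plus startsWith tolerates a trailing newline
     * or extra detail after the status token; an exact
     * equals("done:all") would be stricter but might reject valid files.
     */
    static boolean dumpIsReady(String statusText) {
        return statusText.trim().startsWith("done:all");
    }

    public static void main(String[] args) {
        System.out.println(dumpIsReady("done:all\n")); // true
        System.out.println(dumpIsReady("waiting"));    // false
    }
}
```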