Wikidata / Wikidata-Toolkit

Java library to interact with Wikibase
https://www.mediawiki.org/wiki/Wikidata_Toolkit
Apache License 2.0

WmfOnlineDailyDumpFile incorrectly checks for availability #869

Closed TheEaterr closed 2 months ago

TheEaterr commented 4 months ago

While using this toolkit to manage dump downloads, I ran into a problem with how dumps are determined to be available. The code responsible for this is:

protected boolean fetchIsDone() {
    boolean result;
    try (InputStream in = this.webResourceFetcher
            .getInputStreamForUrl(getBaseUrl() + "status.txt")) {
        BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8));
        String inputLine = bufferedReader.readLine();
        bufferedReader.close();
        result = "done".equals(inputLine);
    } catch (IOException e) { // file not found or not readable
        result = false;
    }
    return result;
}

However, the status.txt file that the WMF provides no longer contains just done; it now contains done:all (and perhaps other values, I haven't made an exhaustive check). See: https://dumps.wikimedia.org/other/incr/wikidatawiki/20240414/status.txt

Would it be possible to update the "done".equals(inputLine) check so that it accepts this value? (Perhaps with startsWith?)

wetneb commented 4 months ago

@TheEaterr that sounds good - would you like to submit a pull request for this? Using startsWith sounds like a good solution.
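
For illustration, here is a minimal sketch of the startsWith idea, kept separate from the toolkit so it can be run on its own. The class StatusCheckSketch and the helper isDone are hypothetical names, not part of Wikidata Toolkit, and this is not necessarily how the eventual pull request implements the fix. Apart from done and done:all, the status value "pending" used below is made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class StatusCheckSketch {

    // Hypothetical helper: reads the first line of a status.txt-like
    // resource and reports whether it starts with "done".
    // This accepts both the old "done" and the newer "done:all".
    static boolean isDone(Reader statusContent) {
        try (BufferedReader reader = new BufferedReader(statusContent)) {
            String firstLine = reader.readLine();
            // readLine() returns null for empty input, so guard against it.
            return firstLine != null && firstLine.startsWith("done");
        } catch (IOException e) {
            // Mirror the original behavior: unreadable means "not done".
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isDone(new StringReader("done")));     // true
        System.out.println(isDone(new StringReader("done:all"))); // true
        System.out.println(isDone(new StringReader("pending")));  // false ("pending" is a made-up value)
        System.out.println(isDone(new StringReader("")));         // false
    }
}
```

Note that startsWith("done") would also accept any other value beginning with done. If that turns out to be too permissive, a stricter variant could accept only an exact done or a done: prefix.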

TheEaterr commented 4 months ago

I created one, with an additional fix for the JSON download, although current / full is still broken. See: https://github.com/Wikidata/Wikidata-Toolkit/pull/872