Open addshore opened 8 years ago
It seems something has changed on the WMF servers without notice, breaking the JSON dump file lookup in WDTK. This must have happened in the past few days. Is your issue related to this? Pull requests are generally welcome. (including @guenthermi who ran into the JSON dump issue yesterday)
This is not related to the issue I was reporting here, however I may have just run into this. Redirects perhaps?
Also regarding the archive.org lookup I have actually implemented this at https://github.com/wikimedia/analytics-wmde-toolkit-analyzer/blob/master/analyzer/src/main/java/org/wikidata/analyzer/Fetcher/ArchiveOrgJsonOnlineDumpFile.java
And it can be seen in my fallback of dump sources at https://github.com/wikimedia/analytics-wmde-toolkit-analyzer/blob/master/analyzer/src/main/java/org/wikidata/analyzer/Fetcher/DumpFetcher.java#L89
In this code I simply call onlineDump.prepareDumpFile to check whether the dump file is actually there. This of course has the side effect of downloading the whole dump. It may be that my use case doesn't actually want a better implementation of fetchIsDone but instead an exists method!
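For illustration, here is a minimal sketch of what such an exists check could look like for an HTTP-hosted dump. It assumes the server answers HEAD requests for dump files, which I have not verified for dumps.wikimedia.org or archive.org. The class name `RemoteDumpCheck` and its `exists` method are hypothetical and not part of WDTK.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper, not part of WDTK. Expects an absolute http or https URL. */
public class RemoteDumpCheck {

    /**
     * Sends an HTTP HEAD request, so only the response headers are
     * transferred and the dump body itself is never downloaded.
     * Returns true only for a 200 response; any I/O failure counts as "not there".
     */
    public static boolean exists(String dumpUrl) {
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) URI.create(dumpUrl).toURL().openConnection();
            connection.setRequestMethod("HEAD");
            // Follows redirects that stay on the same protocol (http to http, https to https).
            connection.setInstanceFollowRedirects(true);
            connection.setConnectTimeout(10_000);
            connection.setReadTimeout(10_000);
            return connection.getResponseCode() == HttpURLConnection.HTTP_OK;
        } catch (IOException e) {
            // Unreachable host, timeout, malformed URL: treat as missing.
            return false;
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }
}
```

Treating every I/O error as "missing" keeps a fallback chain simple, but it also hides outages, so a real implementation might want to distinguish a 404 from a timeout.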
Merging #231 fixed the critical issue that no dumps could be downloaded. I guess the general aspect discussed here remains valid. Can you use master or do you also need a new release?
Well, I hadn't actually run into the issue we have just fixed when initially filing this ticket (they were totally separate). This ticket is for there to be some way to programmatically check to see if a dump is there without actually having to download it!
The way I would want to use this is:
class DumpFetcher {
    public Dump fetchDump(String dateStamp) {
        // Look for dumps stored locally for the given date
        // If that fails, look on dumps.wm.org for the dump of the given date (but don't download it yet)
        // If no dump exists there, then look on archive.org (but don't download it yet)
        return dump;
    }
}
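The fallback order described in those comments could be sketched roughly as follows. This is only an illustration under assumptions: the `DumpSource` interface, its `exists` method, and `firstAvailable` are hypothetical names and do not exist in WDTK. The point is that each source answers a cheap existence check, and the download happens only once a source has been chosen.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of a source fallback chain; none of these types are WDTK API. */
public class DumpSourceFallback {

    /** One place a dump might live: local disk, dumps.wikimedia.org, archive.org, ... */
    public interface DumpSource {
        String name();

        /** Cheap check that must not download the dump. */
        boolean exists(String dateStamp);
    }

    /**
     * Walks the sources in priority order and returns the first one that
     * reports the dump for the given date as present, or empty if none does.
     */
    public static Optional<DumpSource> firstAvailable(List<DumpSource> sources, String dateStamp) {
        for (DumpSource source : sources) {
            if (source.exists(dateStamp)) {
                return Optional.of(source);
            }
        }
        return Optional.empty();
    }
}
```

A caller would pass the sources in the order local, dumps.wikimedia.org, archive.org, and start the actual download only from the source that is returned.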
Even though the WMF does not make it easy to check for these dump files, it is still possible to check, or at least try.
In my application at https://github.com/wikimedia/analytics-wmde-toolkit-analyzer in https://github.com/wikimedia/analytics-wmde-toolkit-analyzer/blob/master/java/analyzer/src/main/java/org/wikidata/analyzer/Fetcher/DumpFetcher.java I have a fallback through various dump locations. I would like to also add archive.org as a final fallback here.
This is very hard as the final check on line 69 https://github.com/wikimedia/analytics-wmde-toolkit-analyzer/blob/master/java/analyzer/src/main/java/org/wikidata/analyzer/Fetcher/DumpFetcher.java#L69 always returns true.
A rough draft of my archive.org DumpFile implementation can be seen at https://gerrit.wikimedia.org/r/#/c/282731/