Open ansell opened 7 years ago
I've added the 1.9x label to this as the filter file isn't currently supported in 2.x (we aren't using the ByteOrderPartitioner, so key ranges aren't possible).
The filter file will be required when we switch to 2.x, to avoid pressure on the downloads service.
@M-Nicholls this feature (generating offline downloads so users can avoid using the live downloads service) is not yet implemented for 2.x. If we switch over before it is implemented, the downloads on downloads.ala.org.au will stop being updated until it is.
When run with a filter file, ExportFromIndexStream does all filtering locally rather than on the server. The key line seems to be:
https://github.com/AtlasOfLivingAustralia/biocache-store/blob/master/src/main/scala/au/org/ala/biocache/export/ExportFromIndexStream.scala#L373
This prevented me from splitting up the single monolithic bulk downloads regeneration job into separate jobs because each of the separate jobs would need to download every record, making them unviable at the current records-per-second performance: https://github.com/AtlasOfLivingAustralia/maintenance/issues/26
The workaround is to keep running the 47-hour bulk downloads regeneration job monthly.
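
To illustrate why splitting the job does not help while filtering happens locally, here is a minimal sketch in Python. It is not the biocache-store code: the function names, the record shape, and the record counts are all hypothetical. It only models the point above, that each separate job has to stream every record before discarding the ones it does not want.

```python
from typing import Iterable, List, Set, Tuple

Record = dict  # hypothetical record shape: {"key": ..., "id": ...}


def export_with_local_filter(
    index_stream: Iterable[Record], wanted_keys: Set[str]
) -> Tuple[List[Record], int]:
    """Client-side filtering: fetch every record, keep only the wanted ones."""
    fetched = 0
    kept = []
    for record in index_stream:
        fetched += 1  # the record has already been downloaded at this point
        if record["key"] in wanted_keys:
            kept.append(record)
    return kept, fetched


def total_fetched_when_split(
    records: List[Record], key_partitions: List[Set[str]]
) -> int:
    """Run one job per partition; each job re-streams the full index."""
    total = 0
    for keys in key_partitions:
        _, fetched = export_with_local_filter(iter(records), keys)
        total += fetched
    return total


# Hypothetical data: 1000 records spread evenly over 4 keys.
records = [{"key": f"dr{i % 4}", "id": i} for i in range(1000)]
partitions = [{"dr0"}, {"dr1"}, {"dr2"}, {"dr3"}]

# One monolithic job streams the index once; four split jobs stream it four times.
single_job = total_fetched_when_split(records, [{"dr0", "dr1", "dr2", "dr3"}])
split_jobs = total_fetched_when_split(records, partitions)
print(single_job, split_jobs)
```

In this model the total number of records downloaded grows linearly with the number of jobs, so splitting only pays off if the filter can be pushed to the server.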