AtlasOfLivingAustralia / biocache-service

Occurrence & mapping webservices
https://biocache-ws.ala.org.au/ws/

ShapeFileRecordWriter prematurely interrupted during finalise #335

Closed ansell closed 5 years ago

ansell commented 5 years ago

The following stacktrace appears in the logs on prod-bdown-b5 and seems to indicate that an otherwise successful shapefile download was possibly corrupted by the interrupt mechanism that biocache offline downloads use for resource cleanup. The download was for a relatively small 370,350 records, so the default timeouts should definitely allow it to complete.

2018-10-04 11:45:36,396 [biocachedownload-pool-500000-RECORDS_INDEX-1] INFO au.org.ala.biocache.service.AuthService  (AuthService.java:199) - authCache requesting: https://auth.ala.org.au/userdetails/userDetails/getUserDetails?userName=REDACTED@REDACTED
2018-10-04 11:45:42,539 [Thread-31133] INFO au.org.ala.biocache.writer.ShapeFileRecordWriter  (ShapeFileRecordWriter.java:278) - Copying Shape zip file to outputstream
2018-10-04 11:45:42,914 [Thread-31133] ERROR au.org.ala.biocache.writer.ShapeFileRecordWriter  (ShapeFileRecordWriter.java:298) - Unable to create ShapeFile
java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:164)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    at java.io.InputStream.read(InputStream.java:101)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2146)
    at org.apache.commons.io.IOUtils.copy(IOUtils.java:2102)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2123)
    at org.apache.commons.io.IOUtils.copy(IOUtils.java:2078)
    at au.org.ala.biocache.writer.ShapeFileRecordWriter.finalise(ShapeFileRecordWriter.java:280)
    at au.org.ala.biocache.dao.SearchDAOImpl$3.run(SearchDAOImpl.java:1275)
    at java.lang.Thread.run(Thread.java:748)
2018-10-04 11:45:45,140 [biocachedownload-pool-500000-RECORDS_INDEX-1] DEBUG au.org.ala.biocache.service.DownloadService  (DownloadService.java:650) - DOI minted: 10.26197/5bb570c2d9a27
2018-10-04 11:46:24,366 [AsyncAppender-Dispatcher-Thread-4] WARN org.apache.commons.httpclient.HttpMethodBase  (HttpMethodBase.java:682) - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2018-10-04 11:46:48,896 [biocachedownload-pool-500000-RECORDS_INDEX-1] DEBUG au.org.ala.biocache.service.EmailService  (EmailService.java:64) - Send email to : REDACTED@REDACTED
2018-10-04 11:46:48,957 [biocachedownload-pool-500000-RECORDS_INDEX-1] DEBUG au.org.ala.biocache.dao.JsonPersistentQueueDAOImpl  (JsonPersistentQueueDAOImpl.java:231) - Removing the download from the queue
2018-10-04 11:46:48,957 [biocachedownload-pool-500000-RECORDS_INDEX-1] INFO au.org.ala.biocache.dao.JsonPersistentQueueDAOImpl  (JsonPersistentQueueDAOImpl.java:236) - Deleting /data/cache/downloads/offline1538616792026.json true
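
For reference, the exception type in this trace can be reproduced in isolation. The following is a minimal sketch, not the biocache-service code: the class name, method names and temporary file are hypothetical. It relies on the documented behaviour of interruptible NIO channels, where a read on a thread whose interrupt flag is set closes the channel and throws ClosedByInterruptException, whereas a plain java.io.FileInputStream read is not affected by the flag.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class InterruptedCopyDemo {

    // Reads one byte while the calling thread's interrupt flag is set and
    // reports what happened. The flag is cleared again before returning.
    static String readWhileInterrupted(InputStream in) {
        Thread.currentThread().interrupt();
        try {
            in.read();
            return "read ok";
        } catch (ClosedByInterruptException e) {
            return "ClosedByInterruptException";
        } catch (IOException e) {
            return "IOException";
        } finally {
            Thread.interrupted(); // clear the flag so later I/O is unaffected
        }
    }

    // Hypothetical stand-in for the temporary shape zip file.
    static Path sampleFile() throws IOException {
        Path tmp = Files.createTempFile("shape-demo", ".zip");
        Files.write(tmp, new byte[] {1, 2, 3});
        return tmp;
    }

    // Stream backed by an interruptible FileChannel, as in the stack trace.
    static String viaNioChannel() {
        try {
            Path tmp = sampleFile();
            try (InputStream in = Channels.newInputStream(
                    FileChannel.open(tmp, StandardOpenOption.READ))) {
                return readWhileInterrupted(in);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Classic java.io stream, which does not react to the interrupt flag.
    static String viaFileInputStream() {
        try {
            Path tmp = sampleFile();
            try (InputStream in = new FileInputStream(tmp.toFile())) {
                return readWhileInterrupted(in);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("NIO channel stream: " + viaNioChannel());
        System.out.println("FileInputStream:    " + viaFileInputStream());
    }
}
```

If the finalise step is interrupted while copying through a channel-backed stream, the copy stops at that point, which would be consistent with a truncated or empty entry in the output.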
ansell commented 5 years ago

The zip file contains an empty entry with the same name as the requested file, Wa_bicolor.zip, indicating that the shapefile was not written at all to the archive that was sent to doi.ala.org.au for archival.

$ unzip -l Wa_bicolor.zip 
Archive:  Wa_bicolor.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
  9959155  10-04-2018 11:33   Wa_bicolor.shapefile
        0  10-04-2018 11:44   Wa_bicolor.zip
     2777  10-04-2018 11:45   Shape-README.html
    15655  10-04-2018 11:45   citation.csv
    13890  10-04-2018 11:45   README.html
    15343  10-04-2018 11:45   headings.csv
---------                     -------
 10006820                     6 files
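
A check like the listing above could also be automated before an archive is sent for archival. This is only a sketch using java.util.zip from the JDK; the class and method names are hypothetical, and the sample archive is built purely for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class EmptyZipEntryCheck {

    // Returns the names of all non-directory entries whose uncompressed size is 0.
    static List<String> emptyEntries(Path zip) {
        List<String> empty = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory() && entry.getSize() == 0) {
                    empty.add(entry.getName());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return empty;
    }

    // Builds a small illustrative archive with one populated and one empty entry.
    static Path sampleArchive() {
        try {
            Path zip = Files.createTempFile("sample-download", ".zip");
            try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
                out.putNextEntry(new ZipEntry("citation.csv"));
                out.write("name,citation\n".getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
                out.putNextEntry(new ZipEntry("Wa_bicolor.zip")); // nothing written
                out.closeEntry();
            }
            return zip;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Empty entries: " + emptyEntries(sampleArchive()));
    }
}
```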

The Wa_bicolor.shapefile has no relationship to the Shapefile specification and is badly misnamed. It contains a CSV file with two columns for coordinates and no other data:

latitude,longitude
-36.556738531,149.395073474
-36.551377292,149.403956269
-36.54902716,149.41321075
-34.97726,148.61836
-37.5067,148.0929
-36.342,145.363
-36.710764776,149.347628877
-31.062310605,149.086605032
-33.65339355,150.253515524
.....
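
For comparison, a real .shp main file starts with a 100-byte header whose first four bytes hold the file code 9994 as a big-endian integer, per the ESRI Shapefile technical description. A rough sniffing sketch follows; the class and method names are hypothetical and it checks only the header, not the rest of the file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class ShapefileSniffer {

    static final int SHP_FILE_CODE = 9994;     // first header field, big-endian
    static final int SHP_HEADER_LENGTH = 100;  // fixed size of the main file header

    // True if the content is long enough to hold a .shp header and starts
    // with the expected file code. ByteBuffer reads big-endian by default.
    static boolean looksLikeShp(byte[] content) {
        if (content.length < SHP_HEADER_LENGTH) {
            return false;
        }
        return ByteBuffer.wrap(content).getInt() == SHP_FILE_CODE;
    }

    public static void main(String[] args) {
        byte[] csv = "latitude,longitude\n-36.556738531,149.395073474\n"
                .getBytes(StandardCharsets.UTF_8);
        byte[] header = new byte[SHP_HEADER_LENGTH];
        ByteBuffer.wrap(header).putInt(SHP_FILE_CODE);

        System.out.println("CSV content looks like .shp: " + looksLikeShp(csv));
        System.out.println("Header with file code looks like .shp: " + looksLikeShp(header));
    }
}
```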
ansell commented 5 years ago

There was another shapefile written out just before this one on the other 500000 record download pool thread, which may indicate a data race or other conflict inside biocache-service, and may offer a way to replicate the issue:

2018-10-04 11:42:14,327 [biocachedownload-pool-500000-RECORDS_INDEX-0] INFO au.org.ala.biocache.dao.SearchDAOImpl  (SearchDAOImpl.java:1429) - Download of 18051 records in 6 seconds. Record/sec: 3008
2018-10-04 11:42:23,149 [Thread-31154] INFO au.org.ala.biocache.writer.ShapeFileRecordWriter  (ShapeFileRecordWriter.java:278) - Copying Shape zip file to outputstream
2018-10-04 11:42:25,297 [biocachedownload-pool-500000-RECORDS_INDEX-0] ERROR au.org.ala.biocache.service.DownloadService  (DownloadService.java:882) - A null record was returned from the collectory citation service: [{name=Australia's Virtual Herbarium, citation=Records provided by Australia's Virtual Herbarium, accessed through ALA website., rights=, link=For more information: https://collections.ala.org.au/public/show/dp36, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=dp36, DOI=}, {name=Australian Museum Marine Invertebrate Collection, citation=Records provided by Australian Museum Marine Invertebrate Collection, accessed through ALA website., rights=, link=For more information: https://collections.ala.org.au/public/show/co113, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=co113, DOI=}, {name=Office of Environment and Heritage, Department of Premier and Cabinet representing the State of New South Wales, citation=Records provided by Office of Environment and Heritage, Department of Premier and Cabinet representing the State of New South Wales, accessed through ALA website., rights=, link=For more information: https://collections.ala.org.au/public/show/dp34, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=dp34, DOI=}, {name=Australian Platypus Conservancy, citation=Australian Platypus Conservancy, rights=Creative Commons Attribution (International) (CC-BY 4.0 (Int)), link=For more information: https://collections.ala.org.au/public/show/dr8128, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=dr8128, DOI=}, null, {name=Australian Museum Malacology Collection, citation=Records provided by Australian Museum Malacology Collection, accessed through ALA website., rights=, link=For more information: https://collections.ala.org.au/public/show/co114, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=co114, DOI=}, null, {name=OEH Atlas of NSW Wildlife, citation=BioNet Species Sightings 
occurrence data held by the NSW Office of Environment and Heritage (OEH).   The BioNet repository holds data from a number of sources and custodians. (Accessed through ALA Data Portal,<Date of Access>)., rights=Creative Commons Attribution (International) (CC-BY 4.0 (Int)), link=For more information: https://collections.ala.org.au/public/show/dr368, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=dr368, DOI=doi:10.15468/14jd9g}, {name=iNaturalist, citation=iNaturalist.org: iNaturalist Research-grade Observations. doi:10.15468/ab3s5x
Accessed via http://www.gbif.org/dataset/50c9509d-22c7-4a22-a47d-8c48425ef4a7 on 2017-03-16, rights=CC-BY-NC-Int, link=For more information: https://collections.ala.org.au/public/show/dr1411, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=dr1411, DOI=}, {name=Australian Museum provider for OZCAM, citation=Australian Museum, http://australianmuseum.net.au/, rights=CC-BY, link=For more information: https://collections.ala.org.au/public/show/dr340, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=dr340, DOI=10.15468/e7susi}, {name=eBird Australia, citation=eBird Basic Dataset. Version: ebd_AU_relNov-2016. Cornell Lab of Ornithology, Ithaca, New York. November 2016.
See http://help.ebird.org/customer/portal/articles/1006835-recommended-citation?t=412380 for other citations, rights=Creative Commons Zero (CC0), link=For more information: https://collections.ala.org.au/public/show/dr2009, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=dr2009, DOI=}, {name=Australian National Herbarium, citation=Records provided by Australian National Herbarium, accessed through ALA website., rights=, link=For more information: https://collections.ala.org.au/public/show/co12, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=co12, DOI=}, {name=Australian Museum Herpetology Collection, citation=Records provided by Australian Museum Herpetology Collection, accessed through ALA website., rights=, link=For more information: https://collections.ala.org.au/public/show/co10, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=co10, DOI=}, {name=OZCAM (Online Zoological Collections of Australian Museums) Provider, citation=Records provided by OZCAM (Online Zoological Collections of Australian Museums) Provider, accessed through ALA website., rights=, link=For more information: https://collections.ala.org.au/public/show/dp20, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=dp20, DOI=}, {name=National Herbarium of New South Wales, citation=Records provided by National Herbarium of New South Wales, accessed through ALA website., rights=, link=For more information: https://collections.ala.org.au/public/show/co54, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=co54, DOI=}, {name=Global Biodiversity Information Facility, citation=Records provided by Global Biodiversity Information Facility, accessed through ALA website., rights=, link=For more information: https://collections.ala.org.au/public/show/dp42, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=dp42, DOI=}, {name=Encyclopedia of Life Images - Flickr Group, citation=Citation is at the individual image, rights=other 
Creative Commons, varies with individual images, link=For more information: https://collections.ala.org.au/public/show/dr360, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=dr360, DOI=}, {name=New South Wales Bird Atlassers, citation=New South Wales Bird Atlassers Inc. (http://nswbirdatlassers.com/), rights=CC-BY, link=For more information: https://collections.ala.org.au/public/show/dr1089, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=dr1089, DOI=doi:10.15468/kfkspo}, {name=Murray-Darling Basin waterbird survey, citation=Murray–Darling Basin Authority, rights=CC-BY-Int, link=For more information: https://collections.ala.org.au/public/show/dr4731, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=dr4731, DOI=doi:10.15468/ke838t}, {name=Australian Museum, citation=Records provided by Australian Museum, accessed through ALA website., rights=, link=For more information: https://collections.ala.org.au/public/show/in4, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=in4, DOI=}, {name=Centre for Australian National Biodiversity Research, citation=Records provided by Centre for Australian National Biodiversity Research, accessed through ALA website., rights=, link=For more information: https://collections.ala.org.au/public/show/in5, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=in5, DOI=}, {name=MEL AVH data, citation=Records provided by MEL AVH data, accessed through ALA website., rights=Creative Commons Attribution (Australia) (CC-BY 3.0 (Au)) You are licensed to use, reproduce, distribute, publish or otherwise make available the Data, or to use or adapt the Data to create derivative works. 
In any reproduction, distribution or publication of the Data, you must attribute the source of the Data as Australia's Virtual Herbarium (AVH). 
In any use or adaptation of the Data to create derivative works, you must acknowledge the use of the Data in the derivative work and attribute the source of the Data as Australia's Virtual Herbarium (AVH). 
You acknowledge that the Data may contain errors and omissions and that you employ the Data at your own risk.
Neither the Council of Heads of Australasian Herbaria (CHAH) nor any other Data Custodian will accept liability for any loss, damage, cost or expenses that you may incur as a result of the use of or reliance upon the Data. 
You should use the current Data from the ALA portal and not rely on material you have previously printed or downloaded., link=For more information: https://collections.ala.org.au/public/show/dr376, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=dr376, DOI=doi:10.15468/rhzrxw}, {name=The Royal Botanic Gardens & Domain Trust, citation=Records provided by The Royal Botanic Gardens & Domain Trust, accessed through ALA website., rights=, link=For more information: https://collections.ala.org.au/public/show/in50, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=in50, DOI=}, {name=GBIF records, citation=Citation is at record level, rights=other Rights is at record level, link=For more information: https://collections.ala.org.au/public/show/dr695, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=dr695, DOI=}, {name=First Bird Atlas, citation=Birds Australia - First Bird Atlas (http://www.birdsaustralia.com.au/), rights=CC-BY-NC, link=For more information: https://collections.ala.org.au/public/show/dr571, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=dr571, DOI=doi:10.15468/f6tipo}, {name=Historical Bird Atlas, citation=Birds Australia - Historical Bird Atlas (http://www.birdsaustralia.com.au/), rights=CC-BY-NC, link=For more information: https://collections.ala.org.au/public/show/dr570, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=dr570, DOI=doi:10.15468/pljute}, {name=BirdLife Australia, Birdata, citation=BirdLife Australia - Birdata Project (http://www.birdata.com.au), rights=CC-BY-NC, link=For more information: https://collections.ala.org.au/public/show/dr359, dataGeneralizations=, informationWithheld=Please note that details of records of sensitive species have been removed from this data set.  
To request full access to these records please contact Birds Australia (atlas@birdlife.org.au)., downloadLimit=, uid=dr359, DOI=doi:10.15468/dchsnk}, {name=Flickr, citation=Records provided by Flickr, accessed through ALA website., rights=, link=For more information: https://collections.ala.org.au/public/show/dp29, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=dp29, DOI=}, {name=Fungimap, citation=Citation is at record level: <collector as indicated in the individual records>, <survey identification>, http://fungimap.org.au/, rights=Creative Commons Attribution (International) (CC-BY 4.0 (Int)), link=For more information: https://collections.ala.org.au/public/show/dr711, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=dr711, DOI=}, {name=BirdLife Australia, citation=Records provided by BirdLife Australia, accessed through ALA website., rights=, link=For more information: https://collections.ala.org.au/public/show/dp28, dataGeneralizations=, informationWithheld=, downloadLimit=, uid=dp28, DOI=}], collected stats were: {dp36=5, co113=1, dp34=6850, dr8128=1, infoHeaders,Record ID,Catalogue Number,Taxon Concept GUID,Scientific Name - original,Vernacular name - original,Scientific Name,Taxon Rank,Vernacular name,Kingdom,Phylum,Class,Order,Family,Genus,Species,Subspecies,Data Resource ID,Data Resource Name,Institution ID,Institution,Collection ID,Collection,Licence,Institution Code,Collection Code,Locality,Latitude - original,Longitude - original,Geodetic datum - original,Latitude,Longitude,Coordinate Precision,Coordinate Uncertainty in Metres,Country - parsed,State - parsed,Local Government Areas,IMCRA 4 Regions,IBRA 7 Regions,Maximum elevation in meters,Minimum elevation in meters,Minimum depth in meters,Maximum depth in meters,Individual count,Collector,Year,Month,Day,Event Date - parsed,Verbatim event date,Basis Of Record - original,Basis Of Record,Occurrence status,Sex,Preparations,Outlier for layer,Taxonomic Quality,Location 
Quality=-2, co114=5, infoFields,id,catalogue_number,taxon_concept_lsid,raw_taxon_name,raw_common_name,taxon_name,rank,common_name,kingdom,phylum,class,order,family,genus,species,subspecies,data_resource_uid,data_resource,institution_uid,institution_name,collection_uid,collection_name,license,institution_code,collection_code,raw_locality,raw_latitude,raw_longitude,raw_datum,latitude,longitude,coordinate_precision,coordinate_uncertainty,country,state,cl959,cl21,cl1048,min_elevation_d,max_elevation_d,min_depth_d,max_depth_d,individual_count,collector,year,month,day,occurrence_date,verbatim_event_date,raw_basis_of_record,basis_of_record,occurrence_status,raw_sex,preparations,outlier_layer,taxonomic_kosher,geospatial_kosher=-1, dr368=6850, dr1411=1, dr340=8, dr2009=7443, co12=1, co10=2, dp20=8, co54=4, dp42=73, dr360=4, dr1089=71, dr4731=10, in4=8, in5=1, dr376=5, in50=4, dr695=73, dr571=427, dr570=182, dr359=2974, dp29=4, dr711=2, dp28=3583}
2018-10-04 11:42:25,300 [biocachedownload-pool-500000-RECORDS_INDEX-0] INFO au.org.ala.biocache.service.AuthService  (AuthService.java:199) - authCache requesting: https://auth.ala.org.au/userdetails/userDetails/getUserDetails?userName=REDACTED@REDACTED
2018-10-04 11:42:34,873 [biocachedownload-pool-500000-RECORDS_INDEX-0] DEBUG au.org.ala.biocache.service.DownloadService  (DownloadService.java:650) - DOI minted: 10.26197/5bb570046754e
2018-10-04 11:42:58,537 [AsyncAppender-Dispatcher-Thread-4] WARN org.apache.commons.httpclient.HttpMethodBase  (HttpMethodBase.java:682) - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
2018-10-04 11:43:38,583 [biocachedownload-pool-500000-RECORDS_INDEX-0] DEBUG au.org.ala.biocache.service.EmailService  (EmailService.java:64) - Send email to : REDACTED@REDACTED
2018-10-04 11:43:38,642 [biocachedownload-pool-500000-RECORDS_INDEX-0] DEBUG au.org.ala.biocache.dao.JsonPersistentQueueDAOImpl  (JsonPersistentQueueDAOImpl.java:231) - Removing the download from the queue
2018-10-04 11:43:38,642 [biocachedownload-pool-500000-RECORDS_INDEX-0] INFO au.org.ala.biocache.dao.JsonPersistentQueueDAOImpl  (JsonPersistentQueueDAOImpl.java:236) - Deleting /data/cache/downloads/offline1538617326351.json true

The shapefile for this download was successfully created and looks like it is intact.

ansell commented 5 years ago

The user has also sent an email to ALA support, tracked in helpdesk as ticket 23861.

ansell commented 5 years ago

Need to switch on solr-b4 when I get a chance, to verify whether this is a new regression or whether it also failed previously.

ansell commented 5 years ago

The original query was:

/ws/occurrences/offline/download?hubName=Atlas+of+Living+Australia&file=Wa_bicolor&mintDoi=true&reasonTypeId=7&searchUrl=https%3A%2F%2Fbiocache.ala.org.au%2Foccurrences%2Fsearch%3Fq%3Dlsid%253Aurn%253Alsid%253Abiodiversity.org.au%253Aafd.taxon%253Ab6f829dd-0aef-4422-a3f6-77ff691aa9af&fileType=shapefile&qa=none&sourceTypeId=0&email=REDACTED%40REDACTED&doiDisplayUrl=https%3A%2F%2Fbiocache.ala.org.au%2Fdownload%2Fdoi%3Fdoi%3D&q=lsid%3Aurn%3Alsid%3Abiodiversity.org.au%3Aafd.taxon%3Ab6f829dd-0aef-4422-a3f6-77ff691aa9af
ansell commented 5 years ago

Download completed successfully on old infrastructure (and was then transferred to new infrastructure for archival):

https://biocache.ala.org.au/biocache-download/ad425ee9-ad26-309f-80ec-bf0a27da52de/1539043109481/Wa_bicolor.zip
ansell commented 5 years ago

Successfully downloaded it today. A possible cause is a timeout that needs to be extended, given that the large dump of data to disk for shapefile downloads is done once they are complete, not in a streaming manner over time.
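
One way to picture a longer grace period is to wait for the writer thread to finish on its own and only interrupt it if it is still running afterwards. This is a sketch of the general idea only: the class, the method names and the grace value are hypothetical, and it does not reflect how biocache-service currently schedules its cleanup.

```java
class GracefulInterrupt {

    // Waits up to graceMillis (must be > 0, since join(0) waits forever) for
    // the worker to finish, and interrupts it only if it is still alive.
    // Returns true if the worker completed without being interrupted here.
    static boolean awaitThenInterrupt(Thread worker, long graceMillis) {
        try {
            worker.join(graceMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt
        }
        if (worker.isAlive()) {
            worker.interrupt();
            return false;
        }
        return true;
    }

    // Starts a thread that sleeps for the given time, standing in for the
    // final dump of shapefile data to disk.
    static Thread startSleeper(long millis) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                // interrupted by the cleanup logic; just exit
            }
        });
        t.start();
        return t;
    }

    public static void main(String[] args) {
        System.out.println("short task finished in time: "
                + awaitThenInterrupt(startSleeper(20), 5000));
        System.out.println("long task finished in time: "
                + awaitThenInterrupt(startSleeper(60000), 50));
    }
}
```

The grace period would need to cover the whole final write, since for shapefile downloads that write happens in one step at the end rather than incrementally.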