Closed reupost closed 5 years ago
@reupost thanks! I will test and integrate this today
Thanks @ansell, and sorry about the missed "openArchive" reference. My clumsiness with git meant I was making these changes manually, and I must have missed that one. One thing I've found with further testing on our side is that the new version seems to be a little fussier about character encodings than the old one. It works fine with UTF-8, but fails on one of our archives containing an occurrences.csv file that is 'Western (Mac OS Roman)'-encoded. The older build handles that file without issue. Possibly there are other encodings that are no longer tolerated. I'll do some more investigating.
I fixed the nexus.ala.org.au to repository.gbif.org linkage and the Travis build succeeded after that.
If you find that the encoding issue is a biocache-store bug, create an issue here; otherwise it is possibly a dwca-io issue. I worked on the new dwca-io routines, to use jackson-csv, but I don't remember doing encoding tests at the time.
From Matt @ GBIF: "It might be worth warning ALA that we don't have a single dataset that isn't UTF-8 (!), so we might not have found many issues around encodings. [also] dwca-io only accepts Java charset names, which in this case are different (MacRoman vs. Macintosh) from the IANA names."
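To check which spelling of a charset name a given JVM accepts, something like the following sketch may help. It uses only the standard `java.nio.charset` API; the class name `CharsetNames` and the helper `resolve` are hypothetical, and it says nothing about how dwca-io itself looks names up. Which of the Mac names resolve depends on the JVM and on whether its extended charsets are installed, so no particular result is assumed here.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Optional;

// Hypothetical helper, not part of dwca-io or biocache-store.
class CharsetNames {

    /** Returns the charset this JVM maps the name to, or empty if the name is unknown or malformed. */
    static Optional<Charset> resolve(String name) {
        try {
            return Charset.isSupported(name)
                    ? Optional.of(Charset.forName(name))
                    : Optional.empty();
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not legal in a charset name.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Candidate spellings: the Java-style names and the IANA-style name from the quote above.
        String[] candidates = {"UTF-8", "MacRoman", "x-MacRoman", "macintosh"};
        for (String name : candidates) {
            String result = resolve(name)
                    .map(Charset::name) // canonical name as reported by this JVM
                    .orElse("not supported by this JVM");
            System.out.println(name + " -> " + result);
        }
    }
}
```

Running this on the machine that processes the archives shows which name, if any, can safely be declared for the Mac OS Roman file.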
This fixes the issue where line breaks within fields in DwC-A files were being interpreted as the start of new records.
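The failure mode can be illustrated with a minimal sketch. This is not the dwca-io or jackson-csv implementation; the class `QuotedRecords`, the method `splitRecords`, and the sample data are all made up for illustration, and the splitter only handles `\n` line endings and double-quote quoting.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a tiny quote-aware record splitter, not the real parser.
class QuotedRecords {

    /** Splits CSV text into records, keeping line breaks that fall inside double-quoted fields. */
    static List<String> splitRecords(String csv) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles twice, so the state ends up unchanged.
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == '\n' && !inQuotes) {
                // A line break outside quotes ends the current record.
                records.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up sample: the second record has a line break inside a quoted field.
        String csv = "id,remarks\n1,\"found near creek\nunder a log\"\n2,plain\n";
        System.out.println(csv.split("\n").length);   // naive line split: prints 4
        System.out.println(splitRecords(csv).size()); // quote-aware split: prints 3
    }
}
```

The naive split counts the continuation line as a record of its own, which is the behaviour described above; the quote-aware split keeps the header and two data records.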