tucotuco closed this issue 9 years ago.
For the 4th item in my previous comment, see my comment on #116. I think the archive reader problem actually hits a case that the GBIF DwC-A reader does not support.
For the fifth item, GBIF has made the update.
Addressed in http://dev.gbif.org/issues/browse/POR-2395
https://github.com/gbif/dwca-reader/commit/903d10236b3b2cda46a0d2b3e994e0ce658c328e
Awesome! Have they released a new snapshot?
I believe the new snapshot is at http://repository.gbif.org/content/repositories/snapshots/org/gbif/dwca-reader/1.19-SNAPSHOT/
Branch develop is now set to use
https://clojars.org/dwca-reader-clj/versions/0.10.1-SNAPSHOT
which in turn uses
http://repository.gbif.org/content/repositories/snapshots/org/gbif/dwca-reader/1.20-SNAPSHOT/. This SNAPSHOT fixes the missing Dublin Core fields and introduces the Darwin Core changes as of 2014-10-30 (see http://rs.tdwg.org/dwc/terms/history/decisions/index.htm). These changes add the Organism terms and deprecate the Dublin Core term "rights" in favor of "license".
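To make the "rights" to "license" change concrete, here is a minimal Java sketch of how a harvested record could be migrated. It assumes a record is held as a map from full term URI to value; that representation, the class `LicenseTermMigration`, and the method `migrateRightsToLicense` are all hypothetical and do not come from dwca-reader or dwca-reader-clj. A value found under dcterms:rights is moved to dcterms:license unless a license value is already present.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LicenseTermMigration {
    // Full term URIs in the Dublin Core terms namespace.
    public static final String DC_RIGHTS = "http://purl.org/dc/terms/rights";
    public static final String DC_LICENSE = "http://purl.org/dc/terms/license";

    /**
     * Hypothetical helper: returns a copy of the record (term URI -> value) with the
     * deprecated "rights" entry removed. Its value is moved to "license" only when
     * "license" is not already set; otherwise the existing license wins and the
     * rights value is dropped. The input map is not modified.
     */
    public static Map<String, String> migrateRightsToLicense(Map<String, String> record) {
        Map<String, String> out = new LinkedHashMap<>(record);
        String rights = out.remove(DC_RIGHTS);
        if (rights != null && !out.containsKey(DC_LICENSE)) {
            out.put(DC_LICENSE, rights);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> rec = new LinkedHashMap<>();
        rec.put("http://rs.tdwg.org/dwc/terms/occurrenceID", "urn:example:1");
        rec.put(DC_RIGHTS, "http://creativecommons.org/publicdomain/zero/1.0/");
        // Prints the record with the value now keyed by dcterms:license.
        System.out.println(migrateRightsToLicense(rec));
    }
}
```

Letting an existing license value take precedence is a design choice of this sketch, not something the Darwin Core decision prescribes.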
Branch dwc2013 was created so that we can easily run the old-style harvest used in the portal indexing as of 2014-12-22. It uses
https://clojars.org/dwca-reader-clj/versions/0.8.0-SNAPSHOT
which in turn uses
http://repository.gbif.org/content/repositories/snapshots/org/gbif/dwca-reader/1.9.1-SNAPSHOT/.
So, gulo uses the dwca-reader-clj wrapper (https://github.com/VertNet/dwca-reader-clj/blob/develop/src/clj/dwca/core.clj) around the GBIF Darwin Core Archive reader.
The goal is to move gulo to the latest DwC-A code base and have it use the correct openArchive method from that code (see https://github.com/VertNet/gulo/issues/116). Specifically, we need to:
1) Do a quick code walkthrough of core.clj.
2) Understand how the Java code is invoked.
3) Update gulo to use the reader from GBIF (https://github.com/gbif/dwca-reader/).
4) Make sure the method used to read the archive is the one that takes a temp directory to work in, rather than the one that takes the archive as its only argument.
5) Ensure that the harvest generates Simple Darwin Core plus the information from the CartoDB resource table.
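For item 4, the point is that the caller, not the reader, decides where the archive gets unpacked. The Java sketch below illustrates that idea using only the JDK: it creates a fresh temp directory per harvest and unzips the archive into it. It does not call dwca-reader and is not what its openArchive method does internally; `ArchiveUnpacker` and `unpackInto` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ArchiveUnpacker {
    /**
     * Hypothetical helper: unzips a DwC-A file into a caller-supplied working
     * directory and returns that directory. Entries whose paths would resolve
     * outside the working directory are rejected.
     */
    public static Path unpackInto(Path archive, Path workDir) throws IOException {
        Files.createDirectories(workDir);
        Path root = workDir.toAbsolutePath().normalize();
        try (InputStream in = Files.newInputStream(archive);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                if (!target.startsWith(root)) {
                    throw new IOException("Entry outside work dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // Copies only the current entry; the zip stream stays open.
                    Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
        return root;
    }

    public static void main(String[] args) throws IOException {
        Path archive = Paths.get(args[0]);
        // One fresh temp directory per harvest, chosen by the caller.
        Path workDir = Files.createTempDirectory("dwca-");
        System.out.println("Unpacked to " + unpackInto(archive, workDir));
    }
}
```

Giving each harvest its own explicit directory keeps concurrent harvests from sharing scratch space and makes cleanup the caller's responsibility.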