Esri / geoportal-server-harvester

Metadata Harvester for Esri Geoportal Server
http://esri.github.io/geoportal-server/
Apache License 2.0

Feat/sdsc/folder big #76

Closed valentinedwv closed 6 years ago

valentinedwv commented 6 years ago

Split into smaller folders when harvest is large

pandzel-zz commented 6 years ago

David, I need a bit more information about the context of this pull request, such as:

Looking forward to your input.

valentinedwv commented 6 years ago

If a harvest has tens to hundreds of thousands of records, storing them all in a single folder causes a large performance hit: listing the files becomes slow.

Limiting the number of files per folder to fewer than 10k solves this issue.

Hold on, this is not fully working yet.
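To illustrate the idea, here is a minimal sketch of splitting output into numbered subfolders. It is not the code in this pull request: the class name `FolderBucketer`, the zero-padded folder naming, and the sequential counter are all assumptions made for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not the PR's class): spreads files over numbered subfolders. */
final class FolderBucketer {
    private final Path root;
    private final int maxFilesPerFolder;
    private long count = 0;

    FolderBucketer(Path root, int maxFilesPerFolder) {
        if (maxFilesPerFolder < 1) {
            throw new IllegalArgumentException("maxFilesPerFolder must be >= 1");
        }
        this.root = root;
        this.maxFilesPerFolder = maxFilesPerFolder;
    }

    /** Returns the target path for the next file; a new subfolder starts every maxFilesPerFolder files. */
    synchronized Path next(String fileName) {
        long folderIndex = count / maxFilesPerFolder;
        count++;
        // Assumed naming scheme: 00000, 00001, ... under the root folder.
        return root.resolve(String.format("%05d", folderIndex)).resolve(fileName);
    }
}
```

With a limit of 9999, for example, the first 9999 records would land in `00000`, the next 9999 in `00001`, and so on, so no single directory listing grows past the limit.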

valentinedwv commented 6 years ago

Sanitized file names
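As a rough sketch of what sanitizing can look like (the allowed character set and the `FileNames.sanitize` helper are assumptions, not necessarily what this pull request does), any character outside a conservative safe set is replaced with an underscore:

```java
/** Hypothetical helper (not the PR's code): makes a record identifier safe to use as a file name. */
final class FileNames {
    private FileNames() {}

    /** Replaces every character other than letters, digits, '.', '_' and '-' with '_'. */
    static String sanitize(String raw) {
        String cleaned = raw.replaceAll("[^A-Za-z0-9._-]", "_");
        // Never return an empty name; fall back to a single underscore.
        return cleaned.isEmpty() ? "_" : cleaned;
    }
}
```

Note that distinct identifiers can collapse to the same sanitized name, so a real implementation may need to append a hash or counter to keep names unique.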

valentinedwv commented 6 years ago

ready

pandzel-zz commented 6 years ago

It looks like the LargeDataSetDirectoryAssigner class is (or was) part of another project, authored by bozyurt on 12/1/15. Is there a reference to that project? What is the license for that code?

valentinedwv commented 6 years ago

It is part of our processing pipeline: https://github.com/CINERGI/Foundry/blob/master/LICENSE.md

zguo commented 6 years ago

Thanks for the contribution. Would it be possible to provide it under the Apache 2.0, BSD, or MIT license? Since the product will also be used by commercial users, the following language in the license would be a concern:

Permission to make commercial use of this software may be obtained by contacting: Technology Transfer Office, 9500 Gilman Drive, Mail Code 0910, University of California, La Jolla, CA 92093-0910, (858) 534-5815, invent@ucsd.edu

Thanks,

valentinedwv commented 6 years ago

Apache 2.0 is fine.

zguo commented 6 years ago

Great, thanks!