This is a tracking issue for the upcoming druid-0.9.0.
Release notes:
Druid 0.9.0 introduces an update to the extension system that requires configuration changes. In addition, over 300 pull requests were merged between 0.8.3 and 0.9.0. Below we highlight the more important changes in this patch.
Full list of changes is here: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed
Updating from 0.8.x
Extensions
In Druid 0.9, we have refactored the extension loading mechanism. The main reason for this change is to let Druid load extensions from the local file system, without having to download anything from the internet at runtime.
To learn all about the new extension loading mechanism, see Include extensions and Include Hadoop Dependencies. If you are impatient, here is the summary.
The following properties have been deprecated:
druid.extensions.coordinates
druid.extensions.remoteRepositories
druid.extensions.localRepository
druid.extensions.defaultVersion
Instead, specify druid.extensions.loadList, druid.extensions.directory and druid.extensions.hadoopDependenciesDir.
druid.extensions.loadList specifies the list of extensions that Druid loads at runtime. An example would be druid.extensions.loadList=["druid-datasketches", "mysql-metadata-storage"].
druid.extensions.directory specifies the directory where all the extensions live. An example would be druid.extensions.directory=/xxx/extensions.
druid.extensions.hadoopDependenciesDir specifies the directory where all the Hadoop dependencies live. An example would be druid.extensions.hadoopDependenciesDir=/xxx/hadoop-dependencies. Note: We didn't change the way of specifying which Hadoop version to use, so you just need to make sure the Hadoop version you want to use exists underneath /xxx/hadoop-dependencies.
You might now wonder if you have to manually put extensions inside /xxx/extensions and /xxx/hadoop-dependencies. The answer is no, we have already created them for you. Download the latest Druid tarball at http://druid.io/downloads.html. Unpack it and you will see extensions and hadoop-dependencies folders there. Simply copy them to /xxx/extensions and /xxx/hadoop-dependencies respectively, and you are all set!
If the extension or the Hadoop dependency you want to load is not included in the core extensions, you can use pull-deps to download it to your extension directory.
If you want to load your own extension, first run mvn install to install it into your local Maven repository, and then use pull-deps to download it to your extension directory.
Please feel free to leave any questions regarding the migration.
Extensions have also been split into core and contrib extensions. Core extensions are maintained by Druid committers and are packaged as part of the download tarball. Contrib extensions are community maintained and can be installed as needed. For more information, please see the extensions documentation.
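As a rough aid for the migration described above, the following sketch scans the text of a properties file for the deprecated keys and reads the names out of druid.extensions.loadList. It is not part of Druid: the function names and the sample config are hypothetical, and the parser only handles plain key=value lines (no line continuations or `:` separators).

```python
import json

# Properties deprecated in Druid 0.9.0, as listed in these release notes.
DEPRECATED = (
    "druid.extensions.coordinates",
    "druid.extensions.remoteRepositories",
    "druid.extensions.localRepository",
    "druid.extensions.defaultVersion",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_extension_config(text):
    """Return (deprecated keys still present, extensions named in loadList)."""
    props = parse_properties(text)
    deprecated = [key for key in DEPRECATED if key in props]
    # The loadList value in the example above is written as a JSON array.
    load_list = json.loads(props.get("druid.extensions.loadList", "[]"))
    return deprecated, load_list


# Hypothetical config that still carries one old property next to the new ones.
sample = """
druid.extensions.coordinates=["io.druid.extensions:mysql-metadata-storage"]
druid.extensions.directory=/xxx/extensions
druid.extensions.loadList=["druid-datasketches", "mysql-metadata-storage"]
"""

deprecated, load_list = check_extension_config(sample)
print(deprecated)
print(load_list)
```

Any key reported in the first list should be removed in favor of the three new properties.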
Ordering of Dimensions
Until Druid 0.8.x the order of dimensions given at indexing time did not affect the way data gets indexed. Rows would be ordered first by timestamp, then by dimension values, in lexicographical order of dimension names.
As of Druid 0.9.0, Druid respects the given dimension order and will order rows first by timestamp, then by dimension values, in the given dimension order.
This means segments may now vary in size depending on the order in which dimensions are given. Specifying a dimension with many unique values first may result in worse compression than specifying dimensions with repeating values first.
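The difference between the two orderings can be illustrated with a small sketch. This is a simplified model of the row ordering described above, not Druid's indexing code; the rows, dimension names, and the sort_rows helper are all hypothetical.

```python
def sort_rows(rows, dimension_order):
    """Order rows by timestamp, then by dimension values in the given order."""
    return sorted(
        rows,
        key=lambda row: (row["timestamp"], [row[dim] for dim in dimension_order]),
    )


# Hypothetical rows that share one timestamp.
rows = [
    {"timestamp": 0, "page": "a", "country": "US"},
    {"timestamp": 0, "page": "b", "country": "CA"},
    {"timestamp": 0, "page": "c", "country": "US"},
]

# Dimension order as it might be given at indexing time.
given_order = ["page", "country"]

# 0.8.x behavior: dimension names are sorted lexicographically (country, page).
old = sort_rows(rows, sorted(given_order))
# 0.9.0 behavior: the given order is respected (page, country).
new = sort_rows(rows, given_order)

print([r["country"] for r in old])
print([r["country"] for r in new])
```

With the high-cardinality page dimension first, the country values no longer form contiguous runs, which is the kind of layout that tends to compress less well.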
Min/Max Aggregators no longer supported, use doubleMin/doubleMax instead
As indicated in the 0.8.3 release notes, min/max aggregators have been removed in favor of doubleMin, doubleMax, longMin, and longMax aggregators.
If you have any issues starting up because of this, please see https://github.com/druid-io/druid/issues/2749
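Stored queries or ingestion specs that still use the removed aggregator types need to be updated. The sketch below renames them in a list of aggregator specs. It assumes the old min and max types should map to the double variants, as the heading suggests; metrics stored as longs would use longMin/longMax instead. The helper and the metric names are hypothetical.

```python
# Assumed mapping from the removed types to their double-based replacements.
REPLACEMENTS = {"min": "doubleMin", "max": "doubleMax"}


def migrate_aggregators(aggregators):
    """Return copies of the aggregator specs with removed types renamed."""
    migrated = []
    for spec in aggregators:
        updated = dict(spec)  # copy so the caller's specs are left untouched
        if updated.get("type") in REPLACEMENTS:
            updated["type"] = REPLACEMENTS[updated["type"]]
        migrated.append(updated)
    return migrated


# Hypothetical aggregator list from a pre-0.9.0 query.
aggregators = [
    {"type": "max", "name": "maxLatency", "fieldName": "latency"},
    {"type": "longSum", "name": "count", "fieldName": "count"},
]

migrated = migrate_aggregators(aggregators)
print(migrated)
```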
Configuration changes
druid.indexer.task.baseDir and druid.indexer.task.baseTaskDir now default to the standard Java temporary directory specified by the java.io.tmpdir system property, instead of /tmp.
Other issues to be aware of: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3A%22Release+Notes%22 and https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3AIncompatible
New Features
Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3AFeature
1719 Add Rackspace Cloud Files Deep Storage Extension
1858 Support avro ingestion for realtime & hadoop batch indexing
1873 add ability to express CONCAT as an extractionFn
1921 Add docs and benchmark for JSON flattening parser
1936 adding Upper/Lower Bound Filter
1978 Graphite emitter
1986 Preserve dimension order across indexes during ingestion
2008 Regex search query
2014 Support descending time ordering for time series query
2043 Add dimension selector support for groupby/having filter
2076 adding lower and upper extraction fn
2209 support cascade execution of extraction filters in extraction dimension spec
2221 Allow change minTopNThreshold per topN query
2264 Adding custom mapper for json processing exception
2271 time-descending result of select queries
2258 acl for zookeeper is added
Improvements
Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3AImprovement
984 Use thread priorities (i.e. set nice values for background-like tasks)
1638 Remove Maven client at runtime + Provide a way to load Druid extensions through local file system
1728 Store AggregatorFactory[] in segment metadata
1988 support multiple intervals in dataSource inputSpec
2006 Preserve dimension order across indexes during ingestion
2047 optimize InputRowSerde
2075 Configurable value replacement on match failure for RegexExtractionFn
2079 reduce bytearray copy to minimal optimize VSizeIndexedWriter
2084 minor optimize IndexMerger's MMappedIndexRowIterable
2094 Simplifying dimension merging
2107 More efficient SegmentMetadataQuery
2111 optimize create inverted indexes
2138 build v9 directly
2228 Improve heap usage for IncrementalIndex
2261 Prioritize loading of segments based on segment interval
2306 More specific null/empty str handling in IndexMerger
Bug Fixes
Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3ABug
Documentation
Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3ADocumentation
2100 doc update to make it easy to find how to do re-indexing or delta ingestion
2186 Add intro developer docs
2279 Some more multitenancy docs
2364 Add more docs around timezone handling
2216 Completely rework the Druid getting started process
Thanks to everyone who contributed to this patch! @fjy @xvrl @drcrallen @pjain1 @chtefi @liubin @salsakran @jaebinyo @erikdubbelboer @gianm @bjozet @navis @AlexanderSaydakov @himanshug @guobingkun @abbondanza @binlijin @rasahner @jon-wei @CHOIJAEHONG1 @loganlinn @michaelschiff @himank @nishantmonu51 @sirpkt @duilio @pdeva @KurtYoung @mangesh-pardeshi @dclim @desaianuj @stevemns @b-slim @cheddar @jkukul @AdrieanKhisbe @liuqiyun @codingwhatever @clintropolis @zhxiaogg @rohitkochar @itsmee @Angelmmiguel @noddi @se7entyse7en @zhaown @genevien