apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

druid-0.9.0 release notes #2404

Closed fjy closed 8 years ago

fjy commented 8 years ago

This is a tracking issue for the upcoming druid-0.9.0.

Release notes:

Druid 0.9.0 introduces an update to the extension system that requires configuration changes. In addition, more than 300 pull requests were merged between 0.8.3 and 0.9.0. Below we highlight the more important changes in this release.

Full list of changes is here: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed

Updating from 0.8.x

Extensions

In Druid 0.9, we have refactored the extension loading mechanism. The main reason for this change is to let Druid load extensions from the local file system without having to download anything from the internet at runtime.

To learn all about the new extension loading mechanism, see Include extensions and Include Hadoop Dependencies. If you are impatient, here is the summary.

The following properties have been deprecated: druid.extensions.coordinates, druid.extensions.remoteRepositories, druid.extensions.localRepository, and druid.extensions.defaultVersion.

Instead, specify druid.extensions.loadList, druid.extensions.directory and druid.extensions.hadoopDependenciesDir.

druid.extensions.loadList specifies the list of extensions that will be loaded by Druid at runtime. An example would be druid.extensions.loadList=["druid-datasketches", "mysql-metadata-storage"].

druid.extensions.directory specifies the directory where all the extensions live. An example would be druid.extensions.directory=/xxx/extensions.

druid.extensions.hadoopDependenciesDir specifies the directory where all the Hadoop dependencies live. An example would be druid.extensions.hadoopDependenciesDir=/xxx/hadoop-dependencies. Note: We didn't change the way you specify which Hadoop version to use, so you just need to make sure the Hadoop version you want to use exists underneath /xxx/hadoop-dependencies.
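As a rough aid for the migration described above, the following Python sketch scans the text of a runtime properties file for the deprecated keys and reports which of the new keys are not set. The helper names (parse_properties, check_extension_config) and the sample old-style values are hypothetical and are not part of Druid. The parser only handles plain key=value lines. A new key being unset is not necessarily an error, since the notes do not say whether each one has a default.

```python
# Keys deprecated in 0.9.0, as listed in these release notes.
DEPRECATED_KEYS = {
    "druid.extensions.coordinates",
    "druid.extensions.remoteRepositories",
    "druid.extensions.localRepository",
    "druid.extensions.defaultVersion",
}

# Replacement keys named in these release notes.
NEW_KEYS = (
    "druid.extensions.loadList",
    "druid.extensions.directory",
    "druid.extensions.hadoopDependenciesDir",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_extension_config(text):
    """Return (deprecated keys present, new keys not set), each sorted by name."""
    props = parse_properties(text)
    deprecated = sorted(k for k in props if k in DEPRECATED_KEYS)
    unset = sorted(k for k in NEW_KEYS if k not in props)
    return deprecated, unset


# Hypothetical 0.8.x-style configuration (values are made up for illustration).
old_config = """
druid.extensions.coordinates=["io.druid.extensions:mysql-metadata-storage"]
druid.extensions.localRepository=/xxx/repo
"""

# 0.9.0-style configuration using the example values from the notes.
new_config = """
druid.extensions.loadList=["druid-datasketches", "mysql-metadata-storage"]
druid.extensions.directory=/xxx/extensions
druid.extensions.hadoopDependenciesDir=/xxx/hadoop-dependencies
"""

print(check_extension_config(old_config))
print(check_extension_config(new_config))
```

For the old-style sample, the first list names the two deprecated keys it contains. For the new-style sample, both lists come back empty.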

You might now wonder whether you have to manually populate /xxx/extensions and /xxx/hadoop-dependencies. You do not: we have already created them for you. Download the latest Druid tarball at http://druid.io/downloads.html and unpack it, and you will see extensions and hadoop-dependencies folders inside. Copy them to /xxx/extensions and /xxx/hadoop-dependencies respectively, and you are all set!

If the extension or the Hadoop dependency you want to load is not included in the core extensions, you can use pull-deps to download it to your extension directory.

If you want to load your own extension, you can first run mvn install to install it into your local Maven repository, and then use pull-deps to download it to your extension directory.

Please feel free to leave any questions regarding the migration.

Extensions have now also been split into core and contrib extensions. Core extensions will be maintained by Druid committers and are packaged as part of the download tarball. Contrib extensions are community maintained and can be installed as needed. For more information, please see here.

Ordering of Dimensions

Up to and including Druid 0.8.x, the order of dimensions given at indexing time did not affect the way data was indexed. Rows were ordered first by timestamp, then by dimension values, with dimensions taken in lexicographic order of their names.

As of Druid 0.9.0, Druid respects the given dimension order and will order rows first by timestamp, then by dimension values, in the given dimension order.

This means segments may now vary in size depending on the order in which dimensions are given. Specifying a dimension with many unique values first may result in worse compression than specifying dimensions with repeating values first.
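To make the difference concrete, here is a small Python sketch of the two row orderings described above. It is only a model of the sort key, not Druid's indexing code. The rows, the dimension names (page, country), and the helper sort_rows are made up for illustration.

```python
def sort_rows(rows, dimension_order):
    """Sort rows by timestamp first, then by dimension values in the given order."""
    return sorted(
        rows,
        key=lambda r: (r["timestamp"],) + tuple(r[d] for d in dimension_order),
    )


# Hypothetical input rows with two dimensions.
rows = [
    {"timestamp": 1, "page": "a", "country": "US"},
    {"timestamp": 1, "page": "b", "country": "CA"},
    {"timestamp": 0, "page": "z", "country": "US"},
]

# Dimension order as it would be given at indexing time.
given_order = ["page", "country"]

# 0.8.x behaviour: dimensions are compared in lexicographic order of their
# names, so "country" is compared before "page" regardless of the given order.
legacy = sort_rows(rows, sorted(given_order))

# 0.9.0 behaviour: dimensions are compared in the order given.
current = sort_rows(rows, given_order)

print([(r["page"], r["country"]) for r in legacy])
print([(r["page"], r["country"]) for r in current])
```

Both orderings put the timestamp 0 row first. Within timestamp 1, the legacy ordering puts the CA row before the US row, while the 0.9.0 ordering puts page a before page b.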

Min/Max Aggregators no longer supported, use doubleMin/doubleMax instead

As indicated in the 0.8.3 release notes, min/max aggregators have been removed in favor of doubleMin, doubleMax, longMin, and longMax aggregators.

If you have any issues starting up because of this, please see https://github.com/druid-io/druid/issues/2749
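For specs that still use the removed aggregator types, a rewrite along the following lines may help. This is a sketch only: migrate_aggregators is a hypothetical helper, the sample aggregator list is invented, and the mapping to doubleMin/doubleMax follows the heading above. Whether longMin/longMax would suit a given column better is a manual decision. The helper does not descend into aggregators nested inside other aggregators.

```python
# Removed aggregator types and the replacements suggested in the heading above.
RENAMES = {"min": "doubleMin", "max": "doubleMax"}


def migrate_aggregators(aggregators):
    """Return a copy of the aggregator list with min/max types renamed."""
    migrated = []
    for agg in aggregators:
        new_agg = dict(agg)  # shallow copy so the input list is left untouched
        if new_agg.get("type") in RENAMES:
            new_agg["type"] = RENAMES[new_agg["type"]]
        migrated.append(new_agg)
    return migrated


# Hypothetical aggregator list from a 0.8.x-era spec.
old_aggregators = [
    {"type": "count", "name": "rows"},
    {"type": "min", "name": "min_latency", "fieldName": "latency"},
    {"type": "max", "name": "max_latency", "fieldName": "latency"},
]

new_aggregators = migrate_aggregators(old_aggregators)
print([agg["type"] for agg in new_aggregators])
```

Only the type field changes. The name and fieldName entries are carried over as they were.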

Configuration changes

druid.indexer.task.baseDir and druid.indexer.task.baseTaskDir now default to the standard Java temporary directory specified by the java.io.tmpdir system property, instead of /tmp.

Other issues to be aware of: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3A%22Release+Notes%22

and

https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3AIncompatible

New Features

Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3AFeature

1719 Add Rackspace Cloud Files Deep Storage Extension

1858 Support avro ingestion for realtime & hadoop batch indexing

1873 add ability to express CONCAT as an extractionFn

1921 Add docs and benchmark for JSON flattening parser

1936 adding Upper/Lower Bound Filter

1978 Graphite emitter

1986 Preserve dimension order across indexes during ingestion

2008 Regex search query

2014 Support descending time ordering for time series query

2043 Add dimension selector support for groupby/having filter

2076 adding lower and upper extraction fn

2209 support cascade execution of extraction filters in extraction dimension spec

2221 Allow change minTopNThreshold per topN query

2264 Adding custom mapper for json processing exception

2271 time-descending result of select queries

2258 acl for zookeeper is added

Improvements

Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3AImprovement

984 Use thread priorities. (aka set nice values for background-like tasks)

1638 Remove Maven client at runtime + Provide a way to load Druid extensions through local file system

1728 Store AggregatorFactory[] in segment metadata

1988 support multiple intervals in dataSource inputSpec

2006 Preserve dimension order across indexes during ingestion

2047 optimize InputRowSerde

2075 Configurable value replacement on match failure for RegexExtractionFn

2079 reduce bytearray copy to minimal optimize VSizeIndexedWriter

2084 minor optimize IndexMerger's MMappedIndexRowIterable

2094 Simplifying dimension merging

2107 More efficient SegmentMetadataQuery

2111 optimize create inverted indexes

2138 build v9 directly

2228 Improve heap usage for IncrementalIndex

2261 Prioritize loading of segments based on segment interval

2306 More specific null/empty str handling in IndexMerger

Bug Fixes

Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3ABug

Documentation

Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3ADocumentation

2100 doc update to make it easy to find how to do re-indexing or delta ingestion

2186 Add intro developer docs

2279 Some more multitenancy docs

2364 Add more docs around timezone handling

2216 Completely rework the Druid getting started process

Thanks to everyone who contributed to this patch! @fjy @xvrl @drcrallen @pjain1 @chtefi @liubin @salsakran @jaebinyo @erikdubbelboer @gianm @bjozet @navis @AlexanderSaydakov @himanshug @guobingkun @abbondanza @binlijin @rasahner @jon-wei @CHOIJAEHONG1 @loganlinn @michaelschiff @himank @nishantmonu51 @sirpkt @duilio @pdeva @KurtYoung @mangesh-pardeshi @dclim @desaianuj @stevemns @b-slim @cheddar @jkukul @AdrieanKhisbe @liuqiyun @codingwhatever @clintropolis @zhxiaogg @rohitkochar @itsmee @Angelmmiguel @noddi @se7entyse7en @zhaown @genevien

gianm commented 8 years ago

finalized in https://github.com/druid-io/druid/releases/tag/druid-0.9.0