elastic / elasticsearch-hadoop

:elephant: Elasticsearch real-time search and analytics natively integrated with Hadoop
https://www.elastic.co/products/hadoop
Apache License 2.0

Policy about type name in index name is too harsh #2188

Open codefromthecrypt opened 9 months ago

codefromthecrypt commented 9 months ago

What kind of issue is this?

Issue description

I can't find the pull request discussion about it, but this commit made it illegal to have a type name inside an index name:

https://github.com/elastic/elasticsearch-hadoop/commit/047d80bc8fb25fdeef059078f4c225d4fe3041bf#diff-082a1d730a4b541ff64014349ca4c0ce987cf6a5a9f92a6c4dd2fc2325348cceR86

Such a strict naming convention breaks upgrade paths, including Zipkin's as well as the one in this other report in the user group.

While it is understandable that type names in index names aren't useful in modern versions, the check shouldn't break people or prevent them from upgrading. There are ecosystem tools that expect a common naming convention for the data in indices, and changing that convention is a very large effort to impose over one draconian source line.

cc @xeraa, as this has a huge impact, and if this is declined I think the ecosystem will never upgrade.

Steps to reproduce

Code:

The pending code tries to upgrade zipkin-dependencies to support ES 8 without interfering with ES 7, which already works. This policy makes it impossible to do that upgrade cleanly.

Stack trace:

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Detected type name in resource [earch.itelasticsearchdependencies$itdependencies:span-2023-12-30/span]. Remove type name to continue.

    at org.elasticsearch.hadoop.rest.Resource.<init>(Resource.java:88)
    at org.elasticsearch.hadoop.rest.InitializationUtils.checkIndexNameForRead(InitializationUtils.java:61)

Version Info

OS: Darwin (for testing)
JVM: azul-11.0.21
Hadoop/Spark: 3.3.4
ES-Hadoop: 8.11.3
ES: 7.x or 8.x

codefromthecrypt commented 9 months ago

I removed ES 6 support from our tool, and will have to mention loudly in the release notes that users need to upgrade from ES 6 to 7 before using an es-hadoop version that includes this change, as it is no longer possible to cover the migration path from 6 to 8 in the same binary.

Feel free to close this if you are OK with the impact.

masseyke commented 9 months ago

Can you provide more details on how to reproduce this? You are using es-hadoop 8.11.3 pointed at a 6.x elasticsearch with types? For some reason the es-hadoop client thinks that you are pointed at an 8.x elasticsearch (it discovers the version on startup). This is the logic that is supposed to handle it: https://github.com/elastic/elasticsearch-hadoop/blob/v8.11.3/mr/src/main/java/org/elasticsearch/hadoop/rest/Resource.java#L85. Or are you trying to use an index name with types when reading from elasticsearch 8.x?
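For tools that must keep one binary working across server versions, a caller-side workaround is to drop the type segment only when the target cluster is 8.x or later. The following is a minimal sketch under stated assumptions: `ResourceNames` and `forMajorVersion` are hypothetical names, not part of es-hadoop, the major version is assumed to be known to the caller already, and the resource is assumed to be a plain `index` or `index/type` string. It is not the logic in `Resource.java`.

```java
import java.util.Objects;

/** Hypothetical caller-side helper; NOT part of the es-hadoop API. */
final class ResourceNames {
    private ResourceNames() {}

    /**
     * Returns the resource string to pass to the connector.
     * Below major version 8 the "index/type" form is returned unchanged;
     * from 8 onward everything from the first '/' is dropped.
     */
    static String forMajorVersion(String resource, int esMajorVersion) {
        Objects.requireNonNull(resource, "resource");
        int slash = resource.indexOf('/');
        if (slash < 0 || esMajorVersion < 8) {
            return resource; // nothing to strip, or types are still accepted
        }
        return resource.substring(0, slash);
    }

    public static void main(String[] args) {
        System.out.println(forMajorVersion("span-2023-12-30/span", 7));
        System.out.println(forMajorVersion("span-2023-12-30/span", 8));
    }
}
```

This only changes the string handed to the connector. Whether the data written under the old naming remains readable afterwards is a separate question.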

xeraa commented 9 months ago

Maybe one sidenote for the upgrade story: Apache Lucene only writes the current major version (N) and can only read the previous major version (N-1) or the current one (N). Each Elasticsearch major version also moves to a new Lucene major version. So going from 6 to 8 will generally not work, since you'd need to be able to read N-2, at least not without a reindex in which you could fix the _type.
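The N / N-1 rule can be written down as a small check. This is an illustrative sketch only: `UpgradePath` and `canReadDirectly` are hypothetical names, and the numbers are Elasticsearch major versions used as a stand-in for the underlying Lucene majors, as described above.

```java
/** Hypothetical illustration of the N / N-1 read rule; not an Elasticsearch API. */
final class UpgradePath {
    private UpgradePath() {}

    /**
     * True when an index created on indexCreatedMajor can be read by
     * targetMajor without a reindex: same major (N) or one behind (N-1).
     */
    static boolean canReadDirectly(int indexCreatedMajor, int targetMajor) {
        return indexCreatedMajor == targetMajor
                || indexCreatedMajor == targetMajor - 1;
    }

    public static void main(String[] args) {
        System.out.println("6 -> 7: " + canReadDirectly(6, 7));
        System.out.println("6 -> 8: " + canReadDirectly(6, 8));
    }
}
```

Under this rule, an index created on 6 is readable on 7 but needs a reindex, or needs to age out, before a move to 8.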

It might not be perfect, but many people will probably only keep tracing data for a limited time (let's say 30 to 90 days). If possible, I'd do a stepwise upgrade: move to the new pattern while going from 6 to 7 or within 7, and then go to 8 as the old data ages out. Maybe that is a reasonable tradeoff for upgrades?