Open keith-turner opened 2 years ago
This issue was opened based on discussion on #2492
For table and namespace IDs, this would somewhat conflict with the goal to keep toString()
equivalent to canonical()
in the API contract for AbstractId
.
However, I had been thinking about a proper Namespace
and Table
object, which would be a pair of (name, ID)
, which could be used internally, and provide a similar result via its toString()
.
For table and namespace IDs, this would somewhat conflict with the goal to keep toString() equivalent to canonical() in the API contract for AbstractId.
I was not really sure of the best way to implement this, was not sure about changing toString() on existing id type. I like toString() being the same as canonical() personally, because that is nice when someone adds their own context. We could have String toLog()
method or something similar that is short and can be used when logging the id.
We could have
String toLog()
method or something similar that is short and can be used when logging the id.
The problem with using something other than toString()
, is that we have to remember to explicitly convert it ourselves, and we frequently forget to do that. A log builder or some other wrapper around logging could force us to do it, though (because its API could require some kind of Loggable
object instead of accepting strings).
The problem with using something other than toString(), is that we have to remember to explicitly convert it ourselves, and we frequently forget to do that. A log builder or some other wrapper around logging could force us to do it, though (because its API could require some kind of Loggable object instead of accepting strings).
Looked around for ways to have custom formatting for objects in log4j and found this :
ran the following test
package org.apache.accumulo.test;
import org.apache.logging.log4j.util.StringBuilderFormattable;
import org.slf4j.LoggerFactory;
public class Test {
public static class Foo implements StringBuilderFormattable {
@Override public void formatTo(StringBuilder buffer) {
buffer.append("TABLE_ID[99]");
}
}
@org.junit.Test
public void test1(){
var log = LoggerFactory.getLogger("TL1");
log.info("This is only a test: {}", new Foo());
}
}
which printed
2022-02-14T20:20:35,625 [TL1] INFO : This is only a test: TABLE_ID[99]
So maybe the abstract id type could implement this interface for logging the id.
2022-02-14T20:20:35,625 [TL1] INFO : This is only a test: TABLE_ID[99]
Using slf4j-simple
instead of log4j-slf4j-impl
, this prints:
[main] INFO TL1 - This is only a test: Test$Foo@545997b1
So, it only works with the log4j-slf4j-impl
at runtime, and not other slf4j implementations.
So maybe the abstract id type could implement this interface for logging the id.
That would require having a hard dependency on log4j again in code and leaks this 3rd party library class into the class hierarchy of our public API, both of which have caused us problems historically and we have previously moved to resolve.
Given the API leakage, the tight coupling with log4j, and the fact that it won't work with any other slf4j binding, I don't think this solution is viable.
We could just create our own interface like StringBuilderFormattable
, call it something like Loggable
. Then make whatever objects we want to have a log friendly toString()
implement it.
With the concrete types we have implemented in TableId
and NamespaceId
, I don't think it is a big of deal anymore to have the toString()
method return a formatted string. I think most places it is used is in string concatenation and logging. The other places where want a canonical()
ID it is also pretty clear.
Not sure if it would matter - but ids are effectively (if not outright) immutable - the output string could be created on instantiation of the id so that concatenation / formatting would not need to occur on every log statement - the toString would just return a fixed string of the instance.
I don't think it is a big of deal anymore to have the toString() method return a formatted string. I think most places it is used is in string concatenation and logging.
If the majority use of toString is for logging then maybe the simplest thing is to make toString format its return. There may be user facing outputs from utilities/shell that we may want to handle differently, if those are a minority then maybe they could just call canonical().
So far it seems like there are three viable options (with the log4j specific option not being viable)
I think having toString different from canonical is error prone. People will make mistakes, and not realize... whether it's concatenating a path in ZK, constructing a metadata entry, or just an innocuous log statement, mistakes will occur, and they will not be easy to detect, because toString is not explicit in string concatenation, it's merely implied.
However, ultimately, I don't think it's necessary to bother with trying to format the toString for table IDs and namespace IDs. Most of the time when we want to log, we want to log the table / namespace, not just the table ID / namespace ID. I would prefer to have a concrete type for Table and Namespace, whose toString is a formatted representation of both the name and its ID. I'd much rather see something like deleting TABLE[NAME=mytable;ID=2c]
than deleting TABLEID(2c)
.
People will make mistakes, and not realize... whether it's concatenating a path in ZK, constructing a metadata entry
That is a really good con. I edited my earlier comment to add it.
However, ultimately, I don't think it's necessary to bother with trying to format the toString for table IDs and namespace IDs. Most of the time when we want to log, we want to log the table / namespace, not just the table ID / namespace ID. I would prefer to have a concrete type for Table and Namespace, whose toString is a formatted representation of both the name and its ID. I'd much rather see something like deleting TABLE[NAME=mytable;ID=2c] than deleting TABLEID(2c).
That would be really nice. I would love to see something like TABLE[NAME=mytable;ID=2c]
consistently in the logs. Is there an issue for that? If we open an issue for that, then we could probably close this. I don't think there is anything general to do. If we create a new FateTxID type then how it logs can be tailored specifically for it, but there is nothing more general to do in the code base for other id types.
I'd have to mine through tickets to see if there's a current issue for it. I know it was a part of the set of ideas I had for the 2.0 API, to have a concrete Table and Namespace object, as well as part of the recent work I had been looking into about standardizing how we pass table information over RPCs (as name or id). I think Table and Namespace formatting would be covered by that, but this issue might still apply to other kinds of identifiers.
While working on https://github.com/apache/accumulo/pull/2780, I realized we need a concrete ID for Fate as well. It would be nice to have if we are going to release an API of fate operations. This is what I was thinking:
/** * A strongly typed representation of a Fate Transaction ID. There are two representations of a * FateTxId. The first is a hexadecimal long. The second is a formatted String. * * @since 2.1.0 */ public class FateTxId extends AbstractId{ private FateTxId(long hexadecimal) { super(FateTxIdUtil.formatTid(hexadecimal)); } /** * Get a FateTxId object for the provided hexadecimal number. This is guaranteed to be non-null. * * @param hexadecimal * Fate transaction ID as a hexadecimal long * @return FateTxId object */ public static FateTxId of(final long hexadecimal) { try { return cache.get(hexadecimal, () -> new FateTxId(hexadecimal)); } catch (ExecutionException e) { throw new AssertionError( "This should never happen: ID constructor should never return null."); } } /** * Get a FateTxId object for the provided formatted String. This is guaranteed to be non-null. * * @param formattedString * Fate transaction ID as a formatted String * @return FateTxId object */ public static FateTxId of(String formattedString) { if (!FateTxIdUtil.isFormatedTid(formattedString)) { throw new IllegalArgumentException("The string provided (" + formattedString + ") is not formatted properly. "); } return new FateTxId(FateTxIdUtil.fromString(formattedString)); } }
Accumulo has multiple different id types like table ids, FATE transaction ids, namespace ids, external compactions ids, scan session ids, etc. These ids are logged. For logging fate transaction ids an attempt was made to logs these very consistently by always calling this function when logging them. This method consistently formats the id and includes a type. The type is nice for two reasons, first it make the ID easily distinguishable when scanning logs. Second its nice to be able to grep a log just for the TYPE and see all log messages that have a FATE transaction id in them.
This concept could be expanded to all ID types in Accumulo. We could consistently format type and id for all ids in Accumulo. We could logs ids like
FATE[123]
,TABLE_ID[456]
,ECID[abd]
,NAMESPACE_ID[xyz]
, etc in the logs. SeeingTABLE_ID[456]
in a log message is much better than seeing456
in the log, especially for logs messages where its not clear what456
is.