Explore consistently formatting all ids in Accumulo logs.

keith-turner commented 2 years ago

Accumulo has multiple different id types like table ids, FATE transaction ids, namespace ids, external compactions ids, scan session ids, etc. These ids are logged. For logging fate transaction ids an attempt was made to logs these very consistently by always calling this function when logging them. This method consistently formats the id and includes a type. The type is nice for two reasons, first it make the ID easily distinguishable when scanning logs. Second its nice to be able to grep a log just for the TYPE and see all log messages that have a FATE transaction id in them.

This concept could be expanded to all ID types in Accumulo. We could consistently format type and id for all ids in Accumulo. We could logs ids like FATE[123], TABLE_ID[456], ECID[abd], NAMESPACE_ID[xyz], etc in the logs. Seeing TABLE_ID[456] in a log message is much better than seeing 456 in the log, especially for logs messages where its not clear what 456 is.

keith-turner commented 2 years ago

This issue was opened based on discussion on #2492

ctubbsii commented 2 years ago

For table and namespace IDs, this would somewhat conflict with the goal to keep toString() equivalent to canonical() in the API contract for AbstractId.

However, I had been thinking about a proper Namespace and Table object, which would be a pair of (name, ID), which could be used internally, and provide a similar result via its toString().

keith-turner commented 2 years ago

For table and namespace IDs, this would somewhat conflict with the goal to keep toString() equivalent to canonical() in the API contract for AbstractId.

I was not really sure of the best way to implement this, was not sure about changing toString() on existing id type. I like toString() being the same as canonical() personally, because that is nice when someone adds their own context. We could have String toLog() method or something similar that is short and can be used when logging the id.

ctubbsii commented 2 years ago

We could have String toLog() method or something similar that is short and can be used when logging the id.

The problem with using something other than toString(), is that we have to remember to explicitly convert it ourselves, and we frequently forget to do that. A log builder or some other wrapper around logging could force us to do it, though (because its API could require some kind of Loggable object instead of accepting strings).

keith-turner commented 2 years ago

The problem with using something other than toString(), is that we have to remember to explicitly convert it ourselves, and we frequently forget to do that. A log builder or some other wrapper around logging could force us to do it, though (because its API could require some kind of Loggable object instead of accepting strings).

Looked around for ways to have custom formatting for objects in log4j and found this :

https://logging.apache.org/log4j/log4j-2.17.0/log4j-api/apidocs/org/apache/logging/log4j/util/StringBuilderFormattable.html

ran the following test

package org.apache.accumulo.test;

import org.apache.logging.log4j.util.StringBuilderFormattable;
import org.slf4j.LoggerFactory;

public class Test {
  public static class Foo implements StringBuilderFormattable {
    @Override public void formatTo(StringBuilder buffer) {
      buffer.append("TABLE_ID[99]");
    }
  }

  @org.junit.Test
  public void test1(){
    var log = LoggerFactory.getLogger("TL1");
    log.info("This is only a test: {}", new Foo());
  }
}

which printed

2022-02-14T20:20:35,625 [TL1] INFO : This is only a test: TABLE_ID[99]

So maybe the abstract id type could implement this interface for logging the id.

ctubbsii commented 2 years ago

2022-02-14T20:20:35,625 [TL1] INFO : This is only a test: TABLE_ID[99]

Using slf4j-simple instead of log4j-slf4j-impl, this prints:

[main] INFO TL1 - This is only a test: Test$Foo@545997b1

So, it only works with the log4j-slf4j-impl at runtime, and not other slf4j implementations.

So maybe the abstract id type could implement this interface for logging the id.

That would require having a hard dependency on log4j again in code and leaks this 3rd party library class into the class hierarchy of our public API, both of which have caused us problems historically and we have previously moved to resolve.

Given the API leakage, the tight coupling with log4j, and the fact that it won't work with any other slf4j binding, I don't think this solution is viable.

milleruntime commented 2 years ago

We could just create our own interface like StringBuilderFormattable, call it something like Loggable. Then make whatever objects we want to have a log friendly toString() implement it.

milleruntime commented 2 years ago

With the concrete types we have implemented in TableId and NamespaceId, I don't think it is a big of deal anymore to have the toString() method return a formatted string. I think most places it is used is in string concatenation and logging. The other places where want a canonical() ID it is also pretty clear.

EdColeman commented 2 years ago

Not sure if it would matter - but ids are effectively (if not outright) immutable - the output string could be created on instantiation of the id so that concatenation / formatting would not need to occur on every log statement - the toString would just return a fixed string of the instance.

keith-turner commented 2 years ago

I don't think it is a big of deal anymore to have the toString() method return a formatted string. I think most places it is used is in string concatenation and logging.

If the majority use of toString is for logging then maybe the simplest thing is to make toString format its return. There may be user facing outputs from utilities/shell that we may want to handle differently, if those are a minority then maybe they could just call canonical().

keith-turner commented 2 years ago

So far it seems like there are three viable options (with the log4j specific option not being viable)

Modify toString() on AbstractId to format the id and include a type
- Cons: have to find all places where this is not suitable and change to canonical(). Could lead to errors when constructing metadata table entries and ZK/DFS paths.
- Pros: New code written that logs the id will automatically use the formatted id. For user facing output devs usually test that when writing it and if they don;t like toString() output they can switch to canonical().
Add a new method to AbstractId to format the id and include a type.
- Cons: Have to modify all existing logging to call new method. Have to remember to use the new method when writing new logging stmts.
- Pros: canonical() and toString() are the same, the more I think about this not a huge pro.
Do nothing
- Cons : Logging does consistently include type w/ id.
- Pros: So easy to do.

ctubbsii commented 2 years ago

I think having toString different from canonical is error prone. People will make mistakes, and not realize... whether it's concatenating a path in ZK, constructing a metadata entry, or just an innocuous log statement, mistakes will occur, and they will not be easy to detect, because toString is not explicit in string concatenation, it's merely implied.

However, ultimately, I don't think it's necessary to bother with trying to format the toString for table IDs and namespace IDs. Most of the time when we want to log, we want to log the table / namespace, not just the table ID / namespace ID. I would prefer to have a concrete type for Table and Namespace, whose toString is a formatted representation of both the name and its ID. I'd much rather see something like deleting TABLE[NAME=mytable;ID=2c] than deleting TABLEID(2c).

keith-turner commented 2 years ago

People will make mistakes, and not realize... whether it's concatenating a path in ZK, constructing a metadata entry

That is a really good con. I edited my earlier comment to add it.

keith-turner commented 2 years ago

However, ultimately, I don't think it's necessary to bother with trying to format the toString for table IDs and namespace IDs. Most of the time when we want to log, we want to log the table / namespace, not just the table ID / namespace ID. I would prefer to have a concrete type for Table and Namespace, whose toString is a formatted representation of both the name and its ID. I'd much rather see something like deleting TABLE[NAME=mytable;ID=2c] than deleting TABLEID(2c).

That would be really nice. I would love to see something like TABLE[NAME=mytable;ID=2c] consistently in the logs. Is there an issue for that? If we open an issue for that, then we could probably close this. I don't think there is anything general to do. If we create a new FateTxID type then how it logs can be tailored specifically for it, but there is nothing more general to do in the code base for other id types.

ctubbsii commented 2 years ago

I'd have to mine through tickets to see if there's a current issue for it. I know it was a part of the set of ideas I had for the 2.0 API, to have a concrete Table and Namespace object, as well as part of the recent work I had been looking into about standardizing how we pass table information over RPCs (as name or id). I think Table and Namespace formatting would be covered by that, but this issue might still apply to other kinds of identifiers.

milleruntime commented 2 years ago

While working on https://github.com/apache/accumulo/pull/2780, I realized we need a concrete ID for Fate as well. It would be nice to have if we are going to release an API of fate operations. This is what I was thinking:

/**
 * A strongly typed representation of a Fate Transaction ID. There are two representations of a
 * FateTxId. The first is a hexadecimal long. The second is a formatted String.
 *
 * @since 2.1.0
 */
public class FateTxId extends AbstractId {
    private FateTxId(long hexadecimal) {
        super(FateTxIdUtil.formatTid(hexadecimal));
    }

    /**
     * Get a FateTxId object for the provided hexadecimal number. This is guaranteed to be non-null.
     *
     * @param hexadecimal
     *          Fate transaction ID as a hexadecimal long
     * @return FateTxId object
     */
    public static FateTxId of(final long hexadecimal) {
        try {
            return cache.get(hexadecimal, () -> new FateTxId(hexadecimal));
        } catch (ExecutionException e) {
            throw new AssertionError(
                    "This should never happen: ID constructor should never return null.");
        }
    }

    /**
     * Get a FateTxId object for the provided formatted String. This is guaranteed to be non-null.
     *
     * @param formattedString
     *          Fate transaction ID as a formatted String
     * @return FateTxId object
     */
    public static FateTxId of(String formattedString) {
        if (!FateTxIdUtil.isFormatedTid(formattedString)) {
            throw new IllegalArgumentException("The string provided (" + formattedString +
                    ") is not formatted properly. ");
        }
        return new FateTxId(FateTxIdUtil.fromString(formattedString));
    }
}

apache / accumulo

Explore consistently formatting all ids in Accumulo logs. #2495