GoogleCloudDataproc / hadoop-connectors

Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
Apache License 2.0

Status of hsync/hflush and suitability for backing HBase #571

Open jasonewang opened 3 years ago

jasonewang commented 3 years ago

Hi,

I'm experimenting with using hadoop-connectors and GCS as the backing filesystem for HBase. I ran into some issues and wanted to know whether they are already known.

hadoop-connectors version: 2.2.1 with patches (JAR built off https://github.com/jasonewang/hadoop-connectors/commit/7890f24f1bf4c31ebf3aaf4e4153e96d64d31829, which includes the patches mentioned in observations 1 and 2 below)
hbase version: 2.4.2
hadoop version: 3.1.4

GCS-related core-site.xml config:

<property>
  <name>fs.defaultFS</name>
  <value>gs://<bucket>/</value>
</property>
<property>
  <name>fs.AbstractFileSystem.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
  <description>The AbstractFileSystem for gs: uris.</description>
</property>
<property>
  <name>fs.gs.project.id</name>
  <value></value>
  <description>
    Optional. Google Cloud Project ID with access to GCS buckets.
    Required only for list buckets and create bucket operations.
  </description>
</property>
<property>
  <name>google.cloud.auth.service.account.enable</name>
  <value>true</value>
  <description>
    Whether to use a service account for GCS authorization.
    Setting this property to `false` will disable use of service accounts for
    authentication.
  </description>
</property>
<property>
  <name>google.cloud.auth.service.account.json.keyfile</name>
  <value><path to key></value>
  <description>
    The JSON key file of the service account used for GCS
    access when google.cloud.auth.service.account.enable is true.
  </description>
</property>
<property>
  <name>fs.gs.outputstream.type</name>
  <value>FLUSHABLE_COMPOSITE</value>
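  <description>
    Output stream type. FLUSHABLE_COMPOSITE supports hsync()/hflush() by
    committing the stream contents to GCS (via object compose) on each flush.
  </description>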
</property>

Some observations while testing GCS with HBase:

  1. HBase does not recognize GoogleHadoopSyncableOutputStream as supporting hsync and hflush because it does not implement the StreamCapabilities interface. I've implemented that in a commit here: https://github.com/jasonewang/hadoop-connectors/commit/b59015292127868a8a7bd8981ed6b63c2cebfce8. (A minimal sketch of such an implementation follows the log excerpt below.)
  2. Listing files in a directory on GCS returns the temporary files used for flushing (files with the _GCS_SYNCABLE_TEMPFILE_ prefix). This caused HBase to treat the tempfiles as WALs and raise errors while trying to process them. I worked around this by ignoring the tempfiles when listing directories: https://github.com/jasonewang/hadoop-connectors/commit/28ea985b1000588b560d0890a5da2acfbe297abe.
  3. The second call to hsync() fails in composeObjects because the destination file cannot be found. Example:
    
    2021-06-10 14:28:22 SEVERE: hsync(): Committing tail file gs://<bucket>/<hbase-dir>/MasterData/WALs/hbasemaster-0.hbase.hbase-gcs.svc.cluster.local,16000,1623360407824/hbasemaster-0.hbase.hbase-gcs.svc.cluster.local%252C16000%252C1623360407824.1623360502336 to final destination gs://<bucket>/<hbase-dir>/MasterData/WALs/hbasemaster-0.hbase.hbase-gcs.svc.cluster.local,16000,1623360407824/hbasemaster-0.hbase.hbase-gcs.svc.cluster.local%252C16000%252C1623360407824.1623360502336

2021-06-10 14:28:22 SEVERE: hsync(): Opening next temporary tail file gs:////MasterData/WALs/hbasemaster-0.hbase.hbase-gcs.svc.cluster.local,16000,1623360407824/_GCS_SYNCABLE_TEMPFILE_hbasemaster-0.hbase.hbase-gcs.svc.cluster.local%252C16000%252C1623360407824.1623360502336.1.00f28391-202a-4aef-8a31-680f6c7a9aa4 as component number 1

2021-06-10 14:28:22 SEVERE: Took 95938688 ns to sync() for gs:////MasterData/WALs/hbasemaster-0.hbase.hbase-gcs.svc.cluster.local,16000,1623360407824/hbasemaster-0.hbase.hbase-gcs.svc.cluster.local%252C16000%252C1623360407824.1623360502336

2021-06-10 14:28:22 SEVERE: hsync(): Committing tail file gs:////MasterData/WALs/hbasemaster-0.hbase.hbase-gcs.svc.cluster.local,16000,1623360407824/_GCS_SYNCABLE_TEMPFILE_hbasemaster-0.hbase.hbase-gcs.svc.cluster.local%252C16000%252C1623360407824.1623360502336.1.00f28391-202a-4aef-8a31-680f6c7a9aa4 to final destination gs:////MasterData/WALs/hbasemaster-0.hbase.hbase-gcs.svc.cluster.local,16000,1623360407824/hbasemaster-0.hbase.hbase-gcs.svc.cluster.local%252C16000%252C1623360407824.1623360502336

2021-06-10 14:28:22 SEVERE: composeObjects([gs:////MasterData/WALs/hbasemaster-0.hbase.hbase-gcs.svc.cluster.local,16000,1623360407824/hbasemaster-0.hbase.hbase-gcs.svc.cluster.local%252C16000%252C1623360407824.1623360502336, gs:////MasterData/WALs/hbasemaster-0.hbase.hbase-gcs.svc.cluster.local,16000,1623360407824/_GCS_SYNCABLE_TEMPFILE_hbasemaster-0.hbase.hbase-gcs.svc.cluster.local%252C16000%252C1623360407824.1623360502336.1.00f28391-202a-4aef-8a31-680f6c7a9aa4], gs:////MasterData/WALs/hbasemaster-0.hbase.hbase-gcs.svc.cluster.local,16000,1623360407824/hbasemaster-0.hbase.hbase-gcs.svc.cluster.local%252C16000%252C1623360407824.1623360502336, CreateObjectOptions{contentEncoding=null, contentType=application/octet-stream, ensureEmptyObjectsMetadataMatch=true, kmsKeyName=null, metadata={}, overwriteExisting=false})

2021-06-10 21:28:22,642 WARN [AsyncFSWAL-0-gs:////MasterData] wal.AsyncFSWAL: sync failed com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found POST https://storage.googleapis.com/storage/v1/b//o/%2FMasterData%2FWALs%2Fhbasemaster-0.hbase.hbase-gcs.svc.cluster.local,16000,1623360407824%2Fhbasemaster-0.hbase.hbase-gcs.svc.cluster.local%25252C16000%25252C1623360407824.1623360502336/compose?ifGenerationMatch=1623360502477795 { "code" : 404, "errors" : [ { "domain" : "global",

"message" : "Object <hbase-dir>/MasterData/WALs/hbasemaster-0.hbase.hbase-gcs.svc.cluster.local,16000,1623360407824/hbasemaster-0.hbase.hbase-gcs.svc.cluster.local%252C16000%252C1623360407824.1623360502336 (generation: 0) not found.",
"reason" : "notFound"

} ], "message" : "Object /MasterData/WALs/hbasemaster-0.hbase.hbase-gcs.svc.cluster.local,16000,1623360407824/hbasemaster-0.hbase.hbase-gcs.svc.cluster.local%252C16000%252C1623360407824.1623360502336 (generation: 0) not found." } at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146) at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118) at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37) at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:428) at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1111) at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:514) at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:455) at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:565) at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.composeObjects(GoogleCloudStorageImpl.java:2071) at com.google.cloud.hadoop.fs.gcs.GoogleHadoopSyncableOutputStream.commitCurrentFile(GoogleHadoopSyncableOutputStream.java:344) at com.google.cloud.hadoop.fs.gcs.GoogleHadoopSyncableOutputStream.hsyncInternal(GoogleHadoopSyncableOutputStream.java:297) at com.google.cloud.hadoop.fs.gcs.GoogleHadoopSyncableOutputStream.hflush(GoogleHadoopSyncableOutputStream.java:270) at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:134) at org.apache.hadoop.hbase.io.asyncfs.WrapperAsyncFSOutput.flush0(WrapperAsyncFSOutput.java:93) at org.apache.hadoop.hbase.io.asyncfs.WrapperAsyncFSOutput.lambda$flush$0(WrapperAsyncFSOutput.java:114) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

Directories and HBase version files are created successfully. I don't see any version files flushed with hsync/hflush.

If the exception thrown by the second call to hflush/hsync is caught, any subsequent call to commitCurrentFile() throws an NPE, because curDelegate.close() closes the inner channel and sets it to null, but curDelegate is never recreated in hsyncInternal().
Example NPE:

Exception in thread "Close-WAL-Writer-13" java.lang.NullPointerException at com.google.cloud.hadoop.fs.gcs.GoogleHadoopSyncableOutputStream.commitCurrentFile(GoogleHadoopSyncableOutputStream.java:328) at com.google.cloud.hadoop.fs.gcs.GoogleHadoopSyncableOutputStream.close(GoogleHadoopSyncableOutputStream.java:214) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101) at org.apache.hadoop.hbase.io.asyncfs.WrapperAsyncFSOutput.recoverAndClose(WrapperAsyncFSOutput.java:121) at org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.close(AsyncProtobufLogWriter.java:161) at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.lambda$closeWriter$6(AsyncFSWAL.java:690) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
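
To make the failure mode concrete, here is a minimal hypothetical sketch (the field and method names mirror the connector's, but this is not the actual implementation): once a failed hsync has closed the delegate channel and set it to null, any later commit dereferences null unless it is guarded.

import java.io.IOException;
import java.nio.channels.WritableByteChannel;

public class CommitGuardSketch {
  // Becomes null once a failed hsync has closed the delegate channel.
  private WritableByteChannel curDelegate;

  void commitCurrentFile() throws IOException {
    // Hypothetical guard: without it, curDelegate.close() below is the NPE site.
    if (curDelegate == null) {
      throw new IOException("Stream already closed by a previous failed hsync");
    }
    curDelegate.close();
    curDelegate = null;
  }
}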



With that, is gcs-connector ready for production databases that rely on hsync/hflush-compatible HDFS storage? Are there examples of people using the FLUSHABLE_COMPOSITE output stream? And is there anything we can do or provide to help track down why composeObjects is failing?
jasonewang commented 3 years ago

I've narrowed it down to the connector being unable to flush files with a % character in the path: the tempfile cannot be looked up, apparently because of the % character. I can reproduce this with a short program:

import java.net.URI;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.conf.*;
import com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS;

public class Write {
  public static void main (String [] args) throws Exception {
    Configuration conf = new Configuration();
    System.out.println(conf.get("fs.defaultFS"));

    AbstractFileSystem fs = AbstractFileSystem.get(URI.create(conf.get("fs.defaultFS")), conf);
    FSDataOutputStream out = fs.createInternal(
      new Path("/bar%2Cfoo"),
      null, // createflag
      new FsPermission((short)0666), // permissions
      4096, // buffersize
      (short) 3, // replication
      (long) 268435456, // blocksize
      null, // progressable
      null, // checksum
      false); // createParent

    out.writeChars("hello");
    out.hflush();
    out.writeChars("world");
    out.hflush();
    out.close();
  }
}
mprashanthsagar commented 3 years ago

tl;dr: Percent-encoded metacharacters in the object name are decoded as part of GCS object creation, because the name is treated as a URI, while the composeObjects API looks up the source objects by their raw (still-encoded) names.

In the WAL path above, the name appears to have been percent-encoded twice before composeObjects was called:

hbasemaster-0.hbase.hbase-gcs.svc.cluster.local,16000,1623360407824/hbasemaster-0.hbase.hbase-gcs.svc.cluster.local%252C16000%252C1623360407824.1623360502336

%252C decodes to %2C (% is itself encoded as %25)
%2C decodes to ,

The theory is that the object names are percent-decoded when the objects are created, but not when composeObjects is called, which causes the discrepancy: the compose call cannot find the objects under the still-encoded names.
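
A short standalone snippet (my own illustration of the suspected mechanism, not code from the connector) shows how java.net.URI decoding changes the name: the raw path keeps %2C while the decoded path contains a literal comma, so a name captured at one stage no longer matches the name used at the other.

import java.net.URI;

public class EncodingMismatch {
  public static void main(String[] args) {
    URI object = URI.create("gs://my-bucket/test-compose/object%2C1");
    System.out.println(object.getRawPath()); // /test-compose/object%2C1
    System.out.println(object.getPath());    // /test-compose/object,1 (decoded)
  }
}

The integration tests below exercise the same mismatch at the GoogleCloudStorageFileSystem level: the first composes objects whose names contain literal commas, the second uses percent-encoded names.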


@Test
  public void testComposeSuccessOnMetacharacters() throws IOException {
    String bucketName = sharedBucketName1;
    URI directory = gcsiHelper.getPath(bucketName, "test-compose/");
    URI object1 = directory.resolve("bar,foo");
    URI object2 = directory.resolve("object,2");
    URI destination = directory.resolve("destination");
    gcsfs.mkdirs(directory);

    // Create the source objects
    try (WritableByteChannel channel1 = gcsfs.create(object1)) {
      assertThat(channel1).isNotNull();
      channel1.write(ByteBuffer.wrap("content1".getBytes(UTF_8)));
    }
    try (WritableByteChannel channel2 = gcsfs.create(object2)) {
      assertThat(channel2).isNotNull();
      channel2.write(ByteBuffer.wrap("content2".getBytes(UTF_8)));
    }
    assertThat(gcsfs.exists(object1) && gcsfs.exists(object2)).isTrue();

    gcsfs.compose(
        ImmutableList.of(object1, object2), destination, CreateObjectOptions.CONTENT_TYPE_DEFAULT);

    byte[] expectedOutput = "content1content2".getBytes(UTF_8);
    ByteBuffer actualOutput = ByteBuffer.allocate(expectedOutput.length);
    try (SeekableByteChannel destinationChannel =
        gcsiHelper.open(bucketName, "test-compose/destination")) {
      destinationChannel.read(actualOutput);
    }
    assertThat(actualOutput.array()).isEqualTo(expectedOutput);
  }

  @Test
  public void testComposeFailsOnMetacharacters() throws IOException {
    String bucketName = sharedBucketName1;
    URI directory = gcsiHelper.getPath(bucketName, "test-compose/");
    URI object1 = directory.resolve("object%2C1");
    URI object2 = directory.resolve("object%2C2");
    URI destination = directory.resolve("destination");
    gcsfs.mkdirs(directory);

    // Create the source objects
    try (WritableByteChannel channel1 = gcsfs.create(object1)) {
      assertThat(channel1).isNotNull();
      channel1.write(ByteBuffer.wrap("content1".getBytes(UTF_8)));
    }
    try (WritableByteChannel channel2 = gcsfs.create(object2)) {
      assertThat(channel2).isNotNull();
      channel2.write(ByteBuffer.wrap("content2".getBytes(UTF_8)));
    }
    assertThat(gcsfs.exists(object1) && gcsfs.exists(object2)).isTrue();

    gcsfs.compose(
        ImmutableList.of(object1, object2), destination, CreateObjectOptions.CONTENT_TYPE_DEFAULT);

    byte[] expectedOutput = "content1content2".getBytes(UTF_8);
    ByteBuffer actualOutput = ByteBuffer.allocate(expectedOutput.length);
    try (SeekableByteChannel destinationChannel =
        gcsiHelper.open(bucketName, "test-compose/destination")) {
      destinationChannel.read(actualOutput);
    }
    assertThat(actualOutput.array()).isEqualTo(expectedOutput);
  }

Stack trace from the failing test:

  com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
POST https://storage.googleapis.com/storage/v1/b/gcsio-test_prsagar_1f5d3f57_shared-1/o/test-compose%2Fdestination/compose?ifGenerationMatch=0
{
  "code" : 404,
  "errors" : [ {
    "domain" : "global",
    "message" : "Object test-compose/object%2C1 (generation: 0) not found.",
    "reason" : "notFound"
  } ],
  "message" : "Object test-compose/object%2C1 (generation: 0) not found."
}

  at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
  at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
  at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
  at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:428)
  at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1111)
  at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:514)
  at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:455)
  at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:565)
  at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.composeObjects(GoogleCloudStorageImpl.java:2094)
  at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.compose(GoogleCloudStorageImpl.java:2052)
  at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.compose(GoogleCloudStorageFileSystem.java:719)
  at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystemIntegrationTest.testComposeFailsOnMetacharacters(GoogleCloudStorageFileSystemIntegrationTest.java:1663)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
  at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
  at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
  at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
  at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
  at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
  at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
  at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
  at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
  at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
  at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:220)
  at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:53)

From the test run, when debugging after object creation, the created objects have names of the form object,1 even though the input name was object%2C1.

[Screenshot: debugger view of the created object name]

When trying to compose, the object names passed are of the form object%2C1, which causes GCS to throw ObjectNotFound.

[Screenshot: debugger view of the object names passed to compose]

Mitigation

Resolution options:

  1. In compose, detect percent-encoded sequences in the source and destination object names and decode them back to the literal metacharacters.
  2. Do not accept percent-encoded names in composeObjects; accept only the literal object names.

I would prefer option 2, and expect object names not to contain percent-encoded sequences. That would not change the connector's API, but would add a validation that conveys the error more clearly.
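
A rough sketch of what the option 2 validation could look like (the class and method names here are made up, not existing connector code): reject source and destination names that still contain percent-encoded sequences before issuing the compose request.

import java.util.List;
import java.util.regex.Pattern;

public class ComposeNameValidatorSketch {
  // Matches a percent sign followed by two hex digits, e.g. "%2C".
  private static final Pattern PERCENT_ESCAPE = Pattern.compile("%[0-9A-Fa-f]{2}");

  static void validateObjectNames(List<String> objectNames) {
    for (String name : objectNames) {
      if (PERCENT_ESCAPE.matcher(name).find()) {
        throw new IllegalArgumentException(
            "Object name '" + name + "' contains a percent-encoded sequence; "
                + "compose expects literal (decoded) object names");
      }
    }
  }
}

One caveat: a literal %2C is a valid GCS object name, so a real check would need some way to distinguish intentionally escaped names from double-encoded input.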

mprashanthsagar commented 3 years ago

Please re-open if the issue persists

medb commented 3 years ago

Seems like we still can fix this issue as suggested above?