Closed geruh closed 8 months ago
I don't think this was intended behavior.
It seems that when instantiating the RESTCatalog, a new InMemoryFileIO instance is also created separately.
Could you point out where this happens exactly?
Based on my current understanding, the InMemoryCatalog serves as the backing catalog for the RESTCatalog. When a table is created, RESTTableOperations returns a ResolvingFileIO object that encapsulates a HadoopFileIO instance. In scenarios like the one demonstrated by the testAppendFiles method (within the REST catalog context), after committing changes, it appears that only InMemoryFileIO is used for referencing metadata files.
However, attempting to access these metadata files with the REST table's HadoopFileIO results in a NotFoundException. This issue does not occur when accessing the files through the backendCatalog, indicating that the backend's InMemoryFileIO can access files that the REST table's HadoopFileIO cannot.
Additionally, when attempting to read a manifest list or snapshot created on the table using InMemoryFileIO, a NotFoundException is also encountered.
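To make the failure mode concrete, here is a minimal toy sketch in plain Java. It does not use Iceberg's real classes: `ToyInMemoryFileIO` and the metadata path are hypothetical stand-ins, and the only assumption is that each in-memory FileIO instance keeps its files in its own private map. Under that assumption, a file written through one instance is invisible to the other, even though both use the same path.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical stand-ins, not Iceberg's real InMemoryFileIO.
public class SeparateFileIoSketch {

  /** Toy in-memory FileIO: every instance owns a private map of path -> bytes. */
  static class ToyInMemoryFileIO {
    private final Map<String, byte[]> files = new HashMap<>();

    void write(String path, byte[] contents) {
      files.put(path, contents);
    }

    boolean exists(String path) {
      return files.containsKey(path);
    }

    byte[] read(String path) {
      byte[] contents = files.get(path);
      if (contents == null) {
        // Stands in for the NotFoundException described in the thread.
        throw new IllegalStateException("Not found: " + path);
      }
      return contents;
    }
  }

  public static void main(String[] args) {
    // Hypothetical path; both instances "point at" the same warehouse location.
    String metadataPath = "warehouse/db/tbl/metadata/v1.metadata.json";

    ToyInMemoryFileIO backendIo = new ToyInMemoryFileIO(); // used by the backing catalog
    ToyInMemoryFileIO clientIo = new ToyInMemoryFileIO();  // handed out to the REST table

    // The backing catalog writes the table metadata through its own instance.
    backendIo.write(metadataPath, "{}".getBytes());

    System.out.println(backendIo.exists(metadataPath)); // prints "true"
    System.out.println(clientIo.exists(metadataPath));  // prints "false"
  }
}
```

The same path resolves in one instance and not in the other, which matches the symptom of a lookup succeeding through the backend catalog and failing through the REST table's IO.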
It's also worth noting that the test methods using this functionality in the TestRESTCatalog class create a new instance of InMemoryFileIO for each test, following the conf passed in to the catalog.
Synced with @nastra through Slack.
This is unexpected behavior. The RESTCatalog used to be backed by the JdbcCatalog, where the ResolvingFileIO knew how to handle this catalog. However, according to this pull-request, the JdbcCatalog didn't have view support, so it was swapped out for the InMemoryCatalog.
We have at least two potential ways forward: either we wait for the view support efforts in the JdbcCatalog in this pull-request and revert the changes, or we fix this behavior in the RESTCatalogTests, as suggested by @nastra.
cc: @jackye1995 @amogh-jahagirdar
+1 to fixing the behavior in TestRESTCatalog when using InMemoryCatalog
It makes sense to improve the TestRESTCatalog with InMemoryCatalog backend.
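For illustration, one possible shape of a test-side fix is to construct a single in-memory IO and pass it to both the backing catalog and the REST client side. This is a hedged toy sketch in plain Java, not the actual TestRESTCatalog change: `ToyFileIO`, `ToyBackendCatalog`, `ToyRestClient`, and the path are all hypothetical names invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: all class names here are hypothetical, not Iceberg APIs.
public class SharedFileIoSketch {

  /** Toy in-memory FileIO holding its files in a per-instance map. */
  static class ToyFileIO {
    private final Map<String, byte[]> files = new HashMap<>();

    void write(String path, byte[] contents) {
      files.put(path, contents);
    }

    boolean exists(String path) {
      return files.containsKey(path);
    }
  }

  /** Stand-in for the backing catalog: writes metadata through the IO it was given. */
  static class ToyBackendCatalog {
    private final ToyFileIO io;

    ToyBackendCatalog(ToyFileIO io) {
      this.io = io;
    }

    void commitMetadata(String path) {
      io.write(path, "{}".getBytes());
    }
  }

  /** Stand-in for the REST client side: reads through the IO it was given. */
  static class ToyRestClient {
    private final ToyFileIO io;

    ToyRestClient(ToyFileIO io) {
      this.io = io;
    }

    boolean canRead(String path) {
      return io.exists(path);
    }
  }

  public static void main(String[] args) {
    // One IO instance, injected into both sides instead of created twice.
    ToyFileIO sharedIo = new ToyFileIO();
    ToyBackendCatalog backend = new ToyBackendCatalog(sharedIo);
    ToyRestClient client = new ToyRestClient(sharedIo);

    String metadataPath = "warehouse/db/tbl/metadata/v1.metadata.json"; // hypothetical path
    backend.commitMetadata(metadataPath);

    System.out.println(client.canRead(metadataPath)); // prints "true"
  }
}
```

Injecting the instance keeps the sharing explicit in the test setup. Whether the real fix should inject an instance or share state some other way is a design decision for the maintainers.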
Query engine
None
Question
We are seeing some unexpected behavior when testing the RESTCatalog. The RESTCatalog is intended to use an InMemoryCatalog for testing purposes. It appears that the RESTCatalog and InMemoryCatalog are using separate instances of InMemoryFileIO, even though they point to the same warehouse location, which makes sense. However, our expectation was that the RESTCatalog would share the same InMemoryCatalog instance (and InMemoryFileIO instance) across all tests. This would ensure the files associated with tables are accessible and consistent between the backing catalog and the RESTCatalog.
It seems that when instantiating the RESTCatalog, a new InMemoryFileIO instance is also created separately. As a result, test data is not actually shared between the two catalogs. For instance, when committing data to a table, the InMemoryCatalog's FileIO stores the table metadata, since the CatalogHandler delegates metadata creation. But when creating the table, the RESTCatalog's separate InMemoryFileIO instance is returned as part of RESTTableOperations.
This leaves the InMemoryCatalog storing table metadata, while the RESTCatalog's IO stores Snapshot and ManifestList data separately.
Is this the intended behavior? Or did we expect the RESTCatalog to share the same InMemoryCatalog and InMemoryFileIO instances during testing?
@nastra @jackye1995