Unidata / netcdf-java

The Unidata netcdf-java library
https://docs.unidata.ucar.edu/netcdf-java/current/userguide/index.html
BSD 3-Clause "New" or "Revised" License

Creating a datasetScan for data in an S3/MinIO bucket #1191

pedro-cf closed this issue 1 year ago

pedro-cf commented 1 year ago

@tdrwenski wrote in https://github.com/Unidata/netcdf-java/pull/173#issuecomment-1581071071:

Hi @pedro-cf , here is an example from our tests:

<datasetScan name="Test S3 dataset scan"
    ID="testS3DatasetScan"
    path="s3-dataset-scan"
    location="cdms3:thredds-test-data#delimiter=/">
    <serviceName>all</serviceName>
</datasetScan>

The location should be the cdms3 URL, of the form cdms3:myBucket#delimiter=/ or cdms3:myBucket?myKey/Prefix/#delimiter=/ (more info here). Note the fragment #delimiter=/, which is necessary to treat the object store bucket and keys as hierarchical, like a file system.
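A sketch of the second (key prefix) form inside a datasetScan, modeled on the example above. It assumes the same compound service named all is defined in the catalog; myBucket and myKey/Prefix/ are the placeholder names from the comment, and the name, ID and path values are hypothetical. With the prefix in place, the scan should only list objects whose keys start with myKey/Prefix/.

```xml
<!-- Sketch: scan only the objects under one key prefix of a bucket.
     myBucket and myKey/Prefix/ are placeholders; name, ID and path are hypothetical. -->
<datasetScan name="Prefixed S3 dataset scan"
    ID="prefixedS3DatasetScan"
    path="s3-prefixed-scan"
    location="cdms3:myBucket?myKey/Prefix/#delimiter=/">
  <!-- Assumes a compound service named "all" is defined in this catalog -->
  <serviceName>all</serviceName>
</datasetScan>
```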

If you have further questions, feel free to open a new issue to discuss them!

Hello, I've been struggling to connect THREDDS to a MinIO bucket (I'm not sure if it's possible) in order to create a NetCDF catalog.

I have a MinIO instance running locally on localhost:9000 with user "minio" and password "minio123", and a bucket "test" that simply holds NetCDF (.nc) files.

I'm not sure if it would be possible to structure a catalog for this situation.

tdrwenski commented 1 year ago

I have not personally used MinIO, but from its docs it looks like it is S3 compatible. All S3-compatible object stores should work with TDS.

If you have tried something similar to my example above and it still doesn't work, please feel free to share the catalog you are trying and the error you are getting.

pedro-cf commented 1 year ago

Min.io is an open-source object storage server that is compatible with the Amazon S3 API.

I'm not sure how to build the catalog.

haileyajohnson commented 1 year ago

Here are instructions on how to form the dataset URL for an object store: https://docs.unidata.ucar.edu/netcdf-java/current/userguide/dataset_urls.html#object-stores

So you'll want something like:

<datasetRoot path="test-path" location="cdms3://minio@localhost:9000/test" />
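To make the pieces of that URL explicit, here is a sketch of the general form for a non-AWS, S3-compatible endpoint, based on a reading of the linked docs rather than a verified reference. All names are placeholders. The part before @ is taken to be the profile name looked up in the AWS-style credentials file (which is how credentials for a non-AWS store would be supplied), and the optional ?prefix and #delimiter=/ parts work as in the datasetScan example at the top of this issue.

```xml
<!--
  Assumed general shape of a cdms3 URL for a custom endpoint (placeholders only):

    cdms3://myProfile@my.objectstore.example:9000/myBucket?data/prefix/#delimiter=/

    myProfile                     profile name in the AWS-style credentials file
    my.objectstore.example:9000   host and port of the S3-compatible endpoint
    myBucket                      bucket name
    data/prefix/                  optional object key prefix
    #delimiter=/                  treat keys as a directory hierarchy
-->
<datasetScan name="Example object store scan"
    ID="exampleObjectStoreScan"
    path="example-object-store-scan"
    location="cdms3://myProfile@my.objectstore.example:9000/myBucket?data/prefix/#delimiter=/">
  <serviceName>all</serviceName>
</datasetScan>
```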

See here for examples and information on setting up the credentials: https://github.com/lesserwhirls/tds-s3-jpl-test/tree/main

See also the discussion here: https://github.com/Unidata/tds/issues/194

pedro-cf commented 1 year ago

I've tried creating a credentials file at /usr/local/tomcat/.aws/credentials, based on the linked example, with the following contents:

[minio]
aws_access_key_id=minio
aws_secret_access_key=minio123

and created the following catalog.xml:

<?xml version="1.0" encoding="UTF-8"?>
<catalog name="THREDDS Server Default Catalog : You must change this to fit your server!"
         xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0
           https://schemas.unidata.ucar.edu/thredds/InvCatalog.1.0.6.xsd">

  <service name="all" base="" serviceType="compound">
    <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/"/>
    <service name="dap4" serviceType="DAP4" base="/thredds/dap4/"/>
    <service name="http" serviceType="HTTPServer" base="/thredds/fileServer/"/>
    <service name="wcs" serviceType="WCS" base="/thredds/wcs/"/>
    <service name="wms" serviceType="WMS" base="/thredds/wms/"/>
    <service name="ncssGrid" serviceType="NetcdfSubset" base="/thredds/ncss/grid/"/>
    <service name="ncssPoint" serviceType="NetcdfSubset" base="/thredds/ncss/point/"/>
    <service name="cdmremote" serviceType="CdmRemote" base="/thredds/cdmremote/"/>
    <service name="iso" serviceType="ISO" base="/thredds/iso/"/>
    <service name="ncml" serviceType="NCML" base="/thredds/ncml/"/>
    <service name="uddc" serviceType="UDDC" base="/thredds/uddc/"/>
  </service>

  <datasetRoot path="test-path" location="cdms3://minio@localhost:9000/test"/>
  <datasetScan name="Test MinIO dataset scan"
    ID="testMinIODatasetScan"
    path="minio-dataset-scan"
    location="cdms3://minio@localhost:9000/test#delimiter=/">
    <serviceName>all</serviceName>
  </datasetScan>

</catalog>

And when I load the thredds webpage I get:

IOException: invalid catalog: catalog.xml

haileyajohnson commented 1 year ago

Do you see any exceptions in your logs that you could share?

pedro-cf commented 1 year ago

I was having some network connectivity issues between the MinIO container and the THREDDS container, but I have now fixed them. Now I am getting this error:

2023-06-09T11:16:31.659 +0000 [    299312][      10] ERROR - thredds.client.catalog.builder.CatalogBuilder - failed to read xml catalog at file:/usr/local/tomcat/content/thredds/catalog.xml, err=software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Unsupported or unrecognized SSL message

Full logs from content/thredds/logs/threddsServlet.log:

2023-06-09T11:16:29.722 +0000 [    297375][       5] INFO  - threddsServlet - Request Completed - 500 - -1 - 1958
2023-06-09T11:16:30.413 +0000 [    298066][       9] INFO  - threddsServlet - Remote host: 172.24.0.1 - Request: "GET /thredds/catalog/catalog.html HTTP/1.1"
2023-06-09T11:16:30.771 +0000 [    298424][       9] ERROR - thredds.client.catalog.builder.CatalogBuilder - failed to read xml catalog at file:/usr/local/tomcat/content/thredds/catalog.xml, err=software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Unsupported or unrecognized SSL message
2023-06-09T11:16:30.772 +0000 [    298425][       9] WARN  - thredds.server.TdsErrorHandling - TDS Error
java.io.IOException: invalid catalog: catalog.xml
    at thredds.server.catalog.ConfigCatalogCache.readCatalog(ConfigCatalogCache.java:128) ~[tdcommon-5.5-SNAPSHOT.jar:5.5-SNAPSHOT]
    at thredds.server.catalog.ConfigCatalogCache.lambda$get$0(ConfigCatalogCache.java:94) ~[tdcommon-5.5-SNAPSHOT.jar:5.5-SNAPSHOT]
    at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4868) ~[guava-31.1-jre.jar:?]
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3533) ~[guava-31.1-jre.jar:?]
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2282) ~[guava-31.1-jre.jar:?]
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2159) ~[guava-31.1-jre.jar:?]
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2049) ~[guava-31.1-jre.jar:?]
    at com.google.common.cache.LocalCache.get(LocalCache.java:3966) ~[guava-31.1-jre.jar:?]
    at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4863) ~[guava-31.1-jre.jar:?]
    at thredds.server.catalog.ConfigCatalogCache.get(ConfigCatalogCache.java:94) ~[tdcommon-5.5-SNAPSHOT.jar:5.5-SNAPSHOT]
    at thredds.core.CatalogManager.getCatalog(CatalogManager.java:89) ~[classes/:5.5-SNAPSHOT]
    at thredds.server.catalogservice.CatalogServiceController.handleRequest(CatalogServiceController.java:55) ~[classes/:5.5-SNAPSHOT]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205) ~[spring-web-5.3.27.jar:5.3.27]
    at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:150) ~[spring-web-5.3.27.jar:5.3.27]
    at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:117) ~[spring-webmvc-5.3.27.jar:5.3.27]
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:895) ~[spring-webmvc-5.3.27.jar:5.3.27]
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:808) ~[spring-webmvc-5.3.27.jar:5.3.27]
    at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87) ~[spring-webmvc-5.3.27.jar:5.3.27]
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1072) [spring-webmvc-5.3.27.jar:5.3.27]
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:965) [spring-webmvc-5.3.27.jar:5.3.27]
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006) [spring-webmvc-5.3.27.jar:5.3.27]
    at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898) [spring-webmvc-5.3.27.jar:5.3.27]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:489) [servlet-api.jar:?]
    at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883) [spring-webmvc-5.3.27.jar:5.3.27]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:583) [servlet-api.jar:?]
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:212) [catalina.jar:8.5.89]
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:156) [catalina.jar:8.5.89]
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51) [tomcat-websocket.jar:8.5.89]
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:181) [catalina.jar:8.5.89]
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:156) [catalina.jar:8.5.89]
    at org.apache.catalina.filters.CorsFilter.handleNonCORS(CorsFilter.java:330) [catalina.jar:8.5.89]
    at org.apache.catalina.filters.CorsFilter.doFilter(CorsFilter.java:155) [catalina.jar:8.5.89]
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:181) [catalina.jar:8.5.89]
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:156) [catalina.jar:8.5.89]
    at org.apache.catalina.filters.HttpHeaderSecurityFilter.doFilter(HttpHeaderSecurityFilter.java:126) [catalina.jar:8.5.89]
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:181) [catalina.jar:8.5.89]
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:156) [catalina.jar:8.5.89]
    at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:337) [spring-security-web-5.7.8.jar:5.7.8]
    at thredds.servlet.filter.RequestBracketingLogMessageFilter.doFilter(RequestBracketingLogMessageFilter.java:50) [classes/:5.5-SNAPSHOT]
    at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:346) [spring-security-web-5.7.8.jar:5.7.8]
    at thredds.servlet.filter.RequestQueryFilter.doFilter(RequestQueryFilter.java:90) [classes/:5.5-SNAPSHOT]
    at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:346) [spring-security-web-5.7.8.jar:5.7.8]
    at thredds.servlet.filter.HttpHeadFilter.doFilter(HttpHeadFilter.java:47) [classes/:5.5-SNAPSHOT]
    at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:346) [spring-security-web-5.7.8.jar:5.7.8]
    at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:221) [spring-security-web-5.7.8.jar:5.7.8]
    at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:186) [spring-security-web-5.7.8.jar:5.7.8]
    at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:354) [spring-web-5.3.27.jar:5.3.27]
    at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:267) [spring-web-5.3.27.jar:5.3.27]
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:181) [catalina.jar:8.5.89]
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:156) [catalina.jar:8.5.89]
    at org.apache.logging.log4j.web.Log4jServletFilter.doFilter(Log4jServletFilter.java:71) [log4j-web-2.17.1.jar:2.17.1]
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:181) [catalina.jar:8.5.89]
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:156) [catalina.jar:8.5.89]
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:167) [catalina.jar:8.5.89]
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:90) [catalina.jar:8.5.89]
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:483) [catalina.jar:8.5.89]
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:130) [catalina.jar:8.5.89]
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:93) [catalina.jar:8.5.89]
    at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:682) [catalina.jar:8.5.89]
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74) [catalina.jar:8.5.89]
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343) [catalina.jar:8.5.89]
    at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:617) [tomcat-coyote.jar:8.5.89]
    at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:63) [tomcat-coyote.jar:8.5.89]
    at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:932) [tomcat-coyote.jar:8.5.89]
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1695) [tomcat-coyote.jar:8.5.89]
    at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52) [tomcat-coyote.jar:8.5.89]
    at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191) [tomcat-util.jar:8.5.89]
    at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659) [tomcat-util.jar:8.5.89]
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-util.jar:8.5.89]
    at java.lang.Thread.run(Thread.java:829) [?:?]

pedro-cf commented 1 year ago

Some additional questions:

How can non-AWS credentials be specified? I have searched the docs and cannot find an answer.

How is a non-AWS URL structured?

pedro-cf commented 1 year ago

I'm truly lost; if anyone could help, I would highly appreciate it. @tdrwenski @haileyajohnson

tdrwenski commented 1 year ago

This invalid catalog error can be caused by not being able to connect to your bucket (the datasetScan will check if the bucket is indeed a valid "directory" to be scanned). The stack trace here isn't very useful unfortunately. You can enable debug logging for all ObjectStore requests as outlined here. Hopefully that will clarify why your request is failing.
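A minimal sketch of that logging configuration, assuming the stock TDS log4j2.xml, which defines an appender named threddsServlet (the same one used later in this thread). The broader software.amazon.awssdk logger is an optional, noisier addition. With request logging on, each SDK request should be logged along with its protocol, host and port, which would show whether the client is speaking HTTPS to an endpoint that only serves plain HTTP.

```xml
<!-- Add inside the loggers section of WEB-INF/classes/log4j2.xml
     (assumes the stock TDS file, which defines the threddsServlet appender). -->

<!-- Logs each request the AWS SDK sends (should include protocol, host and port). -->
<logger name="software.amazon.awssdk.request" level="debug" additivity="false">
  <appender-ref ref="threddsServlet"/>
</logger>

<!-- Optional and much noisier: debug output from the rest of the AWS SDK. -->
<logger name="software.amazon.awssdk" level="debug" additivity="false">
  <appender-ref ref="threddsServlet"/>
</logger>
```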

WeatherGod commented 1 year ago

The reason for the failure is listed in the traceback, albeit hidden: SdkClientException: Unable to execute HTTP request: Unsupported or unrecognized SSL message. There are several possible causes, though. You should definitely try turning on the AWS SDK debug messages to get more details.

To help debug this issue, I'd try to verify that the bucket works with another AWS S3 tool, boto3: https://stackoverflow.com/questions/32618216/override-s3-endpoint-using-boto3-configuration-file

pedro-cf commented 1 year ago

OK, so the strangest thing happened...

I added the logger to /usr/local/tomcat/webapps/thredds/WEB-INF/classes/log4j2.xml

<logger name="software.amazon.awssdk.request" level="debug" additivity="false">
  <appender-ref ref="threddsServlet"/>
</logger>

and suddenly my catalog started working and connected to MinIO...

I even commented out the logger afterwards and restarted THREDDS, and it still works...

pedro-cf commented 1 year ago

The bucket works just fine. I'm used to working with it, and MinIO has a web console for managing buckets.

Regarding the "Unsupported or unrecognized SSL message" error: my local MinIO instance is not running with an SSL certificate, so the only solution I found was to make MinIO run on port 8080, based on these docs:

[screenshot of the referenced documentation]
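For reference, a sketch of the catalog entries from earlier in this thread adjusted for that workaround. It assumes, from the behavior described here rather than anything verifiable from the lost screenshot, that netcdf-java uses plain HTTP for a custom endpoint on port 8080 and HTTPS on other ports such as 9000, which would explain the SSL error against a MinIO instance without TLS. If TDS and MinIO run in separate containers, localhost would presumably need to be replaced by a hostname that the TDS container can reach.

```xml
<!-- Same entries as the earlier catalog, but pointing at MinIO on port 8080.
     Assumption: port 8080 makes the client use plain HTTP instead of HTTPS. -->
<datasetRoot path="test-path" location="cdms3://minio@localhost:8080/test"/>
<datasetScan name="Test MinIO dataset scan"
    ID="testMinIODatasetScan"
    path="minio-dataset-scan"
    location="cdms3://minio@localhost:8080/test#delimiter=/">
  <serviceName>all</serviceName>
</datasetScan>
```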

haileyajohnson commented 1 year ago

Glad you got it working! Please reach out if you encounter more issues.