datastrato / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
https://datastrato.ai/docs/
Apache License 2.0

[Bug report] IP issues with 3rd party code #949

Closed justinmclean closed 6 months ago

justinmclean commented 6 months ago

Describe what's wrong

3rd party code has been copied into the repository without updating LICENSE and NOTICE.

Error message and/or stacktrace

N/A

How to reproduce

Discovered with ScanOSS SBOM workbench.

Additional context

The code in question is:

@FANNG1 HBase: /gravitino/server-common/src/main/java/com/datastrato/gravitino/server/web/PrivilegedThreadFactory.java

@xunliu Presto: /gravitino/integration-test/src/main/java/com/datastrato/gravitino/integration/test/util/CloseableGroup.java

@xunliu Zeppelin: /gravitino/bin/common.sh

justinmclean commented 6 months ago

Also these files have ASF headers but are not mentioned in LICENSE. Where do they come from?

jerryshao commented 6 months ago

@FANNG1 I've assigned this issue to you. Can you please check all the code Justin mentioned above and submit a PR to rectify it?

Please be aware that some of these files are already fixed in #954.

xunliu commented 6 months ago

./web/WEB-INF/web.xml @ch3yne

hi @justinmclean I wrote this web.xml. It was not copied from another project.

FANNG1 commented 6 months ago

Also these files have ASF headers but are not mentioned in LICENSE. Where do they come from?

  • ./api/src/main/java/com/datastrato/gravitino/rel/expressions/Literal.java @mchades
  • ./catalogs/catalog-hive/bin/test/hive-schema-3.1.0.derby.sql (path is wrong in LICENSE)
  • ./clients/client-java/src/main/java/com/datastrato/gravitino/client/HTTPClient.java (path is wrong in LICENSE)
  • ./clients/client-java/src/main/java/com/datastrato/gravitino/client/RESTClient.java (path is wrong in LICENSE)
  • ./clients/client-java/src/test/java/com/datastrato/gravitino/client/TestHTTPClient.java (path is wrong in LICENSE)
  • ./core/src/main/java/com/datastrato/gravitino/utils/ClientPool.java (path is wrong in LICENSE)
  • ./core/src/main/java/com/datastrato/gravitino/utils/ClientPoolImpl.java (path is wrong in LICENSE)
  • ./integration-test/src/test/java/com/datastrato/gravitino/integration/test/util/CommandExecutor.java (path is wrong in LICENSE)
  • ./integration-test/src/test/java/com/datastrato/gravitino/integration/test/util/ProcessData.java (path is wrong in LICENSE)
  • ./web/WEB-INF/web.xml @ch3yne

@justinmclean, could you share how you found these problems? I could then re-check the files after modifying them.

justinmclean commented 6 months ago

./web/WEB-INF/web.xml @ch3yne

hi @justinmclean I wrote this web.xml. It was not copied from another project.

Then why does the header state it is licensed to the ASF?

justinmclean commented 6 months ago

@FANNG1 I've fixed the file paths in LICENSE, but that's all. These files and LICENSE and NOTICE files need fixing:

  • ./server-common/src/main/java/com/datastrato/gravitino/server/web/PrivilegedThreadFactory.java
  • ./integration-test/src/main/java/com/datastrato/gravitino/integration/test/util/CloseableGroup.java
  • ./bin/common.sh
  • ./api/src/main/java/com/datastrato/gravitino/rel/expressions/Literal.java
  • ./web/WEB-INF/web.xml

What needs to be done to LICENSE and NOTICE will depend on where they have come from and how they are licensed.

FANNG1 commented 6 months ago

@xunliu Presto: /gravitino/integration-test/src/main/java/com/datastrato/gravitino/integration/test/util/CloseableGroup.java

@xunliu Zeppelin: /gravitino/bin/common.sh

@xunliu, can you confirm where the files come from?

./api/src/main/java/com/datastrato/gravitino/rel/expressions/Literal.java

@mchades where does the file come from?

justinmclean commented 6 months ago

I used the ScanOSS SBOM workbench tool. However, you need to understand how it works in order to filter down its output and look at each issue. Some issues can be ignored because they are false positives.

FANNG1 commented 6 months ago

./web/WEB-INF/web.xml

I will change it to Datastrato

justinmclean commented 6 months ago

I also looked for any files that had ASF headers but were not mentioned in the LICENSE file. For that, I used Apache Rat (all files marked with AL are Apache licensed) and some shell commands like grep and find.
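
The kind of check described here can be approximated in a few lines of Python. This is only a sketch, not the actual Rat, grep, or find commands used in this thread. The marker string, the 30-line header window, and the assumption that LICENSE lists files in `./relative/path` form are illustrative choices, and `has_asf_header` and `unlisted_asf_files` are hypothetical helper names.

```python
from pathlib import Path
from typing import List

# Assumption: an ASF header contains this phrase on a single line.
ASF_MARKER = "Licensed to the Apache Software Foundation"


def has_asf_header(path: Path, max_lines: int = 30) -> bool:
    """Return True if the ASF marker appears in the first max_lines lines."""
    try:
        with path.open("r", encoding="utf-8", errors="ignore") as handle:
            for _, line in zip(range(max_lines), handle):
                if ASF_MARKER in line:
                    return True
    except OSError:
        # Unreadable files are treated as having no header.
        return False
    return False


def unlisted_asf_files(root: Path, license_file: Path) -> List[str]:
    """List files under root that carry an ASF header but whose
    ./relative/path does not appear anywhere in the LICENSE text."""
    license_text = license_file.read_text(encoding="utf-8", errors="ignore")
    license_resolved = license_file.resolve()
    missing = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.resolve() == license_resolved:
            continue
        relative = path.relative_to(root)
        if ".git" in relative.parts:
            continue  # skip version-control metadata
        listed_as = "./" + relative.as_posix()
        if has_asf_header(path) and listed_as not in license_text:
            missing.append(listed_as)
    return missing
```

Calling `unlisted_asf_files(Path("."), Path("LICENSE"))` from the repository root would return candidate paths to review by hand. A plain substring match like this can produce false positives and false negatives, so each result still needs a manual look.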

justinmclean commented 6 months ago

./web/WEB-INF/web.xml

I will change it to Datastrato

If it had the ASF header, it very likely came originally from an ASF project. We should never replace a 3rd party header in a file. You need the permission of the ASF to remove that header.

justinmclean commented 6 months ago

For instance, PrivilegedThreadFactory.java and CloseableGroup.java are 3rd party code but their headers have been replaced with the Datastrato header. We should never do this. We can't claim ownership of code that has been copied from a 3rd party.

FANNG1 commented 6 months ago

For instance, PrivilegedThreadFactory.java and CloseableGroup.java are 3rd party code but their headers have been replaced with the Datastrato header. We should never do this. We can't claim ownership of code that has been copied from a 3rd party.

Got it, thanks. I will fix it.

FANNG1 commented 6 months ago

PrivilegedThreadFactory is from Jetty, which is licensed under the Eclipse Public License v2.0. @justinmclean, please confirm whether we can use it.

justinmclean commented 6 months ago

We can not use EPL as it is Category B and can't be included in an ALv2 source release. However, not all Jetty code is under the EPL. You might want to double-check to be sure. It may be that Jetty also copied it from somewhere else?

justinmclean commented 6 months ago

Do we know exactly where it was copied from?

FANNG1 commented 6 months ago

We can not use EPL as it is Category B and can't be included in an ALv2 source release. However, not all Jetty code is under the EPL. You might want to double-check to be sure. It may be that Jetty also copied it from somewhere else?

The header of PrivilegedThreadFactory shows that it is also available under the Apache License 2.0?

//
// ========================================================================
// Copyright (c) 1995 Mort Bay Consulting Pty Ltd and others.
//
// This program and the accompanying materials are made available under the
// terms of the Eclipse Public License v. 2.0 which is available at
// https://www.eclipse.org/legal/epl-2.0, or the Apache License, Version 2.0
// which is available at https://www.apache.org/licenses/LICENSE-2.0.
//
// SPDX-License-Identifier: EPL-2.0 OR Apache-2.0
// ========================================================================
//

justinmclean commented 6 months ago

Having it dual-licensed like that is fine; we just need to state that it is Apache-licensed in our LICENSE file.
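
To illustrate why the `OR` in that SPDX line matters, here is a minimal Python sketch that extracts the `SPDX-License-Identifier` expression from a header and checks whether Apache-2.0 is offered as a standalone choice. It is a simplification under stated assumptions: it only understands a flat, top-level `OR` with uppercase operators, it does not handle parentheses or `WITH` clauses, and `spdx_expression` and `can_choose` are hypothetical helper names, not part of any SPDX tooling.

```python
import re
from typing import Optional

# Captures everything after the tag up to the end of that line.
SPDX_LINE = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def spdx_expression(header: str) -> Optional[str]:
    """Return the SPDX expression found in a file header, or None."""
    match = SPDX_LINE.search(header)
    return match.group(1).strip() if match else None


def can_choose(expression: str, wanted: str = "Apache-2.0") -> bool:
    """True only if `wanted` is one complete alternative of a top-level OR.

    An expression joined with AND is never split here, so it cannot match:
    with AND, both licenses apply at once and there is nothing to choose.
    """
    alternatives = [part.strip() for part in re.split(r"\s+OR\s+", expression)]
    return wanted in alternatives


header = """\
// SPDX-License-Identifier: EPL-2.0 OR Apache-2.0
// ========================================================================
"""
expression = spdx_expression(header)
print(expression, "->", can_choose(expression))
```

A header that combined the same licenses with `AND` instead would fail this check, because both licenses would then apply together.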

justinmclean commented 6 months ago

A good example of why you should never remove the original header from a file.

justinmclean commented 6 months ago

We still need to know where it came from; the location I found with the header you gave did not match the code.

I found this: https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/java/util/concurrent/Executors.java

which you'll see includes PrivilegedThreadFactory but is GPL licensed.

FANNG1 commented 6 months ago

We still need to know where it came from; the location I found with the header you gave did not match the code.

I found this: https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/java/util/concurrent/Executors.java

which you'll see includes PrivilegedThreadFactory but is GPL licensed.

Jetty may have changed the code; the code was not borrowed from OpenJDK.

justinmclean commented 6 months ago

We need to know the exact version; even the Jetty version has multiple headers and licenses.

justinmclean commented 6 months ago

This version is not the version that was copied: https://github.com/jetty/jetty.project/blob/9f68bcc517f6e15eeed772a971442e1a70c88b30/jetty-util/src/main/java/org/eclipse/jetty/util/thread/PrivilegedThreadFactory.java#L4

justinmclean commented 6 months ago

One of the other versions says "the terms of the Eclipse Public License v1.0 and Apache License v2.0". Note that it says "and", not "or", so we could not use that version.