apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[Release][Java] Verify staged maven artifacts #30961

Open asfimport opened 2 years ago

asfimport commented 2 years ago

We have two tests right now:

  1. Execute mvn test from the source tarball's java directory testing the source https://github.com/apache/arrow/blob/master/dev/release/verify-release-candidate.sh#L278
  2. Verify the checksums and signatures of the uploaded maven artifacts https://github.com/apache/arrow/blob/master/dev/release/verify-release-candidate.sh#L766

But we don't actually test the packages. We should add that to the verification scripts, since 7.0 is going to be the first release shipping the jars with bundled JNI libraries.
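As a first step toward testing the packages themselves, a verification script could at least confirm that a downloaded jar really contains a native library. The following is only a minimal sketch: the class name `BundledNativeLibs` is hypothetical, and it assumes that bundled JNI libraries can be recognized by their file suffix (`.so`, `.dylib`, `.dll`), which is an illustration rather than a rule taken from the Arrow build.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: lists native-library entries found inside a jar. */
final class BundledNativeLibs {

    /** Returns the sorted names of jar entries that end in a native-library suffix. */
    static List<String> find(Path jar) throws IOException {
        // A jar is a zip archive, so ZipFile is enough to enumerate its entries.
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            return zip.stream()
                    .map(ZipEntry::getName)
                    // Assumption: suffix matching is sufficient to spot JNI libraries.
                    .filter(n -> n.endsWith(".so") || n.endsWith(".dylib") || n.endsWith(".dll"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> libs = find(Paths.get(args[0]));
        if (libs.isEmpty()) {
            System.err.println("No bundled native libraries found in " + args[0]);
            System.exit(1);
        }
        libs.forEach(System.out::println);
    }
}
```

Running it against a locally downloaded jar (path passed as the first argument) prints the matching entries, or exits with a non-zero status if none are found. It does not load the library, so it complements an actual runtime test rather than replacing one.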

cc @kou @anthonylouisbsb

Reporter: Krisztian Szucs / @kszucs

Note: This issue was originally created as ARROW-15486. Please see the migration documentation for further details.

asfimport commented 2 years ago

Kouhei Sutou / @kou: I think that we should do this too. We already published *-tests.jar such as https://repository.apache.org/content/repositories/staging/org/apache/arrow/arrow-dataset/7.0.0/arrow-dataset-7.0.0-tests.jar but I don't know how to run them...

asfimport commented 2 years ago

David Dali Susanibar Arce / @davisusanibar: Hi Team, 

I checked https://repository.apache.org/content/repositories/staging/org/apache/arrow/ for the Arrow .jar libraries and whether any of them appear on that site.

In theory, most of the packages are already tested through the Maven module dependencies between them. Only the top-layer modules are not tested; for those we would need to create a demo project that only downloads the jar dependencies.

To test the Java jar packages, could you help me with the information needed for the Maven settings.xml so that I can execute:

 


mvn -Parrowrc clean install 

 

settings.xml

 


<?xml version="1.0" encoding="UTF-8"?>
<settings xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.1.0 http://maven.apache.org/xsd/settings-1.1.0.xsd" xmlns="http://maven.apache.org/SETTINGS/1.1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <servers>
    <server>
      <username>xxx</username>
      <password>yyy</password>
      <id>snapshots</id>
    </server>
  </servers>
  <profiles>
    <profile>
      <repositories>
        <repository>
          <snapshots />
          <id>snapshots</id>
          <name>libs-snapshot</name>
          <url>uuu</url>
        </repository>
      </repositories>
      <pluginRepositories>
        <pluginRepository>
          <snapshots />
          <id>snapshots</id>
          <name>plugins-release</name>
          <url>uuu</url>
        </pluginRepository>
      </pluginRepositories>
      <id>arrowrc</id>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>arrowrc</activeProfile>
  </activeProfiles>
</settings> 

 

According to this PR, https://github.com/apache/arrow/pull/11617/files, the .jar files will be uploaded to https://apache.jfrog.io/artifactory/arrow/java-rc/#{@release_version}-rc0

 

@kou  

 

 

asfimport commented 2 years ago

Kouhei Sutou / @kou: We stopped uploading .jar files to Artifactory because we use the staging feature of https://repository.apache.org/ . I hope that https://repository.apache.org/content/repositories/staging/org/apache/arrow/ can be used for verification, but I don't know how to specify the URL... I'm not familiar with Java...

@sunchao Do you know how to specify the URL? https://github.com/apache/arrow/pull/11669#issuecomment-965887786 ?

asfimport commented 2 years ago

Chao Sun / @sunchao: @kou after the RC on Nexus is closed, the artifacts will be available in the staging repository, and users can update their Maven pom.xml using the following settings:


    <repository>
       <id>staged</id>
       <name>staged-releases</name>
       <url>https://repository.apache.org/content/repositories/staging/</url>
       <releases>
         <enabled>true</enabled>
       </releases>
       <snapshots>
         <enabled>true</enabled>
       </snapshots>
     </repository>

(and of course, they need to update the Arrow version to the one specified in the RC).
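To see which file Maven would fetch from that staging repository for a given dependency, the standard Maven repository layout can be spelled out by hand. This is a small sketch, not part of the Arrow tooling: the class name `StagedArtifactUrl` is hypothetical, and the coordinates in `main` are the ones mentioned in this thread.

```java
import java.util.Objects;

/** Hypothetical helper: builds download URLs that follow the standard Maven repository layout. */
final class StagedArtifactUrl {

    // Staging repository root quoted in this thread.
    static final String STAGING = "https://repository.apache.org/content/repositories/staging/";

    /**
     * Returns base/groupId-as-path/artifactId/version/artifactId-version[-classifier].extension.
     * Pass a null or empty classifier for the main artifact.
     */
    static String of(String base, String groupId, String artifactId,
                     String version, String classifier, String extension) {
        Objects.requireNonNull(base, "base");
        String root = base.endsWith("/") ? base : base + "/";
        String suffix = (classifier == null || classifier.isEmpty()) ? "" : "-" + classifier;
        // Dots in the groupId become directory separators in the repository path.
        return root + groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/"
                + artifactId + "-" + version + suffix + "." + extension;
    }

    public static void main(String[] args) {
        System.out.println(of(STAGING, "org.apache.arrow", "arrow-dataset", "7.0.0", "tests", "jar"));
        System.out.println(of(STAGING, "org.apache.arrow", "flight-core", "7.0.0", null, "pom"));
    }
}
```

The first call reproduces the arrow-dataset-7.0.0-tests.jar URL quoted earlier in the thread, which is a quick sanity check that the repository URL and the version in the pom.xml line up.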

asfimport commented 2 years ago

Kouhei Sutou / @kou: Thanks!

@davisusanibar Could you try this?

asfimport commented 2 years ago

David Dali Susanibar Arce / @davisusanibar: Testing the JNI dataset library. Downloading from snapshots: https://repository.apache.org/content/repositories/staging/org/apache/arrow/arrow-memory/7.0.0/org/apache/arrow/arrow-dataset/7.0.0/arrow-dataset-7.0.0.jar

 

MacOS Big Sur - 11.5.2 - JDK 8: OK

    otool -L libarrow_dataset_jni.dylib
    libarrow_dataset_jni.dylib:
        @rpath/libarrow_dataset_jni.700.dylib (compatibility version 700.0.0, current version 700.0.0)
        /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
        /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1200.3.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)

 

Ubuntu 20.04.3 LTS - JDK 11: OK

Evidence: https://github.com/apache/arrow-cookbook/runs/5149652103?check_suite_focus=true

    ldd libarrow_dataset_jni.so
        linux-vdso.so.1 (0x00007fff259ac000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f3968d13000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f3968d08000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f3968b26000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f39689d7000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f39689bc000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f3968997000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f39687a5000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f396ac7a000)

Testing Code:

 


.. testcode::

    import org.apache.arrow.dataset.file.FileFormat;
    import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
    import org.apache.arrow.dataset.jni.NativeMemoryPool;
    import org.apache.arrow.dataset.source.DatasetFactory;
    import org.apache.arrow.memory.RootAllocator;
    import org.apache.arrow.vector.types.pojo.Schema;
    import org.apache.arrow.util.AutoCloseables;

    String uri = "file:" + System.getProperty("user.dir") + "/thirdpartydeps/parquetfiles/data1.parquet";
    RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE);
    DatasetFactory datasetFactory = new FileSystemDatasetFactory(rootAllocator, NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri);
    Schema schema = datasetFactory.inspect();
    AutoCloseables.close(datasetFactory);

    System.out.println(schema);

.. testoutput::

    Schema<id: Int(32, true), name: Utf8>(metadata: {parquet.avro.schema={"type":"record","name":"User","namespace":"org.apache.arrow.dataset","fields":[{"name":"id","type":["int","null"]},{"name":"name","type":["string","null"]}]}, writer.model.name=avro}) 

 

 

Pending: Testing JNI C Data Interface (I need to learn more about that library to test that)

 

Question: Is there a reason why arrow-flight cannot be downloaded from staging at https://repository.apache.org/content/repositories/staging/org/apache/arrow/arrow-flight/ ?

asfimport commented 2 years ago

Kouhei Sutou / @kou: We have https://repository.apache.org/content/repositories/staging/org/apache/arrow/flight-core/ and https://repository.apache.org/content/repositories/staging/org/apache/arrow/flight-grpc/ instead.

asfimport commented 2 years ago

David Dali Susanibar Arce / @davisusanibar: Hi @kou 

Regarding Java Arrow Flight, I see that the arrow-flight parent *.pom is not being uploaded to the Maven repository correctly. Since 7.0.0 it is needed as the parent POM for the flight modules.

We are not seeing arrow-flight-7.0.0.pom at:

asfimport commented 2 years ago

David Dali Susanibar Arce / @davisusanibar: Hi @kou ,

I looked at the GitHub workflow logs for Nightly and Release, and they clearly show that arrow-flight-7.0.0.pom is generated on the server and uploaded to GitHub as an asset:

 


2022-01-29T05:35:32.0491480Z [INFO] Installing /Users/runner/work/crossbow/crossbow/arrow/java/flight/pom.xml to /Users/runner/.m2/repository/org/apache/arrow/arrow-flight/7.0.0/arrow-flight-7.0.0.pom
2022-01-26T08:25:33.0926250Z [INFO] Installing /Users/runner/work/crossbow/crossbow/arrow/java/flight/pom.xml to /Users/runner/.m2/repository/org/apache/arrow/arrow-flight/7.0.0.dev585/arrow-flight-7.0.0.dev585.pom
2022-01-29T05:56:57.0811310Z INFO:crossbow:Uploading asset `arrow-flight-7.0.0.pom` with mimetype application/zip and size 2518...
2022-01-26T08:36:08.2444750Z INFO:crossbow:Uploading asset `arrow-flight-7.0.0.dev585.pom` with mimetype application/zip and size 2525...
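A check like this could be automated by extracting the uploaded asset names from the log. The sketch below assumes the log line shape shown in the excerpt above (``Uploading asset `NAME` with mimetype ...``); the class name `UploadedAssets` is hypothetical and not part of crossbow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: collects asset names from crossbow upload log lines. */
final class UploadedAssets {

    // Assumption: the asset name is always wrapped in backticks after "Uploading asset".
    private static final Pattern UPLOAD = Pattern.compile("Uploading asset `([^`]+)`");

    /** Returns the asset names in the order they appear; non-matching lines are ignored. */
    static List<String> parse(List<String> logLines) {
        List<String> names = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = UPLOAD.matcher(line);
            if (m.find()) {
                names.add(m.group(1));
            }
        }
        return names;
    }
}
```

Calling `UploadedAssets.parse(lines).contains("arrow-flight-7.0.0.pom")` on the release log would then confirm that the parent POM was uploaded as an asset, independently of whether it later reaches the Maven staging repository.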

 

asfimport commented 2 years ago

Kouhei Sutou / @kou: There is no validation for now, so we need to implement a verification mechanism as part of this issue.

See also: https://lists.apache.org/thread/fbrgvf30os5h4ox7fk4txrlgdp1g5g4g

@BryanCutler is also taking a look at this.