dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

[CI] add script to generate meta info and upload to s3 #10295

Closed wbo4958 closed 1 week ago

wbo4958 commented 2 weeks ago

I would like to add a meta file that describes the latest xgboost nightly build, including the xgboost version and commit id, and then upload it to s3. The file looks like this:

meta.xml

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <version>2.1.0.dev0</version>
  <commit>ee2afb3256ce2f333c0da56408dfc7360bbf8b7a</commit>
</root>

Hi @hcho3, could you help review it?

hcho3 commented 1 week ago

Some questions:

  1. Should the meta info be encoded as XML? Why not JSON?
  2. Can we use a Python script to generate the XML file, to be consistent with other utility scripts in the repo? I can write the equivalent Python script.
  3. Should meta.xml encode the latest nightly version of manylinux2014_x86_64 wheel specifically? Sometimes the CI might break and a nightly build may be available for amd64 but not manylinux2014_x86_64.

wbo4958 commented 1 week ago

Hi @hcho3, it would be great if you could help with that.

hcho3 commented 1 week ago

I've written a script to generate meta.json like this:

{
    "platform_tag": "manylinux2014_x86_64",
    "version": "2.1.0.dev0",
    "commit_id": "6d65b593c65d130c1d67c6c5a1a1ba13c5b148f6"
}
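
For readers following along, here is a minimal sketch of how a Python script might produce a file of this shape. It is not the actual format_wheel_meta.py from this PR; the function name `format_meta` and its arguments are hypothetical, and only the three keys shown above are taken from the thread.

```python
import json


def format_meta(platform_tag: str, version: str, commit_id: str) -> str:
    """Return the nightly build meta info as a JSON string.

    Hypothetical helper for illustration; the real script in the PR
    may take different inputs and do more work.
    """
    meta = {
        "platform_tag": platform_tag,
        "version": version,
        "commit_id": commit_id,
    }
    # indent=4 matches the layout of the example output in this thread
    return json.dumps(meta, indent=4)


if __name__ == "__main__":
    print(
        format_meta(
            platform_tag="manylinux2014_x86_64",
            version="2.1.0.dev0",
            commit_id="6d65b593c65d130c1d67c6c5a1a1ba13c5b148f6",
        )
    )
```

Writing the returned string to meta.json and uploading it to s3 would be left to the CI step.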

hcho3 commented 1 week ago

@trivialfis Can I get a review? I want your opinion on the new Python script (format_wheel_meta.py).

wbo4958 commented 1 week ago

We can fetch meta.json with wget, parse it to get platform_tag, version, and commit_id, and then combine them with "https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/master/" to build the download link.
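
A rough sketch of that consumer side, in Python rather than shell. The wheel file name pattern used here (`xgboost-<version>+<commit_id>-py3-none-<platform_tag>.whl`) is an assumption for illustration and is not stated in this thread; the function name `compose_download_url` is also hypothetical. The sketch takes the already-downloaded JSON text as input, so no network access is needed to try it.

```python
import json
from urllib.parse import quote

# Base URL quoted from the comment above
BASE_URL = "https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/master/"


def compose_download_url(meta_text: str) -> str:
    """Build a wheel download URL from the contents of meta.json."""
    meta = json.loads(meta_text)
    # ASSUMPTION: the file name pattern below is a guess, not confirmed here
    wheel_name = (
        f"xgboost-{meta['version']}+{meta['commit_id']}"
        f"-py3-none-{meta['platform_tag']}.whl"
    )
    # quote() percent-encodes '+' as %2B so it survives in the URL path
    return BASE_URL + quote(wheel_name)


if __name__ == "__main__":
    sample = """
    {
        "platform_tag": "manylinux2014_x86_64",
        "version": "2.1.0.dev0",
        "commit_id": "6d65b593c65d130c1d67c6c5a1a1ba13c5b148f6"
    }
    """
    print(compose_download_url(sample))
```

Having to guess the file name pattern on the client side is the fragile part of this approach.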

hcho3 commented 1 week ago

Would it be useful to also include the full name of the wheel file in meta.json too?

wbo4958 commented 1 week ago

> Would it be useful to also include the full name of the wheel file in meta.json too?

Hmm, yeah, good idea.

hcho3 commented 1 week ago

The meta.json output will now look like this:

{
    "wheel_name": "xgboost-2.1.0.dev0-py3-none-linux_x86_64.whl",
    "platform_tag": "manylinux2014_x86_64",
    "version": "2.1.0.dev0",
    "commit_id": "6d65b593c65d130c1d67c6c5a1a1ba13c5b148f6"
}
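
With `wheel_name` in the file, a consumer no longer has to reconstruct the file name from the other fields. A minimal sketch, assuming the wheel sits directly under the nightly-builds base URL mentioned earlier in this thread (that layout and the helper name `wheel_url_from_meta` are assumptions, not something confirmed here):

```python
import json
from urllib.parse import quote

# ASSUMPTION: wheels are served directly under this prefix
BASE_URL = "https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/master/"

REQUIRED_KEYS = ("wheel_name", "platform_tag", "version", "commit_id")


def wheel_url_from_meta(meta_text: str) -> str:
    """Return the download URL using the wheel_name field of meta.json."""
    meta = json.loads(meta_text)
    # Fail early with a clear message if the file is missing a field
    missing = [key for key in REQUIRED_KEYS if key not in meta]
    if missing:
        raise KeyError(f"meta.json is missing keys: {missing}")
    # Percent-encode the name in case it contains characters such as '+'
    return BASE_URL + quote(meta["wheel_name"])


if __name__ == "__main__":
    sample = json.dumps(
        {
            "wheel_name": "xgboost-2.1.0.dev0-py3-none-linux_x86_64.whl",
            "platform_tag": "manylinux2014_x86_64",
            "version": "2.1.0.dev0",
            "commit_id": "6d65b593c65d130c1d67c6c5a1a1ba13c5b148f6",
        }
    )
    print(wheel_url_from_meta(sample))
```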