apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

June 2024 ASF Board Report #10155

Closed alamb closed 4 months ago

alamb commented 5 months ago

Is your feature request related to a problem or challenge?

Per https://whimsy.apache.org/roster/committee/datafusion the DataFusion ASF board report schedule is:

March, June, September, December

Describe the solution you'd like

I would like to draft a board report for the ASF board meeting, ideally with community help.

The meetings are typically in the second or third week of the month.

Describe alternatives you've considered

I plan to do this in the same style that worked well in Arrow (see an example from @andygrove here: https://lists.apache.org/thread/7w4mgy98qomc6drvj2fo81gvhq6p0boc): make a Google Doc (or issue) that people can add relevant content to, and then the chair (me for the time being) submits it to the board.

Additional context

No response

alamb commented 5 months ago

See #10281 for example

alamb commented 4 months ago

Draft report: https://docs.google.com/document/d/1h4yjvomQO0XdzxKuE4aBSWGNliFFmn8GADd8DlPuXBw/edit

alamb commented 4 months ago

Also posted to mailing list https://lists.apache.org/thread/199ymolos20sr9vvz5ctv6j2nnrgrbo2

alamb commented 4 months ago

Submitted the following report:

## Description:
The mission of Apache DataFusion is the creation and maintenance of software
related to an extensible query engine.

## Project Status:
Current project status: New + Ongoing (high activity)
Issues for the board: None

## Membership Data:
Apache DataFusion was founded 2024-04-16 (2 months ago)
There are currently 32 committers and 13 PMC members in this project.
The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:
- Ruihang Xia was added to the PMC on 2024-06-13
- Mehmet Ozan Kabak was added to the PMC on 2024-06-13
- Mustafa Akur was added to the PMC on 2024-05-09
- Oleks V. was added to the PMC on 2024-05-09

## Project Activity:

The project continues to be quite active with many PRs and issues opened and
closed per day.

We have mostly completed the tasks related to becoming a new top-level project,
including an ASF press release[0] announcing the new top-level project and
documenting more thoroughly the process of inviting new committers and PMC
members[1].

We also began discussing adopting the SQL parser (sqlparser-rs) into the
DataFusion ASF governance process[2].

There are also several regional meetups planned: in San Francisco in June and
in China in July.

[0]: https://news.apache.org/foundation/entry/apache-software-foundation-announces-new-top-level-project-apache-datafusion
[1]: https://github.com/apache/datafusion/pull/10778
[2]: https://github.com/sqlparser-rs/sqlparser-rs/issues/1294

### DataFusion core
https://github.com/apache/datafusion

We made our first successful release as a new project, version 38.0.0.

In addition to the work related to moving to a top-level project, the
community continues to work on making logical planning faster, making function
packages (i.e. UDFs) modular and easier to mix/match, “de-parsing” logical
plan expressions back to SQL, and improving type coercion.

Recently there has been renewed interest in reading Parquet files and creating
secondary indexes.

### Sub project: DataFusion Python
https://github.com/apache/datafusion-python

The DataFusion Python subproject has become more active since the last board
report, with several people contributing. Version 37 was released, and
version 38 is in the process of being released.

### Sub project: DataFusion Comet
https://github.com/apache/datafusion-comet

The Comet subproject has had face-to-face sync meetings, which are recorded[3].

[3]: https://lists.apache.org/thread/9kqxkpwxf4oxonfboyfh8j6ko7r3fb3z

The Comet subproject is very active and is receiving significant contributions
from new contributors. There is some initial documentation published at
https://datafusion.apache.org/comet/.

### Sub project: DataFusion Ballista
https://github.com/apache/datafusion-ballista
https://github.com/apache/datafusion-ballista-python

The Ballista subproject is not currently actively maintained.

### Recent Releases
* PYTHON-38.0.1 was released on 2024-05-30.
* PYTHON-37.1.0 was released on 2024-05-13.
* 38.0.0 was released on 2024-05-10.

## Community Health:

We have added several new committers and PMC members (see above) in the last
month, and we expect to continue to do so regularly. While it would always be
nice to have more bandwidth to devote to PMC activities, we are currently
doing well.

While most communication still happens through GitHub, the mailing lists are
now fully active, as reflected in their metrics:

* dev@datafusion.apache.org had a big increase in traffic in the past quarter
  (71 emails compared to 0)
* github@datafusion.apache.org had a big increase in traffic in the past
  quarter (7685 emails compared to 0)