dask / fastparquet

Python implementation of the Parquet columnar file format.
Apache License 2.0

It's a bit of work, but it would be really helpful if we published release notes with each release #544

Open haleemur opened 3 years ago

haleemur commented 3 years ago

I think this is important when convincing maintainers of existing codebases to update to the latest version of fastparquet.

It also highlights newly added capabilities of fastparquet to promote usage where otherwise a different library or serialization format would be used.

Example scenarios:

I know that a feature I am interested in (support for writing the new nullable types) is in the most recent release, but I had to search for and share merge requests to convince DevOps to upgrade fastparquet in production, showing that the release adds the feature we want and does not introduce any breaking changes.

In the past, I designed a data transformation workflow that wrote its final dataframe to CSV instead of Parquet in order to load it into Redshift. If I were to do the same today, I would prefer a Parquet-based pipeline; however, other data engineers are likely not even aware that this is a possibility.

martindurant commented 3 years ago

fastparquet is really in maintenance-only mode these days, so there is very little that would go into release notes beyond a list of PR summaries. The features and capabilities of fastparquet have been stable for a long time.

impredicative commented 3 years ago

@martindurant Considering that the package version has gone from 0.5 to 0.6.x, is it still fair to say that the package is in maintenance-only mode? Regardless, it is very strange not to have a changelog.

martindurant commented 3 years ago

The above information is no longer correct: as of #586, a lot of new functionality has been implemented.

Rest assured that when the current round of enhancements is followed by a release, there will be complete documentation of the state of the repo, possibly with a version 1.0.