Open haojiliu opened 1 year ago
The project is currently supported and maintained. We recently published a new release to Maven. The release targets Spark 3.3. AWS Glue Data Quality, which uses Deequ, was recently released at AWS re:Invent in November 2022.
Contributions to the project are welcome. We will be happy to review your pull requests.
Thanks. I might have more stringent requirements in terms of SLA and the types of commits allowed, as I'm working on a critical project. @rdsharma26 can I discuss the details with you? Do you have an email address I can reach you at?
I want to avoid a situation where I have to maintain my own fork for development when I could instead commit to Deequ itself.
@haojiliu To keep Deequ-related communication in one place and for posterity, our preferred method of communication is GitHub Issues. We would love to hear more about your requirements and the changes you plan to make.
Forking the main repository and then creating pull requests against the main repository's master branch is the recommended way of contributing to the project. This is the same practice that our team is following.
Thanks @rdsharma26, there are two particular issues that we would like to get fixed ASAP, once we confirm they can be reproduced in Deequ:
A question for you: if we propose fixes and open PRs for those, will you or another active maintainer of this project be able to provide a reasonably quick review and merge, say within 3-5 days?
@haojiliu We found the same problem.
@haojiliu Thank you for the information. We are actively maintaining this project and we will be reviewing any open PRs and providing feedback accordingly.
@rdsharma26 @haojiliu @meimiao0730
We have experienced the same issue with ApproxCountDistinct. We previously got the following error:
ApproxCountDistinctConstraint(ApproxCountDistinct(hour,None)) : Failure Some(Value: 25.0 does not meet the constraint requirement! The approximate count distinct of hour column should be == 24.)
so we had to switch to hasNumberOfDistinctValues instead:
https://github.com/awslabs/deequ/blob/d8bfb9c71bdce712d64d861343a801a5c5a9562c/src/main/scala/com/amazon/deequ/checks/Check.scala#L351
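For readers hitting the same failure, here is a minimal sketch of why an exact-equality assertion is brittle when the metric is an estimate, as the reported value of 25.0 against an expected 24 suggests. It is plain Python and does not call Deequ. The names `exact_assertion` and `within_tolerance` and the 5% tolerance are assumptions chosen for illustration only.

```python
# Illustrative only: plain Python, no Deequ calls.
# All function names and the 5% tolerance are hypothetical.

def exact_assertion(value: float) -> bool:
    """Mirrors the failing check: the metric must equal 24 exactly."""
    return value == 24


def within_tolerance(expected: float, rel_tol: float):
    """Build an assertion that accepts values within rel_tol of expected."""
    def assertion(value: float) -> bool:
        return abs(value - expected) <= rel_tol * expected
    return assertion


# The value reported in the error message above.
approx_estimate = 25.0

# Accept anything within 5% of 24, i.e. within 1.2 of the expected count.
tolerant_assertion = within_tolerance(24, 0.05)

print(exact_assertion(approx_estimate))     # False
print(tolerant_assertion(approx_estimate))  # True
```

Whether a tolerance band is acceptable depends on the check. When the count must be exactly 24, an exact metric such as the hasNumberOfDistinctValues check linked above is the safer choice.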
We are a group of engineers interested in contributing to this project. Is there someone we can get in touch with to understand its current status?