camelot-dev / camelot

A Python library to extract tabular data from PDFs
https://camelot-py.readthedocs.io
MIT License
2.81k stars 449 forks source link

Release camelot-fork 0.20.1 #353

Closed foarsitter closed 10 months ago

foarsitter commented 1 year ago

Leaving this here so the it can be merged if there is any interest.

foarsitter commented 1 year ago

This pull-requests does a lot of things. Since things are getting back to life over here I will explain them a bit more.

1) The project is merged into cookiecutter-hypermodern-python which comes with a lot of features and documented workflows for i.e. how to release

2) It removes pdftopng as hard dependency since it is not compatible with python 3.9 and above. Tests that depend on poppler/pdftopng are skipped as long as pdftopng is not updated. The fork is compatible with 3.8 and above. It is up to discussion if we need to add more python versions.

3) It aligns the type of PDFHandler.filepath with PdfReader.stream so Path and IO objects are accepted. filepath can now be an in memory file upload for example.

4) I ran pre-commit to the whole branch to apply black & isort to the codebase.

5) The first pull-requests with bugfixes are merged: https://github.com/foarsitter/camelot/releases/tag/v0.20.1

From my point of view it makes sense to release version 1.0 as a stable release where we support ghostscript and do minor bugfixes and keep our dependencies up to date.

In version 2.0 we can add support for poppler/pdftopng.

What do you tink? Which changes can be cherry-picked and which changes need to dropped?

ExSidius commented 1 year ago

@foarsitter @vinayak-mehta do you have a timeline around when this will be merged in?

I'm running into issues installing camelot on python 3.10 because of pdftopng.

Let me know if I can help at all.

foarsitter commented 1 year ago

@foarsitter @vinayak-mehta do you have a timeline around when this will be merged in?

I'm running into issues installing camelot on python 3.10 because of pdftopng.

Let me know if I can help at all.

Sorry, there is no timeline. If you need a solution asap install the fork (pip install camelot-fork).

MartinThoma commented 1 year ago

@foarsitter I would recommend to break a couple of changes out of this PR. That would have advantages:

  1. It's more likely we get progress sooner
  2. It's easier to review changes
  3. Less merge conflicts

Especially black+isort would be VERY desirable to have in its own PR. That PR should not add / change anything else + it should document which commands were executed.

MartinThoma commented 1 year ago

@foarsitter https://github.com/camelot-dev/camelot/pull/358 applies only black

vinayak-mehta commented 1 year ago

This is a big one, let me go through it this weekend

bosd commented 1 year ago

Fixes #292 , #313, #329 Closes #269

bosd commented 1 year ago

But the way, do you have the powers to publish a new release on pypi?

foarsitter commented 1 year ago

Thanks for your support @bosd, I think we finally can get things moving.

We need one more maintainer te approve before we can merge this. Then we need to find out how we can release to pypi since I do not have the rights to edit the repository settings.

bosd commented 1 year ago

@vinayak-mehta Could you give @foarsitter access on Pypi? So he can publish the releases there. Potentially setup the automatic pypi publisher from the cookiecutter project. https://github.com/cjolowicz/cookiecutter-hypermodern-python

@maintainers Please review..

MartinThoma commented 1 year ago

@vinayak-mehta Could you give @foarsitter access on Pypi?

Alternative, you could add secrets to this repository (which only you would be able to see) + add a github action. For example, when a git tag is added then the action is executed which triggers a build + release.

foarsitter commented 1 year ago

@MartinThoma a commit to master with a change in version in pyproject.toml will trigger the deploy. The configuration is simple.

Sign up at PyPI. Go to the Account Settings on PyPI, generate an API token, and copy it. Go to the repository settings on GitHub, and add a secret named PYPI_TOKEN with the token you just copied.

TestPyPI Sign up at TestPyPI. Go to the Account Settings on TestPyPI, generate an API token, and copy it. Go to the repository settings on GitHub, and add a secret named TEST_PYPI_TOKEN with the token you just copied.

bosd commented 1 year ago

@MartinThoma Are you a maintainer here? Can you give the last approval to get this one in. (This fixes gh actions, which is currently broken and blocking other PR's)

foarsitter commented 1 year ago

Didn't know the cookiecutter template added a copyright with my name. Will remove it tomorrow. Thanks for checking.

foarsitter commented 1 year ago

Found some references at other places that needed to be updated. Sorry for the inconvenience.

foarsitter commented 1 year ago

The changes your are describing are in separated commits, you can select commits on the top left in the Files changed tab image

What kind of smoke-tests do you have in mind? We are using the fork in production and all the tests of this PR are succeeding. Camelot has a pretty decent test coverage (88%) and a lot of different pdf types are tested: https://github.com/camelot-dev/camelot/tree/master/tests/files

One of the reasons I'm using the hypermodern cookiecutter template is that it has a lot of documentation. In example you can read about all the Github actions this PR includes over here and how to make a release over here.

I agree with you that the road I took is quite the spartan way but I think it is the fastest way forward.

MartinThoma commented 1 year ago

The changes your are describing are in separated commits

Interesting. Most of the time I don't look at single commits as they contain a lot of noise. As a maintainer of pypdf, I also don't like this model as it "packages" different unrelated changes. Hence it removes the freedom of choosing which of those I want to take in and which I don't want. The result is typically that it takes overall longer to merge those PRs + causes more work.

In the case of camelot I can understand it though. It's been a while since the last release. Please take my comment here more as a general wish. It's not meant as criticism in this specific case.

We are using the fork in production

Oh, nice! That should do it :-)

bosd commented 1 year ago

So.. concluding.. all lights are green :heart: It's a go :smiley: can we get a merge?

MartinThoma commented 1 year ago

Interesting, I thought you could merge @bosd . @foarsitter Can you merge? Do you know who can?

Just to clarify: I can merge, but not release to PyPI. I'm a bit hesitant as I would rather like to support once in a while than become a core maintainer of another big project :sweat_smile:

foarsitter commented 1 year ago

@MartinThoma yes I'm enable to merge. Since it is a large PR and a lot happened during last weekend I think it is appropriate to have a cooldown of a few days before merging so everyone can react.

Releasing to camelot on pypi is something we need to solve. In the meantime I can release to camelot-fork by forking this repo.

MartinThoma commented 1 year ago

Sounds good!

Regarding the releases, I've opened https://github.com/camelot-dev/camelot/issues/389

bosd commented 1 year ago

Interesting, I thought you could merge @bosd . Do you know who can?

No, I don't have any permissions here to merge. I'm not listed as a maintainer (However I offered to be one).

As to my knowledge only @MartinThoma and @foarsitter are the only currently active maintainers.

foarsitter commented 11 months ago

Squash and merge is the only option for merging.... That need to be changed before merging.

MartinThoma commented 11 months ago

For this specific PR I agree. In general, I prefer squash-and-merge. PRs should only contain one change (once we are back on track)

bosd commented 11 months ago

Squash and merge is the only option for merging.... That need to be changed before merging.

So this means that dev is blocked/stalled again?? Or does any of you have the super powers to change this?

MartinThoma commented 11 months ago

image

I don't have access to the settings.

foarsitter commented 11 months ago

As long there is nobody around with full permission for GitHub and pypi this project could be considered dead.

MartinThoma commented 10 months ago

@vinayak-mehta Can you please give me full access so that I can adjust the project settings (temporarily enabling a merge-commit for this specific PR)?

MartinThoma commented 10 months ago

@foarsitter It seems as if I will not get the necessary permissions any time soon. Would it be ok for you if I squash-merge this PR and then make a release to PyPI?

foarsitter commented 10 months ago

Sure, no objection if its the only way forward. Go ahead :)

MartinThoma commented 10 months ago

https://github.com/camelot-dev/camelot/blob/master/.github/workflows/release.yml seems to have an issue:

image

MartinThoma commented 10 months ago

"Poetry could not find a pyproject.toml file in /home/runner/work/camelot/camelot or its parents"

I don't know why this part is failing.

However, the PyPI part should be removed as it's very likely not configured.