Open MartinThoma opened 1 year ago
Or is somebody there who would like to become a maintainer?
Hi @MartinThoma, sorry for not being responsive here. I've been busy with some life stuff for some time now and haven't had the mindspace to look into the issues here. I've been wanting to get back into it, I'll look into them over this weekend.
I also want to stop being a single point of failure here and would love to get help maintaining camelot going forward.
Is there any movement on this topic?
@vinayak-mehta Would you mind if I post this on the Indian FOSS and opendata channels? This tool has been extremely helpful in dealing with the PDF crap Indian Government puts out.
Sorry for my impatience, but the show must go on. In order to take camelot to production I created a fork and released it to PyPI as camelot-fork==0.20.0. My intentions are limited and I hope this project finds new maintainers soon.
When the request came to extract tables from PDF files I thought it would be a very tricky job, but camelot does it all. Therefore I want to express my gratitude to all of you who made that possible.
If people are in need of a fix, I'm willing to accept pull requests as long as they have test coverage.
Sorry for being a bit unresponsive since I created this issue. I've pushed a release based on @MartinThoma's last PR: https://pypi.org/project/camelot-py/
@MartinThoma Thank you for the PR, would you like to be added to the github org so that you have push access to the repo?
@foarsitter Are you interested in maintaining the project here instead of the fork? I can add you to the github org too.
Thank you for making a new release :pray:
would you like to be added to the github org so that you have push access to the repo?
I would probably not be super active as I spend most of my time with pypdf. If that is ok for you, then yes, please add me :-) I could probably go over a couple of the PRs / ~introduce~ update CI so that maintaining the library becomes easier :-)
@MartinThoma That would be awesome! Just sent you an invite ✉️
@vinayak-mehta should be awesome
Are there any rules I should follow, e.g.
As I see it:
1) A review is always good, unless it is something really trivial. Be patient. Reverting releases because we are too eager is something we should want to avoid.
2) A commit message should be clear about its contents; which style is applied is less important to me. As I see it we can generate a changelog based on the titles of the merged pull requests (see my fork: https://github.com/foarsitter/camelot/releases).
3) If more commit messages are needed because there are changes in various parts of the codebase, then a squash does not seem a good fit to me, so it depends on the PR.
4) Adding tests afterwards is really hard, even harder when you are not the author of the code. So full coverage is recommended here, in my view. If the addition is trivial, the test should be trivial too, right?
@foarsitter I just invited you, sorry it took so long
I'm a big fan of the scikit-learn contributing guidelines.
I'd be interested in contributing, particularly to the docs initially.
It seems to me there's a lot of value in this repo, but things seem to have got into a fairly confusing state. Devoting time to it, I think I've figured out most of the misunderstandings I had and it seems like it's worth sharing / updating docs, so that others don't fall into the exact same traps I (and others) did.
Correct me if I'm wrong, but I get the sense that what has made things harder overall is that the migration to the pdftopng/poppler backend was in progress yet not completed when the maintenance fell away (quite reasonably given world events!).
@foarsitter's idea of a fork that cuts pdftopng out is interesting, although I would feel more comfortable if it was directly part of the main repo.
How feasible is it to make the "base" install equivalent to the fork (i.e. such that it doesn't install pdftopng as a requirement)? And with that, introduce a "pdftopng" extras-require option so people can optionally try it, and then, only once it's deemed to work sufficiently well, switch it to be what gets delivered with "base" at some later point. Presumably for that to happen there needs to be a bit of maintenance upstream in pdftopng too. If this last paragraph is best discussed in a separate issue, that's fine by me, just say 🙂
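To make the optional-extra idea concrete, here is a rough sketch of how the packaging metadata might look. It is only an illustration under assumptions: the function name `build_setup_kwargs` is hypothetical, and the base dependency list is a placeholder, not camelot's actual requirements.

```python
# Hypothetical sketch: keep pdftopng out of the base install and expose it
# as an opt-in extra. The dependency lists are placeholders for illustration.

BASE_REQUIRES = [
    "numpy",   # placeholder base dependency
    "pandas",  # placeholder base dependency
]

EXTRAS_REQUIRE = {
    # Installed only when the user explicitly asks for the extra.
    "pdftopng": ["pdftopng"],
}


def build_setup_kwargs():
    """Return keyword arguments that a setup.py could pass to setuptools.setup()."""
    return {
        "name": "camelot-py",
        "install_requires": list(BASE_REQUIRES),
        "extras_require": {name: list(deps) for name, deps in EXTRAS_REQUIRE.items()},
    }


# In setup.py this would be used roughly as:
#   from setuptools import setup
#   setup(**build_setup_kwargs())
```

With metadata like this, a plain `pip install camelot-py` would skip pdftopng, while `pip install "camelot-py[pdftopng]"` would pull it in for people who want to try that backend.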
@nmstoker I cannot answer that question, but at least I could review/merge PRs with documentation updates :-) so if there are specific learnings you want to share, I would support you :-)
Sounds a good start, thanks @MartinThoma !
@nmstoker Looking forward to your learnings!
After using the products for a long time in my developer career, I just started contributing to Camelot with my initial pull request for (#364). I would love to contribute to other projects as well. Thanks
How about Excalibur?? That might need some :heart: as well. There is still an open refresh issue on Windows which makes it unusable.
P.s. I'm happy to contribute/maintain a bit on both projects.
Looking forward to your contributions @bosd
@MartinThoma How about Owner / Admin permissions for Excalibur?
Just wandering in but happy to contribute. 👋🏻
@vinayak-mehta Have you seen my e-mail?
- PyPI permissions: Can you please give me Owner permissions (instead of just Maintainer) via https://pypi.org/manage/project/camelot-py/collaboration/ so that I can take care of "Release to PyPI via Github Action" (#389)?
- Github permissions: Can you please give me Admin permissions via https://github.com/camelot-dev/camelot-py/settings/access so that I can allow merge-commits for "Release camelot-fork 0.20.1" (#353)?
- Project Governance: Would you be OK with the Github organization camelot-dev merging into py-pdf?
Are these permission issues solved already, @MartinThoma?
Can you please take care of these blockers, @vinayak-mehta?
No. I still don't have sufficient permissions to bring the project back to life. Camelot is dead.
In case this helps others (we didn't know how much camelot's maintenance was suffering until we tried it and ran into various issues), here are a few active PDF processing alternatives in the Python ecosystem:
Not sure if people saw it, but in #479 I show some ideas I had with the docs.
With care I think it should be feasible to guide most people around the current difficulties with installation (I've managed setup in Windows and various Linux environments, no access to Mac but guess it's not that different to the Linux steps for the most part)
We need to fork camelot if we want to continue developing it.
I've already talked with the people of py-pdf (website) and they are fine moving it there. But we need two people who would take care of it so that it's not another dead version.
@bosd @foarsitter Would it still be fine with you to become the new maintainers?
Discussion is here: https://github.com/py-pdf/pypdf/discussions/2466
@MartinThoma I'm willing to help where I can!
@MartinThoma : Please pull me in. I would like to contribute to the code.
I can fix the PdfFileReader deprecation error, please pull me in.
I can fix the PdfFileReader deprecation error, please pull me in.
@ammadakram Can you please open a PR here: https://github.com/py-pdf/pypdf_table_extraction
@MartinThoma @vinayak-mehta @bosd I am facing the same error as Kushal. Expected output: list of tables. Standard output since this week: "Attribute Error: File Format not supported". Could you please let me know whether a fix has been deployed on the forked branch? This was working a week ago, and my particular use case requires the lattice boundary provided exclusively in camelot-py[cv].
Please try the code from the new repo. If the problem persists, please open an issue there.
It seems like camelot is dead:
Besides the owner there are only 35 other contributors.
https://opencollective.com/camelot might be another way to check if it's dead.
Does anybody know more? Should we try to transfer the project to https://github.com/jazzband ?