Closed acsor closed 5 years ago
Hey @claird, have a look at Refactor project structure before merging. As I said there, I can incorporate the proposed changes into this very PR if you like it.
The latest two revisions have brought about a major file system refactory and PEP 8-related adjustments, respectively.
Shouldn't we perhaps drop the 4
from the pypdf4
package name? In the distant future PyPDF may undergo several changes, but if we manage to maintain some coherence about the way it is utilized by users, by having import statements as
from pypdf import PdfFileReader, ...
we could simplify upgrades. Take as an example PyPDF5: that would demand an from pypdf5 import ...
statement. Without this trailing version number, upgrades from the users could be done seamlessly and without need for code changes.
Hey @claird, I just updated the scripts in samplecode/
(previously Sample_Code/
) that you seemed to care about in the Wiki Milestones page:
write "PyPDF4 'Hello, World'" page--the samples and such are in worse shape than I realized
Shouldn't we perhaps drop the 4 from the
pypdf4
package name?
Since we seemed a bit idle, I did the change regardless of claird's approval, although I see it very well motivated. In case it is not wanted it will be easy to revert back to the unconvenient pypdf4
name (for no reason choose PyPDF4
though!).
There are plenty of good changes here. I hope this PR can be merged soon.
Btw, is there any plans to address some issues created in the PyPDF2 repo?
Btw, is there any plans to address some issues created in the PyPDF2 repo?
Which ones, particularly? I scrolled through it a while ago and tried to scan for relevant issues and unmerged pull requests.
There are plenty of good changes here
Thanks for it :+1:. Actually, I still have a little more that I hope too push in a few days. Many are foundational to the library and would benefit from a peer review. Lacking them, I have developed many more unit tests.
Btw, is there any plans to address some issues created in the PyPDF2 repo?
Which ones, particularly? I scrolled through it a while ago and tried to scan for relevant issues and unmerged pull requests.
I can bring here some examples. One I've been facing is https://github.com/mstamy2/PyPDF2/issues/449.
Hey @DeliciousHair, with reference to #11 I have implemented a new method in PdfFileReader
, namely objects()
, that lists all the indexed objects definitions in a PDF file. Since it looks like you needed it, would you mind trying it out in some of your code? I have already done extensive unit testing with it and am comfortable to say that it should work reliably.
I'm looking forward to seeing this PR merged. Any chance it can be reviewed soon @claird?
The very last commit targets Issue #21. Hopefully this has been solved with minimal effort.
I'm looking forward to seeing this PR merged. Any chance it can be reviewed soon @claird?
Since I'm going to be more busy due to my university duties in a couple of weeks or so, my stream of contributions will not continue indefinitely. I'd like to leave some enduring legacy, so I have worked on establishing those that I deem "proper contribution guidelines". Here is an initial draft and when (and if) I'll be added to the contributors list I'll propose the change to the official repository. Wiki pages don't behave like normal repositories and there isn't such a thing as a pull request.
Would some of you mind to share their own thoughts and suggestions over these few paragraphs? The document is purposefully succinct, otherwise no one would bother to check it out.
Shall we spell the word 'TO-DO' or 'TODO'?
I'll probably add a paragraph to the effect that, while we're rigorous with these requirements, we're also liberal with our help. If someone has a line that he believes must be 85 characters, or doesn't know how to retreat to 2.7, someone else with the project almost certainly will know a work-around.
Cameron Laird, vice president We make computers work for people.
On Thu, Sep 27, 2018 at 1:17 PM Oscar notifications@github.com wrote:
I'm looking forward to seeing this PR merged. Any chance it can be reviewed soon @claird https://github.com/claird?
Since I'm going to be more busy due to my university duties in a couple of weeks or so, my stream of contributions will not continue indefinitely. I'd like to leave some enduring legacy, so I have worked on establishing those that I deem "proper contribution guidelines". Here is an initial draft https://github.com/newnone/PyPDF4/wiki/Contributing-Rules and when (and if) I'll be added to the contributors list I'll propose the change to the official repository. Wiki pages don't behave like normal repositories and there isn't such a thing as a pull request.
Would some of you mind to share their own thoughts and suggestions over these few paragraphs? The document is purposefully succinct, otherwise no one would bother to check it out.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/claird/PyPDF4/pull/14#issuecomment-425192861, or mute the thread https://github.com/notifications/unsubscribe-auth/AAbN9OfwvDyDgo__xcQf39mMbrtqFj6_ks5ufRYagaJpZM4WjXVE .
Shall we spell the word 'TO-DO' or 'TODO'?
I advise TO-DO
, because that seems to be the default in several IDEs.
Oh, my. You're right, of course, and I hadn't noticed. My difficulty is that pylint recognizes only TODO. Oh, well ...
We certainly have more important matters before us.
Cameron Laird, vice president We make computers work for people.
On Sun, Sep 30, 2018 at 4:56 PM Cameron Laird claird@phaseit.net wrote:
Shall we spell the word 'TO-DO' or 'TODO'?
I'll probably add a paragraph to the effect that, while we're rigorous with these requirements, we're also liberal with our help. If someone has a line that he believes must be 85 characters, or doesn't know how to retreat to 2.7, someone else with the project almost certainly will know a work-around.
Cameron Laird, vice president We make computers work for people.
On Thu, Sep 27, 2018 at 1:17 PM Oscar notifications@github.com wrote:
I'm looking forward to seeing this PR merged. Any chance it can be reviewed soon @claird https://github.com/claird?
Since I'm going to be more busy due to my university duties in a couple of weeks or so, my stream of contributions will not continue indefinitely. I'd like to leave some enduring legacy, so I have worked on establishing those that I deem "proper contribution guidelines". Here is an initial draft https://github.com/newnone/PyPDF4/wiki/Contributing-Rules and when (and if) I'll be added to the contributors list I'll propose the change to the official repository. Wiki pages don't behave like normal repositories and there isn't such a thing as a pull request.
Would some of you mind to share their own thoughts and suggestions over these few paragraphs? The document is purposefully succinct, otherwise no one would bother to check it out.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/claird/PyPDF4/pull/14#issuecomment-425192861, or mute the thread https://github.com/notifications/unsubscribe-auth/AAbN9OfwvDyDgo__xcQf39mMbrtqFj6_ks5ufRYagaJpZM4WjXVE .
@newnone could you summarize the most important changes of the merged PR? I just replaced PyPDF2 with PyPDF4 on my project and it seems to work fine in my first tests. I would like to know what I can expect in terms of performance, bug fixes and new features.
This series of revisions shall not mark a whole new version of the library, which can be incremented nevertheless, but put it in a cleaner and more stable state for future and more serious enhancements. More improvements and features are yet to come.
As I explained earlier before initiating it, this PR wasn't meant to add new functionalities. At the outset I just wanted to bring it to a cleaner state for future improvements, but then I managed to create something new. To the external user, the new goodies probably are:
PdfReader.objects()
, that allows to fetch all the indirect objects from a PDF file.scripts/codecs.py
that gathers the "filters" (LZW, ASCIIHex, ASCII85, ...) through a command-line interface.PdfFileReader
and PdfFileWriter
(__repr__()
, __del__()
, __enter__()
, __exit__()
)import pypdf ...
. from pypdf import *
is also more careful with what stuff to include in the namespace.PdfFileReader.filepath
, PdfFileReader.isObjectFree()
, PdfFileReader.isClosed()
, etc.Internally, I have brought about changes like:
ObjectStream
class definition in generics.py
, plus other changes to the same modulesamplecode/
Probably there has been more than just this, but I lack the time to list them all even succintly.
Btw, is there any plans to address some issues created in the PyPDF2 repo?
Which ones, particularly? I scrolled through it a while ago and tried to scan for relevant issues and unmerged pull requests.
I can bring here some examples. One I've been facing is mstamy2#449.
@newnone, have you already taken some time to look into this issue?
This PR includes two primary streams of work. In essence it includes a bigger test suite and the "Table of Contents" feature to the
pdfcat
utility (whosePYTHONPATH
handling has been adjusted accordingly).It adds several stylistics changes: an almost complete move to the lower camelCase style for functions and methods, which proved to be the prevalent one before the modification, thus almost putting to end issue #12; correct spacing and line lengths that no longer exceed 79 characters, as recommended by PEP 8; added documentation and comments to either the public interface of the library and its private components (addressed for example by #11).
Some algorithms in
filters.py
have been adjusted more faithfully to their specification, and the interface of the*Decode
classes unified: those that were previously called this way have been refactored into*Codec
and added anencode()
anddecode()
static method (either implemented or raising aNotImplementedError
) if previously missing. There are now even more test cases for these classes too.A test case has been introduced for
decodeStreamData()
offilters.py
as well, and probably the whole file is now covered (something which could be verified by a coverage tool such as coverage).Probably one of the persisting problems here is the one from #7. I have noticed that the implementation of
LZWCodec
coherently encodes/decodes byte streams of up to 10K bytes approximately; after that, a misalignment provokes data corruption. Some work should be performed to solve the little bug (probably just an off-by-one error or a seemingly small inaccuracy).This series of revisions shall not mark a whole new version of the library, which can be incremented nevertheless, but put it in a cleaner and more stable state for future and more serious enhancements. More improvements and features are yet to come.
Edit: as time went on I had the change to add, well... much more to the initial PR contents. The new commits that I've pushed may not reflect the initial description, but are clearly documented by revisions' comments.