SuffolkLITLab / FormFyxer

A tool for learning about and pre-processing forms
MIT License
11 stars 1 forks source link

25 detect sensitive fields #134

Closed codestronger closed 1 month ago

codestronger commented 1 month ago

Updates for detecting sensitive fields: https://github.com/SuffolkLITLab/RateMyPDF/issues/25

codestronger commented 1 month ago

Oh, didn't expect it would publish the package on a PR branch, especially w/ failing tests. I used 0.3.0a2 as the version, so doubt anyone will grab this. Not sure what the status of 0.3.0 is... Seems like we're in some sort of alpha still?

Code in docx_wrangling & pdf_wrangling is causing the mypy failures. Didn't make any changes to those files. Seems like the changes are from the pdf_context_extract merge... Any ideas if those are crucial and need fixing? I'm not familiar w/ mypy but I can dig into it if it's a priority.

BryceStevenWilley commented 1 month ago

That GH action is a bit misleading, nothing is published until you make a GH tag. The action still runs to make sure it can package correctly, but it doesn't publish the new version (you can see that here). You can check https://pypi.org/project/formfyxer/#history to see that there's not a new version there. IMO 0.3.0a2 is fine, it still need more testing to make sure it can be used as a dependency without issue (i.e. the problems we had when redeploying RateMyPDF were all issues with FormFyxer).

It'd be nice if you could look into the mypy issues, but if you can't, I can fix them this weekend. Those issues pop up because our version of mypy increments, and it gets better at finding mismatching types that could cause issues.

BryceStevenWilley commented 1 month ago

Realizing the mypy issues might be because it looks like you merged https://github.com/SuffolkLITLab/FormFyxer/tree/pdf_context_extract into your branch, which wasn't in main before. Idk how well reviewed that code is (even though I wrote it, lol). Might be worth getting it, but I'd want to make a separate PR and review for that, which should also fix up some of the mypy issues. I can also do that this weekend.

BryceStevenWilley commented 1 month ago

So mypy on main and #137 is fixed. Looking a bit closer, I don't think this PR needs #137 though? @codestronger, can you confirm that and if so, drop or revert https://github.com/SuffolkLITLab/FormFyxer/pull/134/commits/2c01ae08bf010dea6b45e857a7bfb39110f578af and https://github.com/SuffolkLITLab/FormFyxer/pull/134/commits/9bfea3389e16bf25a9c8b312202a11dacf84965e from this PR, i.e. https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History?

codestronger commented 1 month ago

So mypy on main and #137 is fixed. Looking a bit closer, I don't think this PR needs #137 though? @codestronger, can you confirm that and if so, drop or revert 2c01ae0 and 9bfea33 from this PR, i.e. https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History?

Sounds good. I can rebase this PR on the current main. @nonprofittechy merged the pdf_context_extract branch back in Feb and it's been live as far as I'm aware. Others might have pulled it down too. Was it safe to rewrite it out of the history? At minimum, we should merge the new version of the pdf_context_extract before deploying my changes, so that we don't lose any functionality from a deploy.

nonprofittechy commented 1 month ago

I don't remember why I merged this branch in February, unfortunately.

codestronger commented 1 month ago

I don't remember why I merged this branch in February, unfortunately.

@BryceStevenWilley I'll proceed with the plan that assumes pdf_context_extract was an accidental merge. We discussed the various downstream users of FormFyxer and as far as we can tell, no one is depending on the features in that branch.

I also plan to officially take the 0.3.0 version number for this branch. Will also review the dependency updates I made since that was done to bring in everything being used by the code that was deployed. Now that we've separated out pdf_context_extract, maybe we don't need all of those yet.

codestronger commented 1 month ago

Diff looks much better to me! +1 for changing to 0.3.01, and +1 for merging.

Will also review the dependency updates I made since that was done to bring in everything being used by the code that was deployed

I think those are still necessary, those dependencies started being used in an earlier commit I think.

Footnotes

  1. We were using alphas because this package has been notoriously difficult to use as a dependency, and without proper testing it can cause issues. I think these changes will be tested enough at this point.

Ah, okay! I'll leave the dependencies alone then. In any case, we'll need them in the future.