Open dependabot[bot] opened 1 day ago
@natikgadzhi - fyi, this one is tricky. I fixed some obvious things but next steps are not clear. Currently raising some kind of failure during schema discovery.
fwiw `nltk 3.9` is a no-go, had a bug fixed in `3.9.1`.
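A minimal sketch of a guard for that, assuming (per the comment above) that exactly 3.9 is the release to avoid and 3.9.1 is fine. The helper names are hypothetical and the parsing is deliberately naive: it reads only the leading `major.minor.patch` numbers and ignores pre-release suffixes.

```python
import re

# Assumption taken from the comment above: 3.9 (i.e. 3.9.0) is the bad release.
KNOWN_BAD = (3, 9, 0)


def parse_release(version: str) -> tuple:
    """Return (major, minor, patch) from the start of a version string, zero-padded."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def is_known_bad_nltk(version: str) -> bool:
    """True only for the release flagged as a no-go in this thread."""
    return parse_release(version) == KNOWN_BAD
```

In a real check the input would come from `importlib.metadata.version("nltk")`; in the lock file the equivalent is a constraint that excludes 3.9 itself.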
A newer version of nltk exists, but since this PR has been edited by someone other than Dependabot I haven't updated it. You'll get a PR for the updated version as normal once this PR is merged.
@aaronsteers @aldogonzalez8 alright, this is a security concern, so we should wrap this up and fix the tests that are failing. I hope to wrap this up by end of week, but if this bleeds into the next week, I will make a line for this and assign it to Aldo.
You got this?
@natikgadzhi and @aldogonzalez8 - Summarizing here what I'm finding...
1 - With the version bump alone, we start getting zero records from unstructured sources:
2 - The `unstructured` library leverages `nltk` (with no version constraints) so I tried to bump that to the latest version as well, with the hope that this would fix the issue - on the theory that maybe there's an incompatibility of versions and some part of the parsing is failing silently.
3 - Bumping the `unstructured` library causes some breaking changes, which I can mostly resolve. But I can't test whether this actually fixes it, because another library called `python-magic` (aka `magic` or `libmagic`), used in inferences, relies on a C library that I don't have on my machine and that I don't want to make a hard pre-install dependency of our build process. Unstructured attempts to check for the presence of this library so it can fall back to other methods if needed, except that in the latest version the check doesn't work.
4 - I opened an issue below to see if Unstructured can fix the `libmagic` dependency check. One thing I didn't try was to simply import and override the constant with `False` to see if that would allow Unstructured to fall back to the desired behavior.
With all the above, I don't actually know if bumping Unstructured will solve the issue. I reverted the Unstructured version bump and was about to see if I could pin down the reason why we are getting zero records. @aldogonzalez8, I'm going to task switch over to SDM for a bit, but I can come back - or I can pair with you to get you up to speed. Hopefully the above context is helpful.
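The untried override from item 4 could look roughly like the sketch below. Everything here is a stand-in: `fake_filetype`, `MAGIC_AVAILABLE`, and `detect_mime` are hypothetical names, and the availability check is an assumption for illustration, not Unstructured's actual code. The point is only the technique: set the module-level flag to `False` after import so the fallback branch runs.

```python
import types

# Hypothetical stand-in for the library module that owns the flag.
_SOURCE = '''
import importlib.util
import mimetypes

# Assumed check: True if a "magic" package can be found by name. This says
# nothing about whether the underlying C library is actually installed.
MAGIC_AVAILABLE = importlib.util.find_spec("magic") is not None


def detect_mime(path):
    if MAGIC_AVAILABLE:
        import magic  # can still fail here if the C library is missing
        return magic.from_file(path, mime=True)
    # Fallback: guess from the file extension only.
    return mimetypes.guess_type(path)[0]
'''

fake_lib = types.ModuleType("fake_filetype")
exec(_SOURCE, fake_lib.__dict__)

# The override: force the fallback path regardless of what the check found.
fake_lib.MAGIC_AVAILABLE = False

print(fake_lib.detect_mime("report.pdf"))
```

This only helps if the library reads the flag from its own module at call time. Any other module that did `from ... import` of the constant holds its own copy, which the override would not touch.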
Bumps nltk from 3.8.1 to 3.9.
Changelog
Sourced from nltk's changelog.
... (truncated)
Commits
- `24936a2` Bump version to 3.9
- `c222897` Merge branch 'develop' of https://github.com/nltk/nltk into develop
- `34c3a4a` Merge branch 'develop' of https://github.com/nltk/nltk into develop
- `253dd3a` add black
- `c43727f` Update version
- `7137405` Merge pull request #3066 from asishm/bugfix-lambda-closure-leak
- `369cb9f` Merge pull request #3245 from ekaf/hotfix-closuredup
- `501c70e` Merge branch 'develop' into hotfix-closuredup
- `bf05dc4` Merge pull request #3306 from ekaf/py3_compat
- `66539c7` Sorted output in unit/test_wordnet.py

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show