Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
7.53k stars, 595 forks

Unable to import unstructured.partition.xyz #2888

Open flaviobrienza opened 2 months ago

flaviobrienza commented 2 months ago

I am trying to use the Unstructured library locally with Python 3.10.2. Every time I try to import from any unstructured.partition submodule, for example "from unstructured.partition.pdf import partition_pdf", the kernel dies. I followed all the steps to install the library, but it keeps happening. Unstructured version == 0.13.2, Unstructured-inference version == 0.7.25.
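Version details like those above can be gathered in one step when reporting a problem of this kind. The following is a minimal sketch using only the standard library; the function name `collect_env` and the choice of packages to query are illustrative assumptions, not something the project prescribes.

```python
import platform
import sys
from importlib import metadata


def collect_env(packages):
    """Return OS, Python, and installed-package versions as a plain dict.

    `packages` is a list of distribution names; the names used below are
    the two mentioned in this report, chosen only for illustration.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Distribution is not installed in this environment.
            versions[name] = None
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "packages": versions,
    }


if __name__ == "__main__":
    info = collect_env(["unstructured", "unstructured-inference"])
    for key, value in info.items():
        print(f"{key}: {value}")
```

Because this only reads package metadata and never imports the packages themselves, it should still work in an environment where the import itself crashes.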

scanny commented 2 months ago

Hi @flaviobrienza, I'm unable to reproduce any behavior that meets this description on my machine. We'll need some more specifics:

flaviobrienza commented 2 months ago

Hello, I'm currently running Windows 11. The problem is that I don't receive any error message. When I try the import, for example "from unstructured.partition.html import partition_html", one of two things happens: either the cell starts running and never stops, or I receive the message "the kernel died" from my Jupyter Notebook. I followed the Unstructured procedure to install all the dependencies.

scanny commented 2 months ago

Ah, interesting. So the kernel that's doing the dying is the Jupyter Notebook kernel.

Try running "from unstructured.partition.html import partition_html" from the Python command prompt and see what it does. That should give a more descriptive error.
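One way to follow this suggestion without losing the output to a dead kernel is to run the import in a separate interpreter process and capture whatever it writes to stderr. This is a minimal sketch; the helper name `try_import` and the 120-second timeout are assumptions made for illustration.

```python
import subprocess
import sys


def try_import(statement, timeout=120.0):
    """Run one import statement in a fresh interpreter and report the outcome.

    A crash or hang in the child process cannot take down the caller, so this
    can be run from a notebook cell as well as from a script.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", statement],
            capture_output=True,
            text=True,
            errors="replace",  # tolerate non-UTF-8 bytes in native error output
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # The child was killed after `timeout` seconds (the "never stops" case).
        return {"status": "hung", "returncode": None, "stderr": ""}
    status = "ok" if proc.returncode == 0 else "failed"
    return {"status": status, "returncode": proc.returncode, "stderr": proc.stderr}


if __name__ == "__main__":
    result = try_import("from unstructured.partition.html import partition_html")
    print(result["status"], result["returncode"])
    # Show only the tail of stderr, where the final error usually appears.
    print(result["stderr"][-2000:])
```

A non-zero return code with text in stderr corresponds to the "kernel died" case, while a "hung" status corresponds to the cell that never finishes.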

flaviobrienza commented 2 months ago

This happens:

End of stack trace (more stack frames may be present)
61485 [main] python (17296) C:\Users\FLAVIO\AppData\Local\Programs\Python\Python310\python.exe: fatal error - Internal error: TP_NUM_C_BUFS too small: 50
64026 [main] python (17296) C:\Users\FLAVIO\AppData\Local\Programs\Python\Python310\python.exe: fatal error - Internal error: TP_NUM_C_BUFS too small: 50
70491 [main] python (17296) C:\Users\FLAVIO\AppData\Local\Programs\Python\Python310\python.exe: fatal error - Internal error: TP_NUM_C_BUFS too small: 50
71163 [main] python (17296) C:\Users\FLAVIO\AppData\Local\Programs\Python\Python310\python.exe: fatal error - Internal error: TP_NUM_C_BUFS too small: 50
75280 [main] python (17296) C:\Users\FLAVIO\AppData\Local\Programs\Python\Python310\python.exe: fatal error - Internal error: TP_NUM_C_BUFS too small: 50
78059 [main] python (17296) C:\Users\FLAVIO\AppData\Local\Programs\Python\Python310\python.exe: fatal error - Internal error: TP_NUM_C_BUFS too small: 50
79348 [main] python (17296) C:\Users\FLAVIO\AppData\Local\Programs\Python\Python310\python.exe: fatal error - Internal error: TP_NUM_C_BUFS too small: 50
80287 [main] python (17296) C:\Users\FLAVIO\AppData\Local\Programs\Python\Python310\python.exe: fatal error - Internal error: TP_NUM_C_BUFS too small: 50
80845 [main] python (17296) C:\Users\FLAVIO\AppData\Local\Programs\Python\Python310\python.exe: fatal error - Internal error: TP_NUM_C_BUFS too small: 50
82362 [main] python (17296) C:\Users\FLAVIO\AppData\Local\Programs\Python\Python310\python.exe: fatal error - Internal error: TP_NUM_C_BUFS too small: 50

scanny commented 2 months ago

This solution might be worth a try: https://stackoverflow.com/a/76255079/1902513

flaviobrienza commented 2 months ago

Thanks, I'll try

flaviobrienza commented 2 months ago

I checked; I already have all the required libraries.

mk-devc commented 2 months ago

I've experienced the same issue as well.

flaviobrienza commented 2 months ago

Did you solve it?

mk-devc commented 2 months ago

Nope, I was only able to run it on Google Colab.