Networks-Learning / stackexchange-dump-to-postgres

Python scripts to import StackExchange data dump into Postgres DB.
MIT License

Error when running python load_into_pg.py -d stackoverflow -s pgload #15

Open flfilip opened 3 years ago

flfilip commented 3 years ago

Hi Team,

Could you please help me with the below error when trying to run python load_into_pg.py? Am I missing something?

Traceback (most recent call last):
  File "load_into_pg.py", line 413, in <module>
    import libarchive
  File "/home/pgadminuser/.local/lib/python2.7/site-packages/libarchive/__init__.py", line 1, in <module>
    from .entry import ArchiveEntry
  File "/home/pgadminuser/.local/lib/python2.7/site-packages/libarchive/entry.py", line 6, in <module>
    from . import ffi
  File "/home/pgadminuser/.local/lib/python2.7/site-packages/libarchive/ffi.py", line 108, in <module>
    errno = ffi('errno', [c_archive_p], c_int)
  File "/home/pgadminuser/.local/lib/python2.7/site-packages/libarchive/ffi.py", line 95, in ffi
    f = getattr(libarchive, 'archive_'+name)
  File "/usr/lib/python2.7/ctypes/__init__.py", line 379, in __getattr__
    func = self.__getitem__(name)
  File "/usr/lib/python2.7/ctypes/__init__.py", line 384, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))

When running the below command it all seems ok.

pip install -r requirements.txt
Collecting argparse==1.2.1 (from -r requirements.txt (line 1))
Collecting distribute==0.6.24 (from -r requirements.txt (line 2))
Collecting libarchive-c==2.9 (from -r requirements.txt (line 3))
  Using cached https://files.pythonhosted.org/packages/23/16/622ae829e9c1795479df865bbcbb4e7e3990f3e451e440f00bf1615be7fc/libarchive_c-2.9-py2.py3-none-any.whl
Collecting lxml==4.5.2 (from -r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/d1/2d/642ef7013aa56af52e14b5b7d53c5d591e6d038c9688e06d0f2a20ed26b2/lxml-4.5.2-cp27-cp27mu-manylinux1_x86_64.whl
Collecting psycopg2-binary==2.8.4 (from -r requirements.txt (line 5))
  Using cached https://files.pythonhosted.org/packages/97/2a/b854019bcb9b925cd10ff245dbc9448a82fe7fdb40127e5cf1733ad0765c/psycopg2_binary-2.8.4-cp27-cp27mu-manylinux1_x86_64.whl
Collecting six==1.10.0 (from -r requirements.txt (line 6))
  Using cached https://files.pythonhosted.org/packages/c8/0a/b6723e1bc4c516cb687841499455a8505b44607ab535be01091c0f24f079/six-1.10.0-py2.py3-none-any.whl
Installing collected packages: argparse, distribute, libarchive-c, lxml, psycopg2-binary, six
Successfully installed argparse-1.2.1 distribute-0.6.24 libarchive-c-2.9 lxml-4.6.2 psycopg2-binary-2.8.4 six-1.10.0

Thank you, Florin

musically-ut commented 3 years ago

It looks like a libarchive installation issue: it probably couldn't find archive_7z for loading. The last line of the traceback should contain the exact error.

Are you trying to run this on Windows or OSX?

If so, getting it to run inside Docker may be easier than trying to install the dependencies directly.

flfilip commented 3 years ago

Hi,

Thank you for your message. No, I am running it on an Ubuntu machine. I tried on multiple VMs and get the same error. Could you please provide the steps to run it inside a container? Florin

musically-ut commented 3 years ago

Curious.

On your Ubuntu machine, can you try:

sudo apt-get install libarchive-dev

and then try running the script again? If that fails, it may be necessary to give an explicit path to libarchive*.so files via the LD_LIBRARY_PATH environment variable while running the script.

I personally do not run it in a container because I'm able to install all the dependencies.
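
For anyone hitting the same traceback, here is a minimal sketch of a check for whether Python can locate the system libarchive at all. It assumes that the Python binding looks the library up through ctypes, as the ffi.py frames in the traceback above suggest. The helper name check_libarchive is hypothetical and is not part of load_into_pg.py or libarchive-c.

```python
import ctypes
import ctypes.util


def check_libarchive():
    """Return (library_name, has_symbol) for the system libarchive.

    library_name is None when ctypes cannot locate libarchive at all.
    has_symbol tells whether the archive_errno function could be resolved,
    which is the lookup that fails in the traceback above.
    """
    # find_library takes the name without the "lib" prefix or ".so" suffix.
    name = ctypes.util.find_library("archive")
    if name is None:
        return (None, False)
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        # The library was located but could not be loaded.
        return (name, False)
    # hasattr triggers the same symbol lookup that ctypes performs internally.
    return (name, hasattr(lib, "archive_errno"))


if __name__ == "__main__":
    print(check_libarchive())
```

If the first element is None, the shared library is probably not installed or not on the loader's search path, which is the case that installing libarchive-dev or setting LD_LIBRARY_PATH is meant to address.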