AmenRa / retriv

A Python Search Engine for Humans 🥸
MIT License

[BUG] Segmentation fault (core dumped) #34

Open celsofranssa opened 8 months ago

celsofranssa commented 8 months ago

First of all, thank you for this excellent library.

Describe the bug

Building TDF matrix: 100%|███████████████████████████████████████████████| 13905/13905 [00:34<00:00, 408.07it/s]
Building inverted index: 100%|███████████████████████████████████████| 148864/148864 [00:10<00:00, 14750.18it/s]
Batch search:   0%|                                                                   | 0/13905 [00:00<?, ?it/s]
Segmentation fault      (core dumped)

I get a Segmentation fault (core dumped) when calling bsearch on a Sparse Retriever. The crash happens as soon as the batch search starts, before any query completes.

Current environment

* CUDA:
    - GPU: NVIDIA GeForce RTX 3090
    - available: True
    - version: 12.1
* Packages: absl-py: 2.0.0, accelerate: 0.24.1, aiohttp: 3.8.6, aiosignal: 1.3.1, alembic: 1.12.1, antlr4-python3-runtime: 4.9.3, appdirs: 1.4.4, async-timeout: 4.0.3, attrs: 23.1.0, autofaiss: 2.15.8, beautifulsoup4: 4.12.2, bleach: 6.1.0, cachetools: 5.3.2, cbor: 1.0.0, cbor2: 5.5.1, certifi: 2023.7.22, charset-normalizer: 3.3.2, click: 8.1.7, colorlog: 6.7.0, contourpy: 1.2.0, cramjam: 2.7.0, cycler: 0.12.1, dill: 0.3.7, docker-pycreds: 0.4.0, embedding-reader: 1.5.1, faiss-cpu: 1.7.4, fastparquet: 2023.10.1, filelock: 3.13.1, fire: 0.4.0, fonttools: 4.44.0, frozenlist: 1.4.0, fsspec: 2023.10.0, gitdb: 4.0.11, gitpython: 3.1.40, google-auth: 2.23.4, google-auth-oauthlib: 1.1.0, greenlet: 3.0.1, grpcio: 1.59.2, huggingface-hub: 0.17.3, hydra-core: 1.3.2, idna: 3.4, ijson: 3.2.3, indxr: 0.1.5, inscriptis: 2.3.2, ir-datasets: 0.5.5, jinja2: 3.1.2, joblib: 1.3.2, kaggle: 1.5.16, keybert: 0.8.3, kiwisolver: 1.4.5, krovetzstemmer: 0.8, lightning-utilities: 0.9.0, llvmlite: 0.41.1, lxml: 4.9.3, lz4: 4.3.2, mako: 1.3.0, markdown: 3.5.1, markdown-it-py: 3.0.0, markupsafe: 2.1.3, matplotlib: 3.8.1, mdurl: 0.1.2, mpmath: 1.3.0, multidict: 6.0.4, multipipe: 0.1.0, multiprocess: 0.70.15, networkx: 3.2.1, nltk: 3.8.1, nmslib: 2.1.1, numba: 0.58.1, numpy: 1.26.1, nvidia-cublas-cu12: 12.1.3.1, nvidia-cuda-cupti-cu12: 12.1.105, nvidia-cuda-nvrtc-cu12: 12.1.105, nvidia-cuda-runtime-cu12: 12.1.105, nvidia-cudnn-cu12: 8.9.2.26, nvidia-cufft-cu12: 11.0.2.54, nvidia-curand-cu12: 10.3.2.106, nvidia-cusolver-cu12: 11.4.5.107, nvidia-cusparse-cu12: 12.1.0.106, nvidia-nccl-cu12: 2.18.1, nvidia-nvjitlink-cu12: 12.3.52, nvidia-nvtx-cu12: 12.1.105, oauthlib: 3.2.2, omegaconf: 2.3.0, oneliner-utils: 0.1.2, optuna: 3.4.0, orjson: 3.9.10, packaging: 23.2, pandas: 1.5.3, pillow: 10.1.0, pip: 23.3.1, protobuf: 4.23.4, psutil: 5.9.6, pyarrow: 12.0.1, pyasn1: 0.5.0, pyasn1-modules: 0.3.0, pyautocorpus: 0.1.12, pybind11: 2.6.1, pygments: 2.16.1, pyparsing: 3.1.1, pystemmer: 2.0.1, python-dateutil: 2.8.2, python-slugify: 8.0.1, pytorch-lightning: 2.1.1, pytorch-metric-learning: 2.3.0, pytz: 2023.3.post1, pyyaml: 6.0.1, ranx: 0.3.18, regex: 2023.10.3, requests: 2.31.0, requests-oauthlib: 1.3.1, retriv: 0.2.3, rich: 13.6.0, rsa: 4.9, safetensors: 0.4.0, scikit-learn: 1.3.2, scipy: 1.11.3, seaborn: 0.13.0, sentence-transformers: 2.2.2, sentencepiece: 0.1.99, sentry-sdk: 1.39.1, setproctitle: 1.3.3, setuptools: 68.2.2, six: 1.16.0, smmap: 5.0.1, soupsieve: 2.5, sqlalchemy: 2.0.23, sympy: 1.12, tabulate: 0.9.0, tensorboard: 2.15.1, tensorboard-data-server: 0.7.2, termcolor: 2.3.0, text-unidecode: 1.3, threadpoolctl: 3.2.0, tokenizers: 0.14.1, torch: 2.1.0, torchaudio: 2.1.0, torchmetrics: 1.2.0, torchvision: 0.16.0, tqdm: 4.66.1, transformers: 4.35.0, trec-car-tools: 2.6, triton: 2.1.0, typing-extensions: 4.8.0, unidecode: 1.3.7, unlzw3: 0.2.2, urllib3: 2.0.7, wandb: 0.16.1, warc3-wet: 0.2.3, warc3-wet-clueweb09: 0.2.5, webencodings: 0.5.1, werkzeug: 3.0.1, wheel: 0.41.2, yarl: 1.9.2, zlib-state: 0.1.6
* System:
    - OS: Linux
    - architecture: 64bit, ELF
    - processor: x86_64
    - python: 3.10.13
    - release: 5.15.0-88-generic
    - version: #98~20.04.1-Ubuntu SMP Mon Oct 9 16:43:45 UTC 2023

MarshtompCS commented 8 months ago

I ran into this issue before. In my experiment, the cause was that the query was too long.
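
If overly long queries are the cause here too, one way to test that is to cap the query length before running the batch search. Below is a minimal sketch in plain Python. The helper name `truncate_queries` and the limit of 64 terms are hypothetical choices for illustration, not part of retriv, and the sketch assumes each query is a dict with `id` and `text` keys. The length at which the crash appears is not known from this thread, so the limit would need to be found by experiment.

```python
def truncate_queries(queries, max_terms=64):
    """Return copies of the queries with text capped at max_terms terms.

    Hypothetical helper: terms are split on whitespace, and max_terms=64
    is an arbitrary illustrative limit, not a known safe value.
    """
    truncated = []
    for query in queries:
        terms = query["text"].split()
        # Copy the dict so the original queries are left untouched.
        truncated.append({**query, "text": " ".join(terms[:max_terms])})
    return truncated


queries = [
    {"id": "q_1", "text": "short query"},
    {"id": "q_2", "text": " ".join(["term"] * 500)},  # artificially long query
]
short_queries = truncate_queries(queries, max_terms=64)
```

Passing `short_queries` to the batch search instead of the original list would show whether the segmentation fault disappears once long queries are shortened. Truncation drops information from the query, so this is more useful as a diagnostic than as a fix.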