[Open] belalsalih opened this issue 1 year ago
A segmentation fault shouldn't be happening under any circumstances. Could you post the output of the following command?
pip list
Furthermore, I'd appreciate it if you could try the following for me: create a fresh venv containing only spacy (and its automatically installed dependencies).

Thanks for the reply. I have created a new venv with only spacy, however I am still getting the same error, so this is not related to pip packages. I am using a small data sample (300 docs) for training and validation.
One thing I noticed: changing the width in [components.tok2vec.model.encode] from the default 96 to 128 makes the training command complete one iteration and then crash; changing the value back to 96 makes the command fail without completing any iterations.
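For reference, the width setting in question lives in the tok2vec encoder block of the training config. This is a minimal sketch of that section with spaCy's usual defaults; the other keys and values may differ in the actual config_spn.cfg:

```ini
[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
width = 96
depth = 4
window_size = 1
maxout_pieces = 3
```

The same value can also be overridden from the CLI without editing the file, e.g. `python -m spacy train config.cfg --components.tok2vec.model.encode.width 128`.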
Attached debug data output FYR. debug_data.txt
pip list output:
Package Version
------------------- ---------
attrs 23.1.0
azure-core 1.28.0
azure-storage-blob 12.17.0
blis 0.7.9
catalogue 2.0.8
certifi 2023.5.7
cffi 1.15.1
charset-normalizer 3.2.0
click 8.1.5
confection 0.1.0
contourpy 1.1.0
cryptography 41.0.2
cycler 0.11.0
cymem 2.0.7
en-core-web-lg 3.6.0
en-core-web-sm 3.6.0
fonttools 4.41.1
fuzzysearch 0.7.3
fuzzywuzzy 0.18.0
idna 3.4
importlib-resources 6.0.0
isodate 0.6.1
Jinja2 3.1.2
joblib 1.3.1
kiwisolver 1.4.4
langcodes 3.3.0
Levenshtein 0.21.1
MarkupSafe 2.1.3
matplotlib 3.7.2
murmurhash 1.0.9
numpy 1.24.4
packaging 23.1
pandas 2.0.3
pathy 0.10.2
Pillow 10.0.0
pip 23.2
pkg_resources 0.0.0
preshed 3.0.8
pycparser 2.21
pydantic 1.10.11
pyodbc 4.0.39
pyparsing 3.0.9
python-dateutil 2.8.2
python-Levenshtein 0.21.1
pytz 2023.3
rapidfuzz 3.2.0
regex 2023.8.8
requests 2.31.0
scikit-learn 1.3.0
scipy 1.10.1
setuptools 68.0.0
six 1.16.0
sklearn 0.0.post7
smart-open 6.3.0
spacy 3.6.0
spacy-legacy 3.0.12
spacy-loggers 1.0.4
srsly 2.4.6
thefuzz 0.20.0
thinc 8.1.10
threadpoolctl 3.2.0
tqdm 4.65.0
typer 0.9.0
typing_extensions 4.7.1
tzdata 2023.3
urllib3 2.0.3
wasabi 1.1.2
wheel 0.40.0
zipp 3.16.2
Thanks for the info - We'll investigate.
To anyone facing this issue: I used NER instead of SpanCat and had no issues. For overlapping spans, I trained the model to extract the high-level details and trained separate models to extract the sub-details from complex data. I still believe SpanCat would be the right way to do this, if it worked as intended.
Regards.
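One practical note on the workaround above: NER requires non-overlapping entities, so overlapping SpanCat annotations must be reduced before they can be reused as NER training data. This is a hypothetical pure-Python sketch of the greedy longest-span-first strategy (similar in spirit to spaCy's `spacy.util.filter_spans`), using plain `(start, end, label)` tuples rather than spaCy `Span` objects:

```python
def filter_overlapping(spans):
    """Keep a non-overlapping subset of spans, preferring longer spans.

    spans: iterable of (start, end, label) tuples with token offsets.
    Mimics the greedy strategy of spacy.util.filter_spans: sort by
    length (longest first, earlier start breaks ties), then keep each
    span only if it does not overlap an already-kept one.
    """
    spans = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    seen = set()
    result = []
    for start, end, label in spans:
        if any(i in seen for i in range(start, end)):
            continue  # overlaps an already-kept span
        seen.update(range(start, end))
        result.append((start, end, label))
    return sorted(result)

# The nested (2, 3) span is dropped in favour of the longer (0, 5) one.
print(filter_overlapping([(0, 5, "SKILL"), (2, 3, "SUB"), (6, 8, "ROLE")]))
# → [(0, 5, 'SKILL'), (6, 8, 'ROLE')]
```

The sub-details dropped by this filter are what the separate second-pass models would then be trained to recover.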
Hi, can you share the training/dev data and the custom code you were using to train the SpanCat model? We'd need that to reproduce the crash and debug the issue.
Hi, I got this issue while creating a CV parser for our clients, so unfortunately we cannot share the data, since it uses live applicant data. We are not using any custom code to train the model; we generate the training data and save the training.spacy/dev.spacy files on the fly. The same data that causes this error works fine when using NER instead of SpanCat, so I don't think this is a data issue, as you can see in the debug data shared earlier. You can check this discussion thread related to this issue: 13012.
Regards.
That's understandable. The issue is likely a bug in the SpanCat component's code, but we still need to consistently reproduce the crash in order to identify the cause and fix it. If you run into this issue in the future where you can share the data that triggers the crash, please let us know.
Hi, I am getting 'Segmentation fault (core dumped)' when trying to train a model for long spans with SpanCat. I know this error can be related to OOM issues, but that does not seem to be the case here. I tried reducing [nlp] batch_size and [training.batcher.size] as shown in the attached config file, and used a VM with very large RAM to make sure we are not running out of memory. During training the VM memory usage never goes above 40%, and even when reducing the [components.spancat.suggester] min_size and max_size the memory usage does not exceed 20%, yet training still exits with 'Segmentation fault (core dumped)'.
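For context, the settings mentioned above live in these sections of the training config. The values here are illustrative defaults, not the ones from the attached config_spn.cfg:

```ini
[nlp]
batch_size = 64

[training.batcher]
@batchers = "spacy.batch_by_words.v1"
size = 500

[components.spancat.suggester]
@misc = "spacy.ngram_range_suggester.v1"
min_size = 1
max_size = 10
```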
Note: when training with low [components.spancat.suggester] values, the training completes, but with all zeroes for F, P and R.
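To see why the suggester settings matter for memory, this hypothetical pure-Python sketch of an n-gram span suggester (spaCy's real suggester is implemented differently but enumerates the same candidates) shows how min_size/max_size drive the number of candidate spans per document:

```python
def ngram_suggester(doc_length, min_size, max_size):
    """Yield (start, end) token offsets for every span of the given sizes.

    Sketch of what [components.spancat.suggester] with
    spacy.ngram_range_suggester.v1 enumerates: one candidate span for
    every start position and every size in [min_size, max_size].
    """
    for size in range(min_size, max_size + 1):
        for start in range(doc_length - size + 1):
            yield (start, start + size)

# A single 300-token doc with span sizes 1..10 already yields 2955
# candidates; the count grows with both doc length and max_size, which
# is why lowering these values reduced the memory footprint above.
print(len(list(ngram_suggester(300, 1, 10))))
# → 2955
```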
This is the command I am using for training: python -m spacy train config_spn.cfg --output ./output_v3_lg_1.3 --paths.train ./spacy_models_v3/train_data.spacy --paths.dev ./spacy_models_v3/test_data.spacy --code functions.py -V
This is the training output:
[2023-09-28 09:25:08,461] [DEBUG] Config overrides from CLI: ['paths.train', 'paths.dev']
ℹ Saving to output directory: output_v3_lg_1.3
ℹ Using CPU

=========================== Initializing pipeline ===========================
[2023-09-28 09:25:08,610] [INFO] Set up nlp object from config
[2023-09-28 09:25:08,618] [DEBUG] Loading corpus from path: spacy_models_v3/test_data.spacy
[2023-09-28 09:25:08,618] [DEBUG] Loading corpus from path: spacy_models_v3/train_data.spacy
[2023-09-28 09:25:08,619] [INFO] Pipeline: ['tok2vec', 'spancat']
[2023-09-28 09:25:08,621] [INFO] Created vocabulary
[2023-09-28 09:25:09,450] [INFO] Added vectors: en_core_web_lg
[2023-09-28 09:25:09,450] [INFO] Finished initializing nlp object
[2023-09-28 09:25:16,150] [INFO] Initialized pipeline components: ['tok2vec', 'spancat']
✔ Initialized pipeline

============================= Training pipeline =============================
[2023-09-28 09:25:16,158] [DEBUG] Loading corpus from path: spacy_models_v3/test_data.spacy
[2023-09-28 09:25:16,159] [DEBUG] Loading corpus from path: spacy_models_v3/train_data.spacy
ℹ Pipeline: ['tok2vec', 'spancat']
ℹ Initial learn rate: 0.001
E    #       LOSS TOK2VEC  LOSS SPANCAT  SPANS_SC_F  SPANS_SC_P  SPANS_SC_R  SCORE
0    0       98109.47      19535.08      0.00        0.00        4.58        0.00
0    200     528.73        781.51        0.00        0.00        3.75        0.00
Segmentation fault (core dumped)
Environment:
Operating System: Ubuntu 20.04.6 LTS
Python Version Used: 3.8.10
spaCy Version Used: 3.6.0
Config: config_spn.cfg.txt
Thanks in advance!