Closed HarshKhadloya closed 4 years ago
Thanks for raising this issue, @wochinge will get back to you about it soon.
@HarshKhadloya Do you have python 32bit / 64 bit installed?
@wochinge It's 64-bit.
platform.architecture() : ('64bit', 'WindowsPE')
Can you run is_64bits = sys.maxsize > 2**32 in your Python shell?
The question is not whether Windows is 32bit or 64bit, but whether you have got the python binaries installed for 32bit or 64bit.
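For anyone repeating this check, here is a minimal sketch that reports the bitness of the interpreter itself rather than of the operating system. The function names are illustrative and not part of Rasa; printing sys.executable is added only because it helps spot a second Python install or an unexpected virtualenv.

```python
import platform
import struct
import sys


def interpreter_bits() -> int:
    """Return the pointer width, in bits, of the running Python interpreter."""
    # struct.calcsize("P") is the size in bytes of a C pointer (void *).
    return struct.calcsize("P") * 8


def is_64bit_interpreter() -> bool:
    """True when the interpreter itself (not just the OS) is a 64-bit build."""
    return sys.maxsize > 2**32


if __name__ == "__main__":
    # The executable path shows which Python install is actually running.
    print("executable:", sys.executable)
    print("pointer width:", interpreter_bits(), "bits")
    print("64-bit interpreter:", is_64bit_interpreter())
    print("platform.architecture():", platform.architecture())
```

Run it with the same interpreter (or inside the same virtualenv / conda environment) that is used for training, otherwise the answer may describe a different install.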
@wochinge Well, the output is True for sys.maxsize > 2**32.
Let me know if you need any further details.
Thanks for the result of the command! :-) Can you please check how much memory the python process is using when the MemoryError is happening?
Hi @wochinge It's using only around 1.5 GB.
Hi, is anyone able to understand what's happening here? Any updates would be really helpful. Thanks.
@HarshKhadloya Do you have multiple Python versions installed? Since the Python process exactly consumes 2 GiB of memory when it is crashing, I assume that you are somehow using a Python 32bit version. Are you using a virtualenv?
@wochinge No, I don't have multiple versions installed. I just have one version, installed via Anaconda. Some additional information: since Rasa was not compatible with the latest Python version, 3.7 (I was encountering issues), I uninstalled Anaconda and installed a 3.5 version. I am doing this work on an EC2 instance. Let me know if further info is required.
We've been experiencing some memory errors ourselves. It might just be that the array it's about to create is too big to fit into memory. The point where it breaks is when it's converting a scipy sparse array into a numpy array; the numpy array is much bigger than the scipy sparse array, which is probably what's causing this. We don't really have a quick fix right now, but we may be merging one in the future, as we're working on optimising training for the tensorflow pipeline ourselves.
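To give a feel for the size gap between a sparse matrix and its dense equivalent, here is a rough back-of-the-envelope sketch. It is plain arithmetic, not Rasa or scipy code. The 73k row count comes from the original report; the vocabulary size (20,000) and the number of non-zero features per message (10) are assumptions picked only for illustration, as are the 8-byte values and 4-byte indices.

```python
def dense_bytes(rows: int, cols: int, itemsize: int = 8) -> int:
    """Bytes needed for a dense rows x cols array (every cell is stored)."""
    return rows * cols * itemsize


def csr_bytes(rows: int, nnz: int, itemsize: int = 8, index_itemsize: int = 4) -> int:
    """Approximate bytes for a CSR sparse matrix with nnz non-zero entries."""
    # CSR stores one value and one column index per non-zero entry,
    # plus a row-pointer array of length rows + 1.
    return nnz * (itemsize + index_itemsize) + (rows + 1) * index_itemsize


if __name__ == "__main__":
    rows = 73_000          # from the original report
    vocab = 20_000         # assumed vocabulary size
    nnz = rows * 10        # assumed ~10 distinct tokens per message
    print(f"dense : {dense_bytes(rows, vocab) / 1e9:.2f} GB")
    print(f"sparse: {csr_bytes(rows, nnz) / 1e6:.2f} MB")
```

Under these assumptions the dense array is on the order of 10 GB while the sparse one stays under 10 MB, which is consistent with the conversion step being where training falls over.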
@akelad Is it possible to split the .md training data, train on the parts separately, and somehow combine them into one model in the end? I'm experiencing the same thing using the tensorflow embedding config. Thanks in advance.
@akelad I had the same hypothesis given the point where it breaks. Thanks for your response & hoping the Rasa team will be fixing this in future as working with Rasa module has been very helpful. I ended up building an independent classification model (fine for my use case) & will be using Rasa for entity extraction (better usability than CRF).
@kenzydarioo A workaround would be to manually split your data and build sequential models for additional intents. These additional intents can be tagged as 'Others' in the previous model, as the MemoryError seems to be driven mainly by the number of intents. For example, I was able to create a model with 200k training examples and just 5 intents (though most of the data were duplicates). Let us know which approach worked for you!
@HarshKhadloya Are there any entities in your training data? Can you tell me how to build sequential models and then combine them into one model?
@adirizka7 Yes, I also have entities in my training data. I was suggesting using the models one after another, not combining them into one. As a crude example, let's say there are 5 intents: A, B, C, D and E. The first model predicts A, B and Others (which covers C, D and E). The second model predicts C, D and E, and is scored only on the data that the first model predicted as 'Others'. This might be handy if you have a large number of intents. For smaller numbers, like in the example, a single Rasa model should work.
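The two-stage idea described above can be sketched as follows. This is only an illustration of the routing logic: the Classifier type, cascade_predict, and the toy keyword models are hypothetical stand-ins, and in practice each stage would wrap whatever trained model you actually use.

```python
from typing import Callable, Iterable, List

# Hypothetical interface: a classifier maps a message to an intent label.
Classifier = Callable[[str], str]


def cascade_predict(
    messages: Iterable[str],
    first_model: Classifier,
    second_model: Classifier,
    fallback_label: str = "Others",
) -> List[str]:
    """Score every message with first_model; re-score 'Others' with second_model."""
    predictions = []
    for message in messages:
        label = first_model(message)
        if label == fallback_label:
            # Only messages the first stage could not place reach stage two.
            label = second_model(message)
        predictions.append(label)
    return predictions


# Toy stand-ins so the sketch runs end to end (not real intent models).
def toy_first(message: str) -> str:
    if message.startswith("a"):
        return "A"
    if message.startswith("b"):
        return "B"
    return "Others"


def toy_second(message: str) -> str:
    return {"c": "C", "d": "D"}.get(message[:1], "E")


if __name__ == "__main__":
    print(cascade_predict(["apple", "cat", "egg"], toy_first, toy_second))
```

Each stage then only has to be trained on a subset of the intents, which is what keeps the per-model memory footprint down.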
@HarshKhadloya Thanks for the tips, but is there any other way to do this without creating multiple agents? For example, combining separate models into one?
I'm going to leave this open, because it is an issue and we are looking into it.
Thanks @akelad
@kenzydarioo It really depends on your use case. I can share my thoughts if I know the background of what you are trying to achieve. In my case, as mentioned, I was able to create an independent classification model: a regular LR/SVM on the DTM of my dataset.
I have the same problem with MemoryError and can't train my model using tensorflow_embedding on a big training set. As a workaround, I train the model only on a small training set.
Yes, I have the same issue with a dataset of around 90k rows and 144 intents. How do you guys solve it while waiting for the Rasa team to fix this problem?
allocate more memory to the machine... I'm afraid there's no work around just yet, our fix is still a work in progress
Same problem here with 8k intents and 1 to 4 common_examples for each. 50 GiB of memory allocated, only 2 GiB used when it failed.
Using Docker on Ubuntu 18.04 (FROM python:3.6.8-slim-stretch)
rasa-config.yml :
language: "fr"
pipeline:
  - name: "nlp_spacy"
  - name: "tokenizer_spacy"
  - name: "ner_crf"
  - name: "ner_synonyms"
  - name: "intent_featurizer_count_vectors"
  - name: "intent_classifier_tensorflow_embedding"
    intent_tokenization_flag: true
    intent_split_symbol: "+"
From Python console :
>>> import sys
>>> is_64bits = sys.maxsize > 2**32
>>> print(is_64bits)
True
From docker stats :
CONTAINER ID | NAME | CPU % | MEM USAGE / LIMIT | MEM % | NET I/O | BLOCK I/O | PIDS |
---|---|---|---|---|---|---|---|
494fdf122804 | rasanlu_python_prod_1 | 0.01% | 2.182GiB / 50GiB | 4.36% | 13.6MB / 135kB | 0B / 54.2MB | 82 |
CPU peaks at 2365% (24 cores); the 50 GiB limit is never reached (no difference with 120 GiB).
Error from logs :
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/site-packages/rasa_nlu/train.py", line 174, in <module>
    num_threads=cmdline_args.num_threads)
  File "/usr/local/lib/python3.6/site-packages/rasa_nlu/train.py", line 149, in do_train
    interpreter = trainer.train(training_data, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/rasa_nlu/model.py", line 190, in train
    **context)
  File "/usr/local/lib/python3.6/site-packages/rasa_nlu/classifiers/embedding_intent_classifier.py", line 446, in train
    training_data, intent_dict)
  File "/usr/local/lib/python3.6/site-packages/rasa_nlu/classifiers/embedding_intent_classifier.py", line 272, in _prepare_data_for_training
    all_Y = self._create_all_Y(X.shape[0])
  File "/usr/local/lib/python3.6/site-packages/rasa_nlu/classifiers/embedding_intent_classifier.py", line 256, in _create_all_Y
    all_Y = np.stack([self.encoded_all_intents for _ in range(size)])
  File "/usr/local/lib/python3.6/site-packages/numpy/core/shape_base.py", line 423, in stack
    return _nx.concatenate(expanded_arrays, axis=axis, out=out)
MemoryError
Okay, this seems to be a different problem because it doesn't fail in the same method (concatenate() in my example, zeros() in the original bug submission from @HarshKhadloya).
But it seems to be related to NumPy in all cases...
RAM isn't full when it fails, so I have a question: does NumPy evaluate the amount of memory that's going to be used BEFORE doing the concatenate?
So maybe it's evaluated at more than 120 GB and it fails before using it, which could be why we don't see the memory fill up?
The OP sees the same thing: in his case, 6 GB used out of 16 GB (in my case, 2.2 GB out of 50 or 120 GB).
Any news / tips on that? Thanks!
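On the question of timing: NumPy has to allocate the whole output buffer for concatenate() (and therefore stack()) before it copies anything into it, so if that single allocation cannot be satisfied, MemoryError is raised straight away and the process's resident memory never visibly climbs. The sketch below is plain arithmetic estimating how large the stacked array from the traceback could get. The shapes are assumptions, not values read from Rasa: it assumes encoded_all_intents has shape (num_intents, intent_feature_dim), that each of the 8k intents is its own token (so the feature dimension is also 8k), 2 examples per intent, and 8-byte elements.

```python
def stacked_array_bytes(
    num_examples: int,
    num_intents: int,
    intent_feature_dim: int,
    itemsize: int = 8,
) -> int:
    """Bytes for an array of shape (num_examples, num_intents, intent_feature_dim)."""
    return num_examples * num_intents * intent_feature_dim * itemsize


def fits_in(num_bytes: int, limit_gib: float) -> bool:
    """True if num_bytes is within a memory limit given in GiB."""
    return num_bytes <= limit_gib * 2**30


if __name__ == "__main__":
    # Assumed: 8,000 intents, ~2 examples each, one-hot style intent features.
    needed = stacked_array_bytes(
        num_examples=16_000, num_intents=8_000, intent_feature_dim=8_000
    )
    print(f"estimated size: {needed / 1e12:.2f} TB")
    print("fits in 120 GiB:", fits_in(needed, 120))
```

If the real shapes are anywhere near these assumptions, the requested array is in the terabyte range, which would explain why raising the container limit from 50 GiB to 120 GiB made no difference.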
Yeah, it's because we're using numpy arrays here, which at this point take up a huge amount of memory. Or are about to, as you said. The solution to this is using sparse arrays, which we are doing in a separate branch that isn't quite ready to be merged yet. We will be merging it in the next few months, so you should have no more problems with memory at that point.
I have the same issue, could we get this update in a few recent minor releases?
@akelad Thanks!
Still no update on this, sorry!
Since Vectorizer tries to create a vector with all words as features, it can lead to Memory Error on large corpus. You can restrict the max_features. I modified my config file as follows and the issue was resolved:
language: en
pipeline:
  - name: CountVectorsFeaturizer
    max_features: 1000
  - name: EmbeddingIntentClassifier
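To illustrate what capping the vocabulary does, here is a simplified pure-Python sketch. It is not the featurizer's actual implementation; it only mimics the general behaviour of keeping the max_features most frequent tokens across the corpus, so every message becomes a vector of at most that length. The function names and the whitespace tokenisation are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List


def build_capped_vocabulary(texts: List[str], max_features: int) -> Dict[str, int]:
    """Keep only the max_features most frequent tokens and index them."""
    counts = Counter(token for text in texts for token in text.lower().split())
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:max_features]
    kept = sorted(token for token, _ in top)
    return {token: index for index, token in enumerate(kept)}


def count_vector(text: str, vocabulary: Dict[str, int]) -> List[int]:
    """Bag-of-words counts for text; tokens outside the vocabulary are dropped."""
    vector = [0] * len(vocabulary)
    for token in text.lower().split():
        index = vocabulary.get(token)
        if index is not None:
            vector[index] += 1
    return vector


if __name__ == "__main__":
    corpus = ["book a flight", "book a hotel", "cancel a flight"]
    vocab = build_capped_vocabulary(corpus, max_features=3)
    print(vocab)
    print(count_vector("book a flight to a hotel", vocab))
```

The trade-off is that rare tokens are discarded, so a cap that is too low can hurt classification quality even though it reduces memory.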
Well, I just followed your config on CountVectorsFeaturizer but still got a memory error
@alvipranandha Can you make one small change mentioned below and check?
language: en
pipeline:
  - name: CountVectorsFeaturizer
    max_features: 1000
  - name: EmbeddingIntentClassifier
    intent_tokenization_flag: true  # Since you have multiple intents
    batch_size: [32, 64]  # Default is [64, 256]. Larger batch sizes occupy more memory
Let me know how it goes
Thank you for your config. Now I can run it without a memory error on around 95k rows with around 144 intents and around 28 entities. But the results are still not good; I need to find well-tuned hyperparameters for a custom dataset.
That's awesome. You can try changing the max_features size and compare the results. I have an 8-core system with 16 GB RAM, and my value for max_features was 5000.
Alternatively you can develop a custom featurizer using TFIDF Vectorizer. And set max features to whatever fits in your memory. TFIDF may help boost the results.
Well, thank you for your suggestion @sagardawda7. I have a quad-core system with 16 GB RAM and am still searching for the best config for our dataset. How do I test which max_features values are safe, so that I don't run into a memory error? Do I have to check them one by one, or are there other methods?
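One way to avoid pure trial and error is to estimate an upper bound first and then confirm it empirically. The sketch below is plain arithmetic and rests on loud assumptions: it only accounts for a single dense feature matrix of shape (rows, max_features) with 8-byte elements, and it reserves half of the RAM (safety_factor=0.5) for everything else the training process holds. The function name and the safety factor are illustrative, not anything Rasa provides.

```python
def max_features_for_budget(
    num_examples: int,
    budget_gib: float,
    itemsize: int = 8,
    safety_factor: float = 0.5,
) -> int:
    """Largest max_features whose dense (rows x features) matrix fits the budget."""
    # Only a fraction of RAM is treated as usable for this one matrix.
    usable_bytes = budget_gib * 2**30 * safety_factor
    return int(usable_bytes // (num_examples * itemsize))


if __name__ == "__main__":
    # ~95k rows and 16 GB RAM, as mentioned in this thread.
    print(max_features_for_budget(num_examples=95_000, budget_gib=16))
```

Treat the result as a ceiling rather than a recommendation: start well below it, watch the process's memory during training, and only then increase max_features in a few steps.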
@akelad As mentioned above, the team is trying to replace numpy with something else. Is there any update? What is the name of the branch where I can find the fix? I would like to see the solution even if it is yet to be merged. Please let me know.
@Ghostvv could you update everyone on the latest status of this?
we have a branch, where we're using sparse matrices instead of dense numpy arrays, but it is implemented for the new architecture that we're working on. @tabergma could you please link the branch here
We have two branches:
CountVectorsFeaturizer. The features are then used in the EmbeddingIntentClassifier. However, the code is not cleaned up. https://github.com/RasaHQ/rasa/tree/entity-recognition
Hi guys. I am facing the same memory issue. Is the new branch ready? Can I use the branch https://github.com/RasaHQ/rasa/tree/combined-entity-intent-model, or should I use the latest Rasa git version?
@suryavamsi1563 The branch https://github.com/RasaHQ/rasa/tree/combined-entity-intent-model is not ready yet. We faced some issues on the way. You should be able to use it beginning of next week.
I have training data with the following characteristics:
- intent examples: 11263 (2 distinct intents)
- Found intents: 'general', 'irrelevant'
- Number of response examples: 0 (0 distinct response)
- entity examples: 9407 (22 distinct entities)
- found entities: '', 'company', 'amount_price_target', 'analyst', 'financial_topic', 'financial_instrument', 'period', 'person', 'price_movement', 'hashtag', 'publication', 'ticker', 'amount', 'percent', 'number', 'media_type', 'location', 'rating_agency', 'event', 'exchange', 'product', 'sector'
When I run the command
rasa test nlu --config pretrained_embeddings_spacy.yml supervised_embeddings.yml --nlu CF_model/config_en.json --runs 3 --percentages 0 25 50 70 90
I get memory error, any ideas how to solve this?
Were you able to solve it?
No, unfortunately, I did not. Any suggestions?
@igormis Did you use the latest Rasa version? I assume --config pretrained_embeddings_spacy.yml corresponds to pipeline: "pretrained_embeddings_spacy"? Does the model train if you do just a single training run with rasa train nlu? When exactly does the memory error occur: when loading the data, during training, or during evaluation?
I was able to solve my problem. I was actually running both rasa x and rasa run simultaneously, which was causing the memory problem. Once I closed rasa x, rasa run started like a charm.
Hi @tabergma, is the sparse arrays branch https://github.com/RasaHQ/rasa/tree/entity-recognition ready for use? I am facing a memory error and desperately need a workaround or a solution.
@suryavamsi1563 We merged sparse features into master and released it with Rasa 1.6.0. So, just use the latest Rasa version. Let me know if you are still running into issues.
I am going to close this as it should be fixed with 1.6. If there are still issues, please create a new issue.
As mentioned in the title, I am feeding in ~73k lines of training data classified into 38 intents. And I would end up using ~200k lines of messages to create my final model. But even for 73k, I get a MemoryError. This doesn't seem to be a RAM issue as I don't see my RAM getting fully used up while running the training code. Any inputs would be valuable. Below are the details:
Rasa NLU version: 0.13.8
Operating system: Windows Server 2016
Training the model as:
Content of model configuration file:
Output / Issue:
During this run, I don't see my RAM usage go above 6 GB, even though I have 16 GB of RAM. Thanks for your help!