ProsusAI / finBERT

Financial Sentiment Analysis with BERT
Apache License 2.0
1.45k stars 417 forks

Error running the configuring parameters cell #14

Closed bernardmizzi closed 4 years ago

bernardmizzi commented 4 years ago

Good morning,

I am running the configuring parameters cell and I am getting the below error:


UnpicklingError                           Traceback (most recent call last)

in
      5     pass
      6
----> 7 bertmodel = BertForSequenceClassification.from_pretrained(lm_path,cache_dir=None, num_labels=3)
      8
      9

~/anaconda3/envs/finbert/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    601         if state_dict is None and not from_tf:
    602             weights_path = os.path.join(serialization_dir, WEIGHTS_NAME)
--> 603             state_dict = torch.load(weights_path, map_location='cpu')
    604         if tempdir:
    605             # Clean up temp dir

~/anaconda3/envs/finbert/lib/python3.7/site-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
    385         f = f.open('rb')
    386     try:
--> 387         return _load(f, map_location, pickle_module, **pickle_load_args)
    388     finally:
    389         if new_fd:

~/anaconda3/envs/finbert/lib/python3.7/site-packages/torch/serialization.py in _load(f, map_location, pickle_module, **pickle_load_args)
    562     f.seek(0)
    563
--> 564     magic_number = pickle_module.load(f, **pickle_load_args)
    565     if magic_number != MAGIC_NUMBER:
    566         raise RuntimeError("Invalid magic number; corrupt file?")

UnpicklingError: invalid load key, 'v'.

Moreover, can you kindly explain how I can construct the files train.csv, validation.csv, test.csv?

Regards,
Bernard

cskksdfklpz commented 4 years ago

Make sure you successfully downloaded the language model (models/language_model/finbertTRC2/pytorch_model.bin should be about 400 MB). Try using git-lfs, or download the model directly from the GitHub web page.
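
A note on the error itself: "invalid load key, 'v'" is what torch.load typically reports when the file is a Git LFS pointer rather than the real weights. A pointer is a small text file that begins with "version https://git-lfs.github.com/spec/v1", hence the 'v'. The following is a minimal sketch of a check, assuming only that standard pointer header; the function name and the example path are illustrative, not part of finBERT.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this header line.
LFS_PREFIX = b"version https://git-lfs.github.com/spec/"


def looks_like_lfs_pointer(path):
    """Return True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_PREFIX))
    return head == LFS_PREFIX


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever the repository was cloned.
    weights = Path("models/language_model/finbertTRC2/pytorch_model.bin")
    if weights.exists():
        size = weights.stat().st_size
        if looks_like_lfs_pointer(weights):
            print(f"{weights} is an LFS pointer ({size} bytes), not the model")
        else:
            print(f"{weights} does not look like a pointer ({size / 1e6:.0f} MB)")
```

If the check reports a pointer, the real file still has to be fetched some other way; the check only explains why loading fails.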

bernardmizzi commented 4 years ago

Thanks for your feedback.

Moreover, how can I construct the files train.csv, validation.csv, test.csv?

bernardmizzi commented 4 years ago

Apologies if I wasn't clear, but my main question is how to retrieve the train, validation and test data and put it in those files?

davidifshk commented 4 years ago

[image]

Hi all, I also hit this problem when I ran the configuring parameters cell. I then tried to download pytorch_model.bin with git-lfs, but I am getting this error. It seems to be a service limit. Any help would be appreciated. Many thanks!

bernardmizzi commented 4 years ago

Take a look at this https://github.com/ProsusAI/finBERT/issues/8

davidifshk commented 4 years ago

[image]

Thanks for your help! But after running wget https://github.com/ProsusAI/finBERT/raw/master/models/language_model/finbertTRC2/pytorch_model.bin the file I got is also the 134 KB one, not the original 400 MB one.

l0rem1psum commented 4 years ago

[image]

Thanks for your help! But after running wget https://github.com/ProsusAI/finBERT/raw/master/models/language_model/finbertTRC2/pytorch_model.bin the file I got is also the 134 KB one, not the original 400 MB one.

When I ran git lfs pull, it told me:

"batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access. error: failed to fetch some objects from 'https://github.com/ProsusAI/finBERT.git/info/lfs'"

This is probably related to this issue.

bernardmizzi commented 4 years ago

You could manually download https://github.com/ProsusAI/finBERT/raw/master/models/language_model/finbertTRC2/pytorch_model.bin from the browser; that's what I did.

bernardmizzi commented 4 years ago

[image]

Thanks for your help! But after running wget https://github.com/ProsusAI/finBERT/raw/master/models/language_model/finbertTRC2/pytorch_model.bin the file I got is also the 134 KB one, not the original 400 MB one.

Did you try manually downloading the file in the browser from https://github.com/ProsusAI/finBERT/raw/master/models/language_model/finbertTRC2/pytorch_model.bin? It worked for me, and the downloaded file is approximately 400 MB.

l0rem1psum commented 4 years ago

You could manually download https://github.com/ProsusAI/finBERT/raw/master/models/language_model/finbertTRC2/pytorch_model.bin from the browser; that's what I did.

Can you share your local copy of the model file? This method no longer works due to GitHub bandwidth restrictions. I can download the file, but it's only 134 bytes. Thank you.

bernardmizzi commented 4 years ago

Yeah sure

https://drive.google.com/drive/folders/1Y7pS_P4Bui7pZXKp04aPjb1c62CQUZT2?usp=sharing

ok?

bernardmizzi commented 4 years ago

@davidifshk you can also use my link if you want ^

l0rem1psum commented 4 years ago

@bernardmizzi Thank you. This is going to benefit more people with the same issue.
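
For anyone using a shared copy: if the small file left in your clone is a Git LFS pointer, it records the SHA-256 and byte size of the real file (lines of the form "oid sha256:..." and "size ..."), so a copy obtained elsewhere can be checked against it. This is a minimal sketch assuming the standard pointer format; the function names are illustrative.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into (sha256_hex, size)."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if " " in line)
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unexpected hash algorithm: {algo}")
    return digest, int(fields["size"])


def matches_pointer(path, pointer_text, chunk_size=1 << 20):
    """Check a downloaded file against the size and SHA-256 named in the pointer."""
    expected_digest, expected_size = parse_lfs_pointer(pointer_text)
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Read in chunks so a 400 MB file is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == expected_size and hasher.hexdigest() == expected_digest
```

To use it, read the pointer file's text from the clone and pass it together with the path of the downloaded pytorch_model.bin to matches_pointer.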

bernardmizzi commented 4 years ago

No problem, glad I could help

l0rem1psum commented 4 years ago

@bernardmizzi

Sorry to ask again, but could you please also share the model under classifier_model/finbert-sentiment? I believe that one could not be downloaded either. Really appreciate your help!

bernardmizzi commented 4 years ago

That model is created by training on a particular set of text, so you'll have to run the notebook finBERT/notebooks/finbert_training.ipynb yourself; mine is trained on my own data. If you want, I'll give you mine, but it is trained on Reddit news headlines and it reported very low accuracy.

l0rem1psum commented 4 years ago

That's okay. Thank you very much!

bernardmizzi commented 4 years ago

Should you need help with running the notebook, just send me a message, as I got it up and running.

davidifshk commented 4 years ago

@davidifshk you can also use my link if you want ^

It works! Thank you very much! I'm going to run the training with the dataset from FinancialPhraseBank first.

davidifshk commented 4 years ago

Apologies if I wasn't clear, but my main question is how to retrieve the train, validation and test data and put it in those files?

Could you tell me the expected data structure of train.csv? I got an error when I ran the cell 'get_data()'. Here is the data structure of my train.csv. Is there anything wrong?

[image]

davidifshk commented 4 years ago

Apologies if I wasn't clear, but my main question is how to retrieve the train, validation and test data and put it in those files?

Could you tell me the expected data structure of train.csv? I got an error when I ran the cell 'get_data()'. Here is the data structure of my train.csv. Is there anything wrong?

[image]

Fixed. I had used the wrong separator character ',' when exporting the CSV file.
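
On constructing train.csv, validation.csv and test.csv: the comment above only establishes that a comma was the wrong separator. The sketch below assumes that the loader expects tab-separated files with "text" and "label" columns. Both the separator and the column names are assumptions that should be checked against the repository's data-loading code. The helper names and split fractions are illustrative.

```python
import csv
import os
import random


def split_rows(rows, val_frac=0.1, test_frac=0.2, seed=0):
    """Shuffle (text, label) rows and split them into train, validation, test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


def write_tsv(path, rows, header=("text", "label")):
    """Write rows with a tab delimiter (assumed format, see the note above)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)


def write_splits(rows, out_dir):
    """Create train.csv, validation.csv and test.csv inside out_dir."""
    train, val, test = split_rows(rows)
    for name, part in (("train", train), ("validation", val), ("test", test)):
        write_tsv(os.path.join(out_dir, name + ".csv"), part)
```

Calling write_splits(rows, "data") with a list of (sentence, label) tuples would produce the three files, provided the "data" directory already exists.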

bernardmizzi commented 4 years ago

@davidifshk I wasn't able to run the model on the PhraseBank dataset, as I was getting encoding errors on both Windows and Ubuntu systems. Thus I opted for another dataset.

davidifshk commented 4 years ago

I see. I have already run the model on the PhraseBank dataset; the result is shown below.

[image]

bernardmizzi commented 4 years ago

@davidifshk would it be a problem to provide me the code you used to open and format the PhraseBank dataset as I was getting encoding errors?
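
Encoding errors of this kind usually mean the file is not UTF-8. The sketch below assumes that the PhraseBank text file is Latin-1 encoded and that each line has the form "sentence@label". Both are assumptions about the downloaded file, not something confirmed in this thread. The function name is illustrative.

```python
def read_phrasebank(path, encoding="latin-1"):
    """Yield (text, label) pairs from a file of 'sentence@label' lines."""
    with open(path, encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split on the last '@' so an '@' inside the sentence is preserved.
            text, sep, label = line.rpartition("@")
            if not sep:
                raise ValueError(f"no '@' separator in line: {line!r}")
            yield text, label
```

The resulting pairs can then be written out in whatever separator and column layout the training notebook expects.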

saishashank85 commented 4 years ago

I'm trying to use finBERT to classify new articles into several different categories in the banking domain. Which model should I use for classification: the language model or the classification model? Thanks.

bernardmizzi commented 4 years ago

You have to run the notebook FinBERT/notebooks/finbert_training.ipynb, which trains the language model and then creates a new classification model. As you continue running the notebook, it uses that model for classification.

jarasny commented 4 years ago

@bernardmizzi Your Google Drive link to the model has expired. Can you re-upload it, please? When I try to download the model from the repository I get this error:

This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

bernardmizzi commented 4 years ago

https://drive.google.com/drive/folders/19j9gFJnDDEH5qebrB-Om5lduqg0zr-Is?usp=sharing

metodj commented 4 years ago

Thanks a lot @bernardmizzi! Could you also upload the sentiment model weights?

bernardmizzi commented 4 years ago

The model is already pre-trained and can be used. I think the model weights are embedded within the model. To run finBERT, all you need is the PyTorch model .bin file and its config.

metodj commented 4 years ago

Indeed, the weights are embedded within the model. It's just that there are two different models in this repo: one is the language model and one is the sentiment model (see picture below). On your drive you uploaded the language model; could you upload the sentiment model too? Thanks!

[image]

bernardmizzi commented 4 years ago

You'll have to run the notebook finbert_training.ipynb, since the model you are asking for is fine-tuned (trained) on a particular dataset, and that depends on which dataset you want.

clone95 commented 4 years ago

I actually need it fine-tuned on financial news, so if you can upload the fine-tuned version of the sentiment-analysis one, I'd be glad! Thank you anyway.

metodj commented 4 years ago

@bernardmizzi you're right, I didn't go through the README carefully enough to notice that. Thanks for your help! @clone95 I will fine-tune the model for sentiment analysis in the coming days and can then upload that version.

akmalsabri commented 4 years ago

Apologies if I wasn't clear, but my main question is how to retrieve the train, validation and test data and put it in those files?

Hi, how was this issue resolved?