ProsusAI / finBERT

Financial Sentiment Analysis with BERT
Apache License 2.0
1.45k stars 417 forks

DataFrame id overlap #9

Closed ushmau5 closed 4 years ago

ushmau5 commented 4 years ago

On line 628 of finbert.py you use result = pd.concat([result,batch_result]) when it should be result = pd.concat([result,batch_result], ignore_index=True).

When you concatenate multiple batches together, the result DataFrame ends up with duplicate index values. For example, with 2 batches of 3 items each, the index of result will be 0,1,2,0,1,2.

If you then convert the DataFrame to a dictionary keyed by index, the results overwrite each other, because the same key appears more than once.
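A minimal sketch of the behaviour described above. The column names and sample rows are hypothetical, made up for illustration; this is not the code from finbert.py. It only shows how pd.concat treats the index with and without ignore_index=True.

```python
import pandas as pd

# Two hypothetical batches of 3 predictions each (illustrative data only).
# Each DataFrame gets its own default index 0, 1, 2.
batch_1 = pd.DataFrame({"sentence": ["a", "b", "c"],
                        "prediction": ["positive", "neutral", "negative"]})
batch_2 = pd.DataFrame({"sentence": ["d", "e", "f"],
                        "prediction": ["neutral", "positive", "neutral"]})

# Without ignore_index, each batch keeps its original index labels.
overlapping = pd.concat([batch_1, batch_2])
print(list(overlapping.index))  # [0, 1, 2, 0, 1, 2]

# With ignore_index=True, pandas assigns a fresh 0..n-1 index.
reindexed = pd.concat([batch_1, batch_2], ignore_index=True)
print(list(reindexed.index))    # [0, 1, 2, 3, 4, 5]

# Converting a column to a dict keyed by index: duplicate labels collapse,
# so the second batch silently overwrites the first.
overlapping_dict = overlapping["prediction"].to_dict()
reindexed_dict = reindexed["prediction"].to_dict()
print(len(overlapping_dict))  # 3
print(len(reindexed_dict))    # 6
```

With the duplicated index, only the three rows from the second batch survive the dictionary conversion; with ignore_index=True all six rows are kept.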

doguaraci commented 4 years ago

Thanks for pointing that out! You're correct, we should ignore index. Just fixed it.