amirmohammadkz / personality_detection

BB-SVM model for automatic personality detection of the essays dataset (Big-Five personality labeled traits)
https://sentic.net/personality-detection-using-bagged-svm-over-bert.pdf
MIT License

Having problems at step 4 #4

Open kkkkangx opened 3 years ago

kkkkangx commented 3 years ago

Hi,

I followed the instructions, installed all the corresponding packages with the right Python version, and completed steps 1-3. But when I tried to run step 4, an error occurred (as mentioned in other issues).

[screenshot of the error]

I searched Google and changed n_jobs=-1 to n_jobs=1 in svm.py. The code now runs, but it takes far longer than the 7 minutes described in README.md.

How can I fix this issue? Any help will be appreciated!

amirmohammadkz commented 3 years ago

Hello, thanks for using our repository. The n_jobs parameter controls parallelism: -1 means the classifier uses all your processors to get the result as fast as possible. Changing it to 1 makes the classifier run on a single core, which is why your run takes much longer than 7 minutes.
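For illustration, here is a minimal sketch (not necessarily the exact code in svm.py) of how n_jobs is typically passed to a bagged SVM in scikit-learn:

```python
# Minimal sketch of how n_jobs is typically wired into a bagged SVM with
# scikit-learn; the actual svm.py may use different settings.
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier

# n_jobs=-1 fits the bagged estimators on all available CPU cores in parallel;
# n_jobs=1 falls back to a single core, which is why changing it slows the run.
# (In scikit-learn < 1.2 the keyword is base_estimator instead of estimator.)
clf = BaggingClassifier(
    estimator=SVC(kernel="rbf", C=1.0),
    n_estimators=10,
    n_jobs=-1,
)
```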

I just tested the whole code again on a new laptop with Python 3.7 in a fresh virtual environment and all the requirements installed, and I did not face that issue. Are you using something different (different versions, or another distribution of Python such as Anaconda)?

Arkhemis commented 3 years ago

Hi,

Just faced the same issue, with Python 3.7.6 and a pip virtual environment. I'll change the n_jobs parameter and see how it goes.

EDIT: OK, apparently the issue is caused by IPython, and therefore by PyCharm. When running in Windows PowerShell with the n_jobs parameter unchanged (i.e. -1), no error occurs (I'm running it now and will confirm once it's done).
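For anyone hitting the same thing: a common workaround when joblib's n_jobs=-1 parallelism breaks under an embedded interpreter on Windows is to protect the script's entry point with a __main__ guard, so worker processes can be spawned cleanly. A hypothetical sketch (run_svm_training is a placeholder, not a function from this repo):

```python
# Hypothetical illustration of the usual __main__ guard; run_svm_training()
# is a placeholder name, not a function defined in this repository.
def run_svm_training():
    # build the bagged SVM with n_jobs=-1 and fit it here
    pass

if __name__ == "__main__":
    # On Windows, joblib/multiprocessing re-imports the script when spawning
    # workers, so the training call must be guarded like this.
    run_svm_training()
```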

Arkhemis commented 3 years ago

A quick update: the script still hasn't finished running (it's been ~10 h now), despite a reasonably capable CPU (i7-4790K, 4 GHz). What could explain this?

amirmohammadkz commented 3 years ago

Hi, are you getting the accuracy as print output? The pushed version of the model predicts all 5 traits and uses cross-validation for each of them to report the final result, so the final numbers only appear once all of that has finished. If you want to test it faster, use a single train/test split in the training loop instead of 10-fold cross-validation. Let me know if the problem persists.
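A minimal sketch of that idea (not the repo's actual training loop; X and y are stand-ins for the real features and trait labels):

```python
# Minimal sketch (not the repo's training loop) of swapping 10-fold CV for a
# single train/test split to get a quick accuracy estimate for one trait.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Stand-in features/labels; in the real pipeline X would hold the BERT +
# Mairesse features and y the binary label for one Big-Five trait.
X = np.random.rand(200, 10)
y = np.random.randint(0, 2, size=200)

clf = BaggingClassifier(estimator=SVC(), n_estimators=10, n_jobs=-1)

# Full evaluation: 10-fold CV, repeated for every trait (slow but reliable).
# scores = cross_val_score(clf, X, y, cv=10, n_jobs=-1)

# Quick check: a single held-out split instead of 10 folds.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```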

Arkhemis commented 3 years ago

I was actually using my own dataset, with one personality label per line (i.e. per user) and the corresponding Mairesse CSV features, and followed all the steps in the README.

This file is around 4,000 lines long, roughly double the size of essays.csv. When running python svm_result_calculator.py with it, I did get some accuracy output, but only every 4 hours or so. The fact that my file has just twice as many rows as essays.csv should not explain such a dramatic increase in running time.

For now, I'm rerunning the standard steps (using essays.csv) to see how long that takes and whether the slowdown comes from my CSV or from some other, more general problem.

Arkhemis commented 3 years ago

A quick update: the problem seems to remain even with essays.csv. The process seems faster (I'm seeing more accuracy print output, and the bagging SVCs finish at around 5 minutes per BaggingSVC), but it's been an hour since I started the script and it keeps running.

EDIT: The first Y just finished, so for the 5 traits the whole run should take around ~7 h. To be honest, I do not understand why it is taking so long.

amirmohammadkz commented 3 years ago

When you say around 5 minutes per bagging SVC, it means the model is working fine. We use 10-fold cross-validation so that our results are more reliable. If you do not need that, just pick one of the CV folds as the test set and use the rest for training. You can also replace the bagged SVM with a regular SVM; you may lose a bit of accuracy, but not much, if you really need the speed. Finally, you can also check our newer model: https://github.com/yashsmehta/personality-prediction
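A rough sketch of that speed/accuracy trade-off (the estimator settings are illustrative, not the repo's exact configuration):

```python
# Illustrative only: swapping the bagged SVM for a single SVM to cut runtime.
from sklearn.svm import SVC, LinearSVC
from sklearn.ensemble import BaggingClassifier

# Bagged SVM: slower to train, typically a bit more accurate.
bagged = BaggingClassifier(estimator=SVC(), n_estimators=10, n_jobs=-1)

# Plain SVM: much faster to train, may lose a little accuracy.
plain = SVC()

# Faster still for high-dimensional sentence embeddings: a linear SVM.
linear = LinearSVC()
```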

kkkkangx commented 3 years ago


Hi,

I completed all the steps and now have another question: how can I save the trained model so I can predict the personality scores of my own text?
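For reference, a generic joblib-based sketch of persisting a fitted scikit-learn classifier and reusing it later (not code from this repo; the feature extraction for new text is assumed to happen separately and is represented here by dummy data):

```python
# Generic sketch, not code from this repo: persist a fitted scikit-learn
# classifier with joblib and reload it to score new samples. The dummy X/y
# stand in for the real BERT + Mairesse feature vectors and trait labels.
import joblib
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier

X = np.random.rand(100, 10)            # stand-in feature vectors
y = np.random.randint(0, 2, size=100)  # stand-in labels for one trait

clf = BaggingClassifier(estimator=SVC(), n_estimators=10, n_jobs=1).fit(X, y)
joblib.dump(clf, "trait_classifier.joblib")       # save the fitted model

loaded = joblib.load("trait_classifier.joblib")   # reload it later
print(loaded.predict(X[:1]))                      # predict for one new sample
```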