facebookresearch / fastText

Library for fast text representation and classification.
https://fasttext.cc/
MIT License

Gender detection #336

Open historylife opened 7 years ago

historylife commented 7 years ago

Dear all

I was advised to use fastText for gender detection from a given first name.

I have more than 800k labeled names, each with a male or female label.

What is the best configuration I can use?

PS: I am new to this. Please help me out step by step.

Regards

spate141 commented 7 years ago

Amm... http://lmgtfy.com/?q=Learning+a+text+classifier+using+fastText

Edit: You can check this tutorial first: Learning a text classifier using fastText

historylife commented 7 years ago

First of all, @spate141, thank you.

Well, I came from WIT because one of the WIT CEOs advised me to, and he said fastText can be more accurate than WIT for gender classification.

OK, I installed the library and got my names database: Database

Then:

./fasttext supervised -input names.txt -output model_gender -lr 1.0 -epoch 300

TEST

./fasttext test model_gender.bin names.txt
N       94777
P@1     0.96
R@1     0.96
Number of examples: 94777

I was disappointed with the results when passing a new name that is not in the database:

echo horoshimao | ./fasttext predict model_gender.bin -
echo ablalrahman | ./fasttext predict model_gender.bin -
echo bestman | ./fasttext predict model_gender.bin -

All of these return female, but they are male.

Meanwhile, WIT returns the correct value 99% of the time! My disappointment is based on my own comparison between fastText and WIT.

So I am sure something is wrong with my training!

Could you kindly advise? :)
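
A note on the TEST step above: it evaluates on the same names.txt that was used for training, so P@1 0.96 says little about names the model has never seen. A minimal sketch of a held-out split follows. The function name, the 10% fraction and the output file names names.train / names.valid are assumptions for illustration, not something from this thread.

```python
import random


def split_labeled_lines(lines, valid_fraction=0.1, seed=0):
    """Shuffle labeled lines and return (train, valid) lists.

    Each line is kept verbatim, so any fastText-style format
    (label prefix plus text) passes through unchanged.
    """
    rows = [ln.strip() for ln in lines if ln.strip()]  # drop blank lines
    rng = random.Random(seed)                          # fixed seed: repeatable split
    rng.shuffle(rows)
    n_valid = int(len(rows) * valid_fraction)
    return rows[n_valid:], rows[:n_valid]


def write_split(src_path, train_path, valid_path, valid_fraction=0.1):
    """Read src_path and write the two splits to disk (paths are hypothetical)."""
    with open(src_path, encoding="utf-8") as f:
        train, valid = split_labeled_lines(f, valid_fraction)
    with open(train_path, "w", encoding="utf-8") as f:
        f.write("\n".join(train) + "\n")
    with open(valid_path, "w", encoding="utf-8") as f:
        f.write("\n".join(valid) + "\n")
```

After something like write_split("names.txt", "names.train", "names.valid"), training would use -input names.train and the test command would point at names.valid, so the reported precision reflects unseen names.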

spate141 commented 7 years ago

@historylife Well, again, this is just a basic hyperparameter selection I tried, and it seems to work correctly for your case.

./fasttext supervised -input ~/Desktop/names.txt -output model_gender -lr 0.025 -epoch 70 -minn 2 -maxn 5 -loss ns -thread 4
./fasttext test model_gender.bin ~/Desktop/names.txt
N   94777
P@1 0.965
R@1 0.965
Number of examples: 94777
./fasttext predict-prob model_gender.bin - 2
horoshimao
__label__male 0.871094 __label__female 0.126953
ablalrahman
__label__male 0.996094 __label__female 0.00195314
bestman
__label__male 0.998047 __label__female 1.95313e-08
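
One likely reason these settings handle unseen names better is -minn 2 -maxn 5, which turns on character n-grams of length 2 to 5. As far as I know, supervised mode leaves them off by default, so a name absent from the training data has no features at all. The sketch below is a simplified illustration of the idea (wrap the word in "<" and ">" and take every substring in the length range). It is not fastText's actual implementation, and the function name is made up for this example.

```python
def char_ngrams(word, minn, maxn):
    """Return character n-grams of '<word>' with lengths minn..maxn.

    Simplified illustration only: fastText additionally hashes the
    n-grams into buckets, which is not modelled here.
    """
    wrapped = "<" + word + ">"  # boundary markers distinguish prefixes and suffixes
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


# Two names that never co-occur in training can still share subword features.
shared = set(char_ngrams("bestman", 2, 5)) & set(char_ngrams("ablalrahman", 2, 5))
```

Here shared contains suffix n-grams such as "man>", so whatever the model learned about that ending from known names can carry over to a new one.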

myoldusername commented 7 years ago

@spate141 @hisstorylife Wonderful topic,

Anyway, I would like to ask @spate141: what is special about the parameters you used with supervised, and why do they predict more accurately? Are there any tweaks to make the accuracy even better?

I tested your settings and they return more subtle results. On the other hand, I noticed that when we ask fastText to predict, it takes up to 5 seconds to return a result. In general it takes a long time, and if this is deployed at large scale, I think it may lead to resource problems.

I have a powerful server with 24 CPUs and 128 GB of RAM. I am talking about the same example as @hisstorylife.

Another thing that is not clear to me about this fastText library: if I have a new set of training data, how can I add it to the original model without retraining on the whole database, which may take a long time?

spate141 commented 7 years ago

@myoldusername

myoldusername commented 7 years ago

Could you please let me know how to keep the model loaded in memory?

I am sorry if I bothered you... Regards

spate141 commented 7 years ago

Well, there are many ways to do that; it depends on which programming language you are using. For more detail, check out these fastText communities. With the default C++ code, you can run ./fasttext predict-prob model.bin - k on the command line, where "-" means the input is read from standard input one line at a time and "k" is the number of most likely labels to return for each piece of text. The model is loaded into memory once at startup, so each prediction after that takes very little time.
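
The idea of loading the model once and then feeding it one line at a time can be sketched from a script as follows. This is only an illustration in Python: the class and function names are hypothetical, and it assumes the child process prints exactly one flushed output line per input line, which should be checked against your fastText build.

```python
import subprocess


class LineProcess:
    """Keep one line-oriented child process alive and query it repeatedly."""

    def __init__(self, cmd):
        # The child is started once; for fastText this is where the
        # model would be loaded into memory.
        self.proc = subprocess.Popen(
            cmd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered on the parent side
        )

    def ask(self, line):
        """Send one line and block until one line of output comes back."""
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        """Close stdin so the child sees EOF, then wait for it to exit."""
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


def parse_predict_prob(line):
    """Turn '__label__male 0.87 __label__female 0.12' into a dict."""
    parts = line.split()
    return {parts[i]: float(parts[i + 1]) for i in range(0, len(parts) - 1, 2)}
```

Usage would look like LineProcess(["./fasttext", "predict-prob", "model_gender.bin", "-", "2"]), followed by parse_predict_prob(p.ask("linda")) for each name. The same pattern (one long-lived child, write a line, read a line) should be possible from other languages that can open a process with pipes.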

myoldusername commented 7 years ago

Wonderful. I am using PHP, and I want to pass a name from PHP to ./fasttext predict-prob model.bin -

The problem is that each time a user calls ./fasttext predict-prob model.bin, the model is reloaded. Following your advice, when I run ./fasttext predict-prob model.bin - [ENTER], every name I type returns a result very quickly. Now, how can PHP write to this stdin and read the results back? Do you see my problem?

Someone advised me to use this method:

Create a named pipe with mkfifo, and have that be the input to fasttext.
Then direct the script output to that named pipe.
root@server [~/fasttext]# mkfifo testpipe
root@server [~/fasttext]# ./fasttext predict-prob model_gender.bin testpipe

Now when I pass a name to testpipe with echo linda > testpipe, fasttext prints the output and exits. I need it to stay in memory so that I can send a name to fasttext at any time from any script or web server.
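
The exit after the first name is consistent with how named pipes behave on POSIX systems: when the last writer closes the FIFO (here, when echo finishes), the reader sees end-of-file. One common workaround is to keep a long-lived writer holding the FIFO open while short-lived writers come and go. The sketch below demonstrates only that pipe behaviour, with a plain reader thread standing in for fastText; all names in it are hypothetical.

```python
import os
import tempfile
import threading


def read_until_eof(path, out):
    """Stand-in for the fastText process: read lines until EOF."""
    with open(path) as f:
        for line in f:
            out.append(line.strip())


def demo():
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "testpipe")
    os.mkfifo(path)

    seen = []
    reader = threading.Thread(target=read_until_eof, args=(path, seen))
    reader.start()

    # Long-lived writer: while this stays open, the reader never sees EOF.
    keeper = open(path, "w")

    # Short-lived writers, each like `echo name > testpipe`.
    for name in ["linda", "bestman"]:
        with open(path, "w") as w:
            w.write(name + "\n")

    keeper.close()  # last writer gone: the reader now gets EOF and stops
    reader.join()

    os.remove(path)
    os.rmdir(workdir)
    return seen
```

Without the keeper handle, the reader would stop after the first name, which matches what is described above. Whether fastText then answers each line promptly when reading from a FIFO is a separate assumption to verify.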

cpuhrsch commented 6 years ago

Hello @historylife,

Thank you for your post. You might find more support for this kind of issue within one of our community boards. In particular, the Facebook group has many ML experts who are keen on discussing applications of this library.

Specifically, there is a Facebook group, a Stack Overflow tag, and a Google group.

If you do decide to move this to one of our other community boards, please consider closing this issue.

Thanks, Christian