chensuim closed this issue 4 years ago
Hi Sui,
I did measure the evaluation metrics you mentioned, but that was a while ago: this repository was created about a year ago.
I didn't record the results at the time. I'm happy to accept your request, and I will add the experiment results within a month (I've been a little busy lately, sorry about that 😅).
But if you urgently need to know how the models compare (maybe you just want to know which model is best), I remember that on my dataset the ranking was: CRNN > SANN/HAN/RCNN > CNN > RNN > ANN > FastText. (Note that my dataset consists almost entirely of Chinese words.)
Hope this helps!
Randolph
Thanks a lot! Do you remember roughly what numbers CRNN achieved?
Sui
@chensuim
On my dataset, CRNN reached an F1 of about 0.69.
I suggest trying the CRNN and SANN models, which, as far as I remember, performed well on my dataset.
Randolph
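For anyone comparing F1 numbers like the ones above: the thread does not say whether the reported F1 is micro- or macro-averaged, and on long-tail multi-label data the two can differ a lot. The following is only a minimal sketch in plain Python; the function name `f1_scores` and the set-of-label-ids input format are assumptions for illustration, not part of this repository.

```python
def f1_scores(y_true, y_pred, num_labels):
    """Return (micro_f1, macro_f1) for multi-label predictions.

    y_true and y_pred are sequences of label-id collections, one per example.
    Label ids are assumed to be integers in range(num_labels).
    """
    tp = [0] * num_labels  # true positives per label
    fp = [0] * num_labels  # false positives per label
    fn = [0] * num_labels  # false negatives per label

    for truth, pred in zip(y_true, y_pred):
        truth, pred = set(truth), set(pred)
        for label in pred:
            if label in truth:
                tp[label] += 1
            else:
                fp[label] += 1
        for label in truth - pred:
            fn[label] += 1

    def f1(t, p, n):
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    # Micro: pool all decisions, so frequent labels dominate the score.
    micro = f1(sum(tp), sum(fp), sum(fn))
    # Macro: average per-label F1, so every rare label counts equally.
    macro = sum(f1(tp[i], fp[i], fn[i]) for i in range(num_labels)) / num_labels
    return micro, macro
```

A model that only ever predicts the most frequent label can still get a reasonable micro F1 while its macro F1 collapses, which is one way to check whether the long tail is what limits the score.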
@RandolphVI Thank you. Is your dataset open source? I tried them on my long-tail dataset and could only reach an F1 of 0.5.
Sui
@chensuim
Sorry, my dataset is not open source. Have you tried padding the sentences to a fixed length? (I think this has a big influence.)
If your dataset consists almost entirely of English words and all sentences are longer than 200 words, I would suggest an LSTM-based method rather than a CNN-based approach.
@RandolphVI No worries. I did use padding. In my dataset the sequence length varies a lot (from 10 words to more than 300 words), and I pad to a length of 200. I have already tried both LSTM and CNN. You are right: LSTM is better, but it still only reaches an F1 of 0.5. I thought this was caused by the long tail. Do you have any suggestions for dealing with long-tail data?
Sui
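The padding step discussed above can be sketched roughly as follows. This is an illustration only: the helper names, the padding id `0`, padding at the end of the sequence, and truncating anything longer than 200 tokens are all assumptions, not details confirmed in this thread or taken from the repository's preprocessing code.

```python
def pad_or_truncate(token_ids, max_len=200, pad_id=0):
    """Return a list of exactly max_len ids.

    Sequences longer than max_len are cut at the end (assumption);
    shorter ones are right-padded with pad_id (assumption).
    """
    clipped = list(token_ids[:max_len])
    return clipped + [pad_id] * (max_len - len(clipped))


def truncated_fraction(sequences, max_len=200):
    """Fraction of sequences that lose tokens when cut to max_len."""
    if not sequences:
        return 0.0
    return sum(len(s) > max_len for s in sequences) / len(sequences)
```

With lengths ranging from 10 to more than 300 words, checking `truncated_fraction` on the training set shows how many examples lose text at a length of 200, and whether a different `max_len` is worth trying.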
@chensuim
In that case, in my opinion, dealing with the long tail means designing a sampling strategy to rebuild a better dataset, since this is a problem with the data itself.
I have dealt with a dataset just like yours, and my solution was to clean up the data 😂.
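The thread does not describe a concrete sampling strategy, so the following is only one possible sketch, not the method used here: it repeats training examples that carry rare labels, with a repeat factor that grows as the label's frequency falls below a threshold. The function name, the `(text, labels)` pair format, and the `threshold` parameter are all hypothetical.

```python
import math
from collections import Counter


def oversample_rare_labels(examples, threshold=0.1):
    """Repeat examples that contain rare labels.

    examples: list of (text, labels) pairs, labels being an iterable of label ids.
    threshold: label frequency below which examples start to be repeated
               (hypothetical knob, to be tuned on a validation set).
    """
    n = len(examples)
    # Number of examples in which each label appears at least once.
    counts = Counter(label for _, labels in examples for label in set(labels))
    # Per-label repeat factor: 1 for frequent labels, larger for rare ones.
    factor = {
        label: max(1.0, math.sqrt(threshold * n / count))
        for label, count in counts.items()
    }
    resampled = []
    for text, labels in examples:
        # An example is repeated according to its rarest label.
        repeat = math.ceil(max((factor[l] for l in set(labels)), default=1.0))
        resampled.extend([(text, labels)] * repeat)
    return resampled
```

Oversampling of this kind should be applied to the training split only, and in a multi-label setting it also duplicates whatever frequent labels co-occur with the rare ones, so it rebalances the data only partially.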
@RandolphVI Sorry, I don't know how to clean the data. I thought the long tail was a general problem for all multi-label tasks.
Sui
Hello, did you measure metrics such as precision and recall for all of your models? If so, could you please share them?
Thanks a lot, Sui