This repository is my bachelor's graduation project, and it also serves as a study of TensorFlow and deep learning (CNN, RNN, etc.).
The main objective of the project is to determine whether two given sentences are similar in meaning (a binary classification problem) using neural networks (FastText, CNN, LSTM, etc.).
The project structure is as follows:
.
├── Model
│   ├── test_model.py
│   ├── text_model.py
│   └── train_model.py
├── data
│   ├── word2vec_100.model.* [Need Download]
│   ├── Test_sample.json
│   ├── Train_sample.json
│   └── Validation_sample.json
├── utils
│   ├── checkmate.py
│   ├── data_helpers.py
│   └── param_parser.py
├── LICENSE
├── README.md
└── requirements.txt
The project relies on the following packages and files:
- jieba or nltk
- gensim
- metadata.tsv (must exist first)
- train.py
- test.py
- data_helpers.py
- logging, which helps record the whole run (including parameter display, model training info, etc.)
- checkmate.py, which handles checkpoint saving differently from tf.train.Saver; the latter can only save the last n checkpoints

See the /data folder, which includes the data sample files, for the data format. For example:
{"front_testid": "4270954", "behind_testid": "7075962", "front_features": ["invention", "inorganic", "fiber", "based", "calcium", "sulfate", "dihydrate", "calcium"], "behind_features": ["vcsel", "structure", "thermal", "management", "structure", "designed"], "label": 0}
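A record in this format can be read with the standard json module. The sketch below assumes the sample files store one JSON object per line, as the example above suggests; the load_pairs helper and its return shape are illustrative and are not part of this repository.

```python
import json


def load_pairs(path):
    """Read text pairs from a file with one JSON object per line (assumed layout)."""
    fronts, behinds, labels = [], [], []
    with open(path, encoding="utf-8") as fin:
        for line in fin:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            record = json.loads(line)
            fronts.append(record["front_features"])    # tokens of the first text
            behinds.append(record["behind_features"])  # tokens of the second text
            labels.append(int(record["label"]))        # binary label
    return fronts, behinds, labels
```

For instance, calling load_pairs("data/Train_sample.json") would return three parallel lists: the token lists of the first texts, the token lists of the second texts, and the labels.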
You can use the nltk package if you are going to deal with English text data.
You can use the jieba package if you are going to deal with Chinese text data.
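As a rough illustration of that choice, the sketch below tokenizes text with jieba for Chinese and nltk for English. The tokenize function, its lang argument, lowercasing, and the naive fallbacks used when a package is not installed are all assumptions of this example, not the preprocessing used in this repository.

```python
def tokenize(text, lang="en"):
    """Split text into tokens: jieba for Chinese ("zh"), nltk otherwise.

    Falls back to a naive split when the package is missing
    (the fallback is an assumption of this sketch).
    """
    if lang == "zh":
        try:
            import jieba
        except ImportError:
            # Fallback: one token per non-space character.
            return [ch for ch in text if not ch.isspace()]
        # jieba.lcut returns the segmented words as a list.
        return [tok for tok in jieba.lcut(text) if tok.strip()]
    try:
        from nltk.tokenize import wordpunct_tokenize
    except ImportError:
        # Fallback: whitespace split.
        return text.lower().split()
    # Regex-based tokenizer; it needs no extra nltk data download.
    return wordpunct_tokenize(text.lower())
```

The resulting token lists can then be stored in the front_features and behind_features fields of a record.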
This repository can be used with other text-pair similarity classification datasets in two ways, one of which involves data_helpers.py.
Either way, the right approach depends on your data and your task.
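One possible route for a new dataset is to convert it into the same record format as the sample data shown earlier. The sketch below assumes that route and a one-JSON-object-per-line file layout; to_record and write_records are hypothetical helpers for illustration, not functions from this repository.

```python
import json


def to_record(front_id, front_tokens, behind_id, behind_tokens, label):
    """Build one record with the same keys as the sample data."""
    return {
        "front_testid": str(front_id),
        "behind_testid": str(behind_id),
        "front_features": list(front_tokens),
        "behind_features": list(behind_tokens),
        "label": int(label),
    }


def write_records(records, path):
    """Write records as one JSON object per line (assumed file layout)."""
    with open(path, "w", encoding="utf-8") as fout:
        for record in records:
            # ensure_ascii=False keeps Chinese tokens readable in the file.
            fout.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Each pair of already tokenized texts becomes one line of the output file, which keeps the file easy to stream and to inspect by hand.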
You can download the Word2vec model file (dim=100). Make sure the files are unzipped and placed under the /data folder.
You can pre-train your word vectors (based on your corpus) in several ways:
- Use the gensim package to pre-train data.
- Use the glove tools to pre-train data.

🤔 Before you open a new issue, please check the data sample file under the data folder and read the other open issues first, because someone may have already asked the same question.
See Usage.
Warning: The model is usable but not finished yet 🤪!
Warning: The model is usable but not finished yet 🤪!
Warning: Only the ABCNN-1 model is implemented 🤪!
黄威,Randolph
SCU SE Bachelor; USTC CS Ph.D.
Email: chinawolfman@hotmail.com
My Blog: randolph.pro
LinkedIn: randolph's linkedin