Writing thesis - GitHub Issues

Alhajras commented 10 months ago

1. Introduction
  1.1. Task Definition
  1.2. Motivation
  1.3. Contribution
  1.4. Chapter Overview
2. Background
  2.1. Deep Learning
    2.1.1. Artificial Neural Network Models
    2.1.2. Training Deep Neural Networks
  2.2. BERT
3. Related Work
  3.1. Entity Disambiguation with Coherence Graphs
  3.2. Neural Entity Disambiguation
  3.3. Entity Linking
    3.3.1. An End-to-End Model for Entity Linking
    3.3.2. The Potential of a Local Model for Entity Linking
    3.3.3. A Simple Approach to End-to-End Entity Linking with BERT
    3.3.4. Entity Linking with Entity Embeddings for BERT
    3.3.5. End-to-End Entity Linking with a Joint Task
4. Implementation
  4.1. Joint Mention Detection and Entity Disambiguation
    4.1.1. Known Differences
    4.1.2. Output Heads
    4.1.3. Loss Function
    4.1.4. General Training Procedure
    4.1.5. Document Pre-processing
  4.2. Knowledge Base
5. Evaluation
  5.1. Evaluation Criterion
  5.2. Datasets
    5.2.1. AIDA-CoNLL
    5.2.2. Wikipedia Articles Dataset
  5.3. Module Evaluation
    5.3.1. Knowledge Bases
    5.3.2. Candidate Generation
  5.4. Experiments
    5.4.1. Model Architecture
    5.4.2. Training
    5.4.3. Pretraining on Wikipedia Articles
6. Results
  6.1. Final Model Results
    6.1.1. Without Candidate Generation
    6.1.2. With Candidate Generation
  6.2. Error Analysis
    6.2.1. Evaluation by Seen and Unseen Entities
    6.2.2. Performance by Entity Types
7. Conclusion
8. Future Work

Acknowledgments
Appendices
  A. The Gradient of the Loss Function
Bibliography
List of Figures
List of Tables
List of Algorithms

Alhajras commented 10 months ago

1 - Introduction

Task Definition
- Configurable generic search engine
- Distributed crawler
- Indexer
- UI design
Motivation
- Data is the new currency
- Applications and use cases
- Challenges
Contribution
- A free, configurable search engine
- Generic: can be used on any website
- Easy to extend and scale
Chapter Overview
- An overview of each chapter and of the overall structure of the thesis

Alhajras commented 10 months ago

2 - Related Work

Talk about the history of web crawling
Talk about Google and its early research paper
Talk about the existing crawling approaches
Talk about indexing approaches
Talk about ParseHub as a free tool

Alhajras commented 10 months ago

3 - Theoretical Background

Explain how crawlers work and describe their infrastructure
Requirements of a good crawler
Talk about indexing and how it works
Talk about how the UI design makes crawling easier

Alhajras commented 10 months ago

4 - Approach

Talk about the overall software architecture
Talk about crawling and include the UI that helps achieve this
Talk about indexing and include the UI that helps achieve this

Alhajras / webscraper

Writing thesis #17