Closed: soroush-ziaeinejad closed this issue 2 years ago
@soroush-ziaeinejad
Thank you.
Agreed. In the next iteration, we have to refactor the pipeline for the case when tagme=true is selected.
Just for our future reference: if tagme is selected, we have to do it in a lazy-load way, that is, load the tweets/news that already have TagMe annotations if they exist; otherwise, 1) run TagMe on each tweet/news item, and 2) save the results in ./data/toy/preprocessed/tweets|news.tagme.csv
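A minimal sketch of that lazy-load step, assuming a pandas-based pipeline (the function and column names here are hypothetical, not the project's actual API):

```python
import os

import pandas as pd


def load_or_annotate(annotated_path, raw_df, annotate_fn):
    """Reuse saved TagMe annotations if present; otherwise annotate and persist.

    annotated_path: e.g. ./data/toy/preprocessed/tweets.tagme.csv
    annotate_fn: callable that returns a copy of raw_df with TagMe annotations.
    """
    if os.path.exists(annotated_path):
        # Annotations already on disk: skip the (slow) TagMe calls.
        return pd.read_csv(annotated_path)
    annotated_df = annotate_fn(raw_df)
    annotated_df.to_csv(annotated_path, index=False)
    return annotated_df
```

The second run of the pipeline then never touches the TagMe API, which matters because annotating every tweet/news item is by far the slowest step.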
Please do the following:
1. rename NewNews.csv to News.csv
2. update ./data/toy/readme.md with the news stats: #tweets, avg#tweets/day, #users
3. change the apl layer to lazy load, that is, load ./data/toy/news.csv if it exists; otherwise, 1) start crawling the tweets' URLs, and 2) save the crawled pages in ./data/toy/news.csv
@hosseinfani,
The stats of the toy news dataset are as follows:
Now we need the TagMe-annotated data. I can run the TagMe API on the full text of the news articles, or we can use only the words in the article titles; titles usually contain the important words about the content. Please let me know your comments on this. Thanks.
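Either option boils down to one HTTP call per text. A hedged sketch of that call (the endpoint and gcube-token parameter follow the public TagMe API docs as I recall them; verify before relying on it, and the rho threshold is an arbitrary example):

```python
import json
import urllib.parse
import urllib.request

# Public TagMe endpoint (assumption; check the TagMe docs for the current URL).
TAGME_ENDPOINT = "https://tagme.d4science.org/tagme/api/tagme/tag"


def extract_entities(response, min_rho=0.1):
    """Keep entity titles whose rho (TagMe's confidence score) clears the threshold."""
    return [a["title"] for a in response.get("annotations", [])
            if a.get("rho", 0.0) >= min_rho and "title" in a]


def tagme_annotate(text, token, lang="en", min_rho=0.1):
    """Annotate one title or article body with TagMe (requires a gcube token)."""
    query = urllib.parse.urlencode(
        {"gcube-token": token, "text": text, "lang": lang})
    with urllib.request.urlopen(f"{TAGME_ENDPOINT}?{query}") as resp:
        return extract_entities(json.load(resp), min_rho)
```

Running it on titles only would be much cheaper in API calls and latency; running it on full article text may surface entities the title omits, at the cost of noisier low-rho annotations.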