acl-org / acl-anthology

Data and software for building the ACL Anthology.
https://aclanthology.org
Apache License 2.0

custom search not returning EMNLP 2021 papers #1675

Closed mjpost closed 2 years ago

mjpost commented 2 years ago

As reported by @annargrs, the Google Custom Search Engine box on the Anthology site does not return papers from EMNLP 2021. For example, searching for "BERT Has Uncommon Sense: Similarity Ranking for Word Sense BERTology" turns up no results, even though the paper exists.

mbollmann commented 2 years ago

The Anthology version is not in main Google either for me: https://www.google.com/search?q=BERT+Has+Uncommon+Sense%3A+Similarity+Ranking+for+Word+Sense+BERTology

It is correctly being shown in our sitemap, though:

$ curl -s https://aclanthology.org/sitemap_1.xml.gz | gunzip - | grep "2021.blackboxnlp-1.43"
    <loc>https://aclanthology.org/2021.blackboxnlp-1.43/</loc>

I can only conclude that Google simply hasn't re-indexed us since this was added.
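For anyone who wants to repeat the sitemap check above without a shell pipeline, here is a minimal Python sketch. It assumes the sitemap follows the standard sitemaps.org schema and namespace, which I have not verified for our build; the helper names `sitemap_urls` and `sitemap_contains` are made up for illustration.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (assumed to be what the site uses).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(gz_bytes: bytes) -> set[str]:
    """Return every <loc> value found in a gzip-compressed sitemap."""
    root = ET.fromstring(gzip.decompress(gz_bytes))
    return {loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text}


def sitemap_contains(gz_bytes: bytes, paper_id: str) -> bool:
    """Check whether any listed URL mentions the given Anthology ID."""
    return any(paper_id in url for url in sitemap_urls(gz_bytes))


# Hypothetical usage (needs network access):
# from urllib.request import urlopen
# data = urlopen("https://aclanthology.org/sitemap_1.xml.gz").read()
# print(sitemap_contains(data, "2021.blackboxnlp-1.43"))
```

Parsing the XML instead of grepping avoids false matches in comments or attributes, but for a quick check the one-liner is just as good.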

mjpost commented 2 years ago

The Google Search Console lists it as “submitted, not in sitemap”, with a last sitemap crawl date of November 11. That date is for the sitemap index. I added all the individual sitemaps to see if that helps. I suspect Google reread the index but not the individual pieces (before #1658 there were only 3 pieces; now there are 5).
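To see which pieces the sitemap index actually advertises, something like the following sketch could help. It assumes the index is a standard sitemaps.org `<sitemapindex>` document; `index_pieces` is a hypothetical helper name, not something in our codebase.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (assumed to be what the site uses).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def index_pieces(index_xml: bytes) -> list[str]:
    """List the child sitemap URLs named in a sitemap index document."""
    root = ET.fromstring(index_xml)
    pieces = []
    for sitemap in root.iter(f"{SITEMAP_NS}sitemap"):
        loc = sitemap.find(f"{SITEMAP_NS}loc")
        if loc is not None and loc.text:
            pieces.append(loc.text.strip())
    return pieces


# Hypothetical usage (needs network access):
# from urllib.request import urlopen
# index = urlopen("https://aclanthology.org/sitemap.xml").read()
# for piece in index_pieces(index):
#     print(piece)
```

Comparing this list against the sitemaps registered in the Search Console would show whether any piece is missing there.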

mjpost commented 2 years ago

I think we need an SEO volunteer…

mbollmann commented 2 years ago

Well, it's in the very first piece, and it also seems to be in https://aclanthology.org/sitemap.xml, so I don't know what would be going on here.

But yeah, this is just one of several reasons I'd like to get rid of GCSE as our internal search engine ASAP. Being indexed correctly (and promptly) in Google proper will still remain important, of course ...

mjpost commented 2 years ago

Note that it is in Google (add site:aclanthology.org to the query), just not really being surfaced.

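For reference, a site-restricted query like the one described above can be built as follows. This is only a sketch: `site_restricted_search_url` is a hypothetical helper, and it relies on nothing beyond Google's public `site:` operator and the `q` parameter already visible in the search link earlier in this thread.

```python
from urllib.parse import urlencode


def site_restricted_search_url(query: str, site: str = "aclanthology.org") -> str:
    """Build a Google search URL that limits results to a single site."""
    params = {"q": f"site:{site} {query}"}
    return "https://www.google.com/search?" + urlencode(params)


title = "BERT Has Uncommon Sense: Similarity Ranking for Word Sense BERTology"
print(site_restricted_search_url(title))
```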

But I agree with the goal.