bekou closed this issue 4 years ago
The ACL papers you linked are NAACL 2019 papers which have only been online for less than a month, and it looks like Google Scholar hasn't even indexed them yet. The only solution there is to wait, AFAIK.
It had indeed not been crawled. I asked Google to crawl it via the search console. This should probably be part of the ingestion checklist...
@mbollmann @mjpost Thanks for your prompt replies. However, although the 5th document was published at NAACL, it has been on arXiv since March. Are you aware of any known issues where documents using the ACL style files are not parsed correctly by Scholar?
Another example might be the 1st document, which, as far as I can see, exists on Scholar: 1 and 2 receive the citation properly, while my document from EMNLP 2018 does not. Are you aware of this type of issue?
Best, Giannis
I don't see how ACL style files could possibly factor into this. The only relevant factor in this case should be the output they produce, and I don't see how our references format is that special or different from others that it could cause issues.
I totally agree with that. To me the documents look totally fine, but since the issue exists (I can find several examples of it), I was just wondering whether this is a known ACL-style issue or a Scholar issue.
@mbollmann @mjpost As of May 2020, the issue doesn't seem to be resolved.
This does look fishy...
But, unfortunately, it seems to be a problem with the way Google Scholar indexes papers and links citations, not with the Anthology. Unfortunately, because otherwise there could be a way for us to fix it.
I've just posted this on Twitter, and Fernando Pereira reported this to the GS team. Hopefully they can do something about it.
Here's the thread, just in case there are any updates or other people report the same issue: https://twitter.com/annargrs/status/1262050827600084993?s=20
Thanks for drawing this to people's attention, @annargrs. Maybe Fernando's attention can help fix this.
I wonder if an SEO effort might be helpful, for example, lots of academics adding deep Anthology links from their web pages. In general, though, I think it's going to be hard to outrank the arXiv.
FYI, looking up @annargrs's paper in the Google Search Console, it reports that it is not in the index:
That appears to be because we declare the version without the slash as canonical. Looking at that page, I see that it's not in the siteindex:
So maybe the issue is partly due to us being inconsistent in what we call the canonical page.
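A rough way to check for this kind of inconsistency is sketched below. It is only an illustration under stated assumptions: the URLs are placeholders, `find_canonical` and `check_canonical` are hypothetical helper names, and the set of sitemap URLs is assumed to have been collected separately. It reads the `<link rel="canonical">` tag from a page's HTML and reports whether the declared canonical URL differs from the served URL only by a trailing slash, and whether it appears in the sitemap.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def check_canonical(page_url, html, sitemap_urls):
    """Compare the served URL, the declared canonical URL, and the sitemap."""
    canonical = find_canonical(html)
    return {
        "canonical": canonical,
        "matches_page_url": canonical == page_url,
        "differs_only_by_slash": (
            canonical is not None
            and canonical != page_url
            and canonical.rstrip("/") == page_url.rstrip("/")
        ),
        "canonical_in_sitemap": canonical in sitemap_urls,
    }


if __name__ == "__main__":
    # Placeholder values, not real Anthology data.
    page = "https://example.org/anthology/X00-0000/"
    html = '<head><link rel="canonical" href="https://example.org/anthology/X00-0000"/></head>'
    print(check_canonical(page, html, {page}))
```

If `differs_only_by_slash` is true while `canonical_in_sitemap` is false, the page is declaring a canonical URL that the sitemap does not list, which is the kind of mismatch described above.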
Hi, coming from the old Twitter thread. This year there is something odd with how NLP publications are collecting citations on Google Scholar, not just ACL ones. I am using this issue to document this, even if it is not just ACL-related. It used to be that GS had more citations for me on average, although some papers were under-counted. Now Semantic Scholar is quickly running away with citation counts. They do seem to be proper citations. I am easily disambiguated because my name is likely unique.
A few examples from my author pages (GS: https://scholar.google.com/citations?user=Uh_GH14AAAAJ&hl=en) and (SS: https://www.semanticscholar.org/author/Marcin-Junczys-Dowmunt/1733933?sort=total-citations). Numbers are GS vs SS, I list the ones with the largest gaps and mostly ACL, but all of them are now under-counted on GS when comparing to SS.
We’re getting some progress on Twitter thanks to @annargrs’ tweet and a response from Fernando. Anyone have any idea what this could mean, though?
https://mobile.twitter.com/earnmyturns/status/1271139856266096643
PDF documents?
Yeah I assume to track citations they have to parse PDF bibliographies. Of course this varies depending on the BibTeX as well as the somewhat-venue-specific stylesheet.
Oh, of course. Hmm, I didn't realize that had changed in recent years, but I also assumed they would have had a more robust parser for it. I bet it's a huge headache.
Looks like this is fixed: my citations jumped noticeably and others noticed the same: https://twitter.com/sebgehr/status/1274304855797125120
We can probably (edit: close) this issue.
Whoa, 30% jump. Nice.
Came here to say the same thing. Cool!
I'm kinda curious now. Do we know what happened around mid-2018, which is when I think this started?
I really don't know. Maybe something to do with the hyperref package? But the thing is, the variance within individual citation styles seems to me to be greater than that across years. Not everyone uses the official styles, or gets their BibTeX from the same place, etc. I paged through a few examples from ACL 2017 vs. ACL 2019, and didn't really notice any patterns.
Hi,
I am new to the NLP community and thus I don't know if this is a known issue. I think that Google Scholar misses a few citations to my work. I don't know whether the cause is that they don't parse the PDF documents correctly, but the issue is there. For instance, check this document here, which shows only 4 citations. Here you can find some documents (most of them using the ACL style file) that are not included as citations:
1) https://www.aclweb.org/anthology/papers/N/N19/N19-5001/
2) https://arxiv.org/pdf/1905.07458v1.pdf
3) https://arxiv.org/abs/1906.07544
4) https://arxiv.org/abs/1905.05044
5) https://www.aclweb.org/anthology/papers/N/N19/N19-1081/
Specifically, from the last document (i.e., the 5th, of which I am also a co-author), none of the citations to my work are added. Is this normal?
I also have this issue with other documents, but most of the missed citations are from documents using the ACL style.
Best, Giannis