Adafede opened 2 years ago
Good point.
Let me check that I understand. Is the self-citation in Scholia the one displayed by https://github.com/WDscholia/scholia/blob/master/scholia/app/templates/work_citations-per-year.sparql ?
My intuition is that a citation is a self-citation if any author (first, last, middle) of the citing work matches any author (first, last, middle) of the cited work. So I suppose that should be the default, but we could extend the graph with multiple colors. We would then have nine self-citation cases as far as I can see (first citing last, first citing middle, first citing first, middle citing last, ...).
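To make the nine cases concrete, here is a minimal Python sketch, not Scholia code. The function names are hypothetical, author lists are assumed to be ordered by rank, and a single-author work is assumed to count as "first".

```python
from itertools import product

POSITIONS = ("first", "middle", "last")


def position(rank, max_rank):
    """Map a 1-based author rank to 'first', 'middle' or 'last'."""
    if rank == 1:
        return "first"  # assumption: a sole author counts as first
    if rank == max_rank:
        return "last"
    return "middle"


def self_citation_cases(citing_authors, cited_authors):
    """Return every (citing position, cited position) pair for shared authors."""
    cases = set()
    for i, citer in enumerate(citing_authors, start=1):
        for j, author in enumerate(cited_authors, start=1):
            if citer == author:
                cases.add((position(i, len(citing_authors)),
                           position(j, len(cited_authors))))
    return cases


# The nine possible (citing, cited) combinations.
ALL_CASES = list(product(POSITIONS, POSITIONS))
```

An empty result means the citation is external. A non-empty result can contain several cases at once when more than one author is shared.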
Oh, thank you for asking for clarification; it was indeed unclear!
The "self-citation" I mentioned was on the article page directly (https://github.com/WDscholia/scholia/blob/master/scholia/app/templates/work_citation-graph.sparql), not on the author page (https://github.com/WDscholia/scholia/blob/master/scholia/app/templates/work_citations-per-year.sparql), but it could, of course, apply to both!
Perfectly agree with the rest and the nine colors!
I am still confused: the graph at https://scholia.toolforge.org/work/Q41799194#citation-graph-header shows no indication of self-citations. The coloring is based on how "central" the paper is in the citation graph. Unfortunately, that is not explained in the graph.
It is actually not under the header you mention but under the one below: https://scholia.toolforge.org/work/Q41799194#citations-per-year-header 😜
Here is (after way too long...sorry) a first attempt:
# tool: scholia
#defaultView:BarChart
PREFIX target: <http://www.wikidata.org/entity/Q20895241>
SELECT ?year (COUNT(?category) AS ?count) ?category WHERE {
  {
    SELECT DISTINCT ?citing_work
      (IF((?coauthor = ?citer) && (?rank1 = "1") && (?rank2 = "1"), "Self citation: First author citing first author",
       IF((?coauthor = ?citer) && (?rank1 = "1") && (?rank2 != "1" && ?rank2 != ?maxRank2), "Self citation: Middle author citing first author",
       IF((?coauthor = ?citer) && (?rank1 = "1") && (?rank2 = ?maxRank2), "Self citation: Last author citing first author",
       IF((?coauthor = ?citer) && (?rank1 != "1" && ?rank1 != ?maxRank1) && (?rank2 = "1"), "Self citation: First author citing middle author",
       IF((?coauthor = ?citer) && (?rank1 != "1" && ?rank1 != ?maxRank1) && (?rank2 != "1" && ?rank2 != ?maxRank2), "Self citation: Middle author citing middle author",
       IF((?coauthor = ?citer) && (?rank1 != "1" && ?rank1 != ?maxRank1) && (?rank2 = ?maxRank2), "Self citation: Last author citing middle author",
       IF((?coauthor = ?citer) && (?rank1 = ?maxRank1) && (?rank2 = "1"), "Self citation: First author citing last author",
       IF((?coauthor = ?citer) && (?rank1 = ?maxRank1) && (?rank2 != "1" && ?rank2 != ?maxRank2), "Self citation: Middle author citing last author",
       IF((?coauthor = ?citer) && (?rank1 = ?maxRank1) && (?rank2 = ?maxRank2), "Self citation: Last author citing last author",
       IF((?coauthor != ?citer), "External citation", "Other")
       ))))))))) AS ?category)
      (STR(YEAR(?date)) AS ?year)
    WHERE {
      ?work wdt:P50 target: .
      ?work p:P50 ?author_triple .
      ?author_triple ps:P50 ?coauthor .
      ?author_triple pq:P1545 ?rank1 .
      ?citing_work wdt:P2860 ?work .
      ?citing_work wdt:P577 ?date .
      ?citing_work p:P50 ?citer_triple .
      ?citer_triple ps:P50 ?citer .
      ?citer_triple pq:P1545 ?rank2 .
      {
        SELECT ?work (MAX(?rank1) AS ?maxRank1) (MAX(?rank2) AS ?maxRank2) WHERE {
          ?work wdt:P50 target: .
          ?work p:P50 ?author_triple .
          ?author_triple pq:P1545 ?rank1 .
          ?citing_work wdt:P2860 ?work .
          ?citing_work p:P50 ?citer_triple .
          ?citer_triple pq:P1545 ?rank2 .
        } GROUP BY ?work
      }
    }
  }
}
GROUP BY ?year ?category
ORDER BY DESC(?year)
@fnielsen I wanted to submit a PR with the above proposal, but when checking it again I found that it artificially increases the number of citations, because one citation can fall into multiple categories at the same time (in the worst case, the first, middle, and last authors are all shared). I do not have a good solution in mind to avoid this...
It would require establishing a hierarchy such as "last author citing last author takes precedence over middle author", etc.
I suppose it is better to have the correct count rather than a detailed coloration.
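The hierarchy idea can be sketched in Python as follows. This is only an illustration: the precedence order (first, then last, then middle) is a hypothetical choice, and the input is assumed to be a mapping from each citing work to the set of (citing position, cited position) pairs it matches.

```python
from collections import Counter

# Hypothetical precedence: earlier pairs win when a citing work matches several.
ORDER = ("first", "last", "middle")
PRECEDENCE = [(citing, cited) for citing in ORDER for cited in ORDER]


def pick_category(cases):
    """Reduce all matching cases of one citing work to a single label."""
    if not cases:
        return "External citation"
    citing, cited = min(cases, key=PRECEDENCE.index)
    return f"Self citation: {citing.capitalize()} author citing {cited} author"


def count_by_category(cases_per_citing_work):
    """Count each citing work exactly once, under its highest-precedence label."""
    return Counter(pick_category(c) for c in cases_per_citing_work.values())


counts = count_by_category({
    "W1": {("middle", "middle"), ("last", "last")},  # two matches, one label
    "W2": set(),                                     # no shared author
    "W3": {("first", "middle")},
})
```

Because every citing work contributes exactly one label, the per-category counts add up to the real number of citations.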
This one is closer but still not good (the category of each citing work is picked arbitrarily with SAMPLE):
# tool: scholia
#defaultView:BarChart
PREFIX target: <http://www.wikidata.org/entity/Q20895241>
SELECT ?year (COUNT(?category) AS ?count) ?category WHERE {
  {
    SELECT DISTINCT ?citing_work (SAMPLE(?category_) AS ?category) ?year WHERE {
      {
        SELECT DISTINCT ?citing_work
          (IF((?coauthor = ?citer) && (?rank1 = "1") && (?rank2 = "1"), "Self citation: First author citing first author",
           IF((?coauthor = ?citer) && (?rank1 = "1") && (?rank2 = ?maxRank2), "Self citation: Last author citing first author",
           IF((?coauthor = ?citer) && (?rank1 = "1") && (?rank2 != "1") && (?rank2 != ?maxRank2), "Self citation: Middle author citing first author",
           IF((?coauthor = ?citer) && (?rank1 != "1") && (?rank1 != ?maxRank1) && (?rank2 = "1"), "Self citation: First author citing middle author",
           IF((?coauthor = ?citer) && (?rank1 != "1") && (?rank1 != ?maxRank1) && (?rank2 = ?maxRank2), "Self citation: Last author citing middle author",
           IF((?coauthor = ?citer) && (?rank1 != "1") && (?rank1 != ?maxRank1) && (?rank2 != "1") && (?rank2 != ?maxRank2), "Self citation: Middle author citing middle author",
           IF((?coauthor = ?citer) && (?rank1 = ?maxRank1) && (?rank2 = "1"), "Self citation: First author citing last author",
           IF((?coauthor = ?citer) && (?rank1 = ?maxRank1) && (?rank2 = ?maxRank2), "Self citation: Last author citing last author",
           IF((?coauthor = ?citer) && (?rank1 = ?maxRank1) && (?rank2 != "1") && (?rank2 != ?maxRank2), "Self citation: Middle author citing last author",
           IF((?coauthor != ?citer), "External citation", "Other")
           ))))))))) AS ?category_)
          (STR(YEAR(?date)) AS ?year)
          ?coauthor
          ?citer
          ?rank1
          ?rank2
          ?maxRank1
          ?maxRank2
        WHERE {
          ?work wdt:P50 target: .
          ?work (p:P50|p:P2093) ?author_triple .
          ?author_triple (ps:P50|ps:P2093) ?coauthor .
          ?author_triple pq:P1545 ?rank1 .
          ?citing_work wdt:P2860 ?work .
          ?citing_work wdt:P577 ?date .
          ?citing_work (p:P50|p:P2093) ?citer_triple .
          ?citer_triple (ps:P50|ps:P2093) ?citer .
          ?citer_triple pq:P1545 ?rank2 .
          {
            SELECT ?work (MAX(?rank1) AS ?maxRank1) (MAX(?rank2) AS ?maxRank2) WHERE {
              ?work wdt:P50 target: .
              ?work p:P50 ?author_triple .
              ?author_triple pq:P1545 ?rank1 .
              ?citing_work wdt:P2860 ?work .
              ?citing_work p:P50 ?citer_triple .
              ?citer_triple pq:P1545 ?rank2 .
            } GROUP BY ?work
          }
        }
      }
    } GROUP BY ?citing_work ?year
  }
}
GROUP BY ?year ?category
ORDER BY DESC(?year)
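One detail that may affect both queries above: series ordinal (P1545) values are strings in Wikidata, so `MAX(?rank1)` and `MAX(?rank2)` would presumably compare them lexicographically and pick the wrong "last author" for works with ten or more authors. The Python sketch below only illustrates the string-versus-number difference. The rank values are made up.

```python
# Made-up series ordinals, stored as strings as in Wikidata.
ranks = ["1", "2", "9", "10", "11"]

# String comparison goes character by character, so "9" sorts after "11".
lexicographic_max = max(ranks)

# Converting to integers first gives the real last position.
numeric_max = max(int(rank) for rank in ranks)

print(lexicographic_max, numeric_max)
```

In SPARQL, casting with `xsd:integer(?rank1)` before taking the maximum is one possible way around this. It has not been tested against these queries.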
Hi!
I was surprised by a lot of articles having a high self-citation ratio in Scholia... then looked at the SPARQL and saw it takes all co-authors into account.
It might make the query longer, but wouldn't it be a good idea to separate first authors, co-authors, and last authors?
Read in https://doi.org/10.1007/s11192-020-03413-9