WDscholia / scholia

Wikidata-based scholarly profiles
https://scholia.toolforge.org

Self citations #1717

Open Adafede opened 2 years ago

Adafede commented 2 years ago

Hi!

I was surprised that many articles show a high self-citation ratio in Scholia... Then I looked at the SPARQL query and saw that it takes all co-authors into account.

It might make the query longer, but wouldn't it be a good idea to distinguish the first author, the middle co-authors, and the last author?

I read the following in https://doi.org/10.1007/s11192-020-03413-9

For author-level tracking, we define a self-citation as any instance where a given author cites their own articles. How we define self-citation differs from recent work done by Ioannidis et al. (2019) where they count a self-citation as any occasion where an author of a given article cites that article. Our reason for this is that we want to know how often specific authors self-cite, not how often an article gets cited by coauthors. In general, we believe that authors’ citations should be sorted by source for clarification: self, nonself, coauthor, etc. and tracked separately. We focus here on self-citation data to show how the approach could work.
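The difference between the two definitions can be made concrete with a minimal Python sketch. It assumes each work is represented as a plain list of author identifiers; the function names and the toy identifiers (`Q1`, `Q2`, ...) are hypothetical and not part of Scholia.

```python
def is_article_level_self_citation(citing_authors, cited_authors):
    """Article-level definition (attributed to Ioannidis et al. above):
    a citation is a self-citation if any author of the cited work is also
    an author of the citing work."""
    return bool(set(citing_authors) & set(cited_authors))


def is_author_level_self_citation(author, citing_authors, cited_authors):
    """Author-level definition (the one proposed in the quoted paper):
    the citation is a self-citation for `author` only if that author
    wrote both the citing and the cited work."""
    return author in citing_authors and author in cited_authors


cited = ["Q1", "Q2", "Q3"]  # hypothetical authors of the cited article
citing = ["Q3", "Q4"]       # hypothetical authors of the citing article

print(is_article_level_self_citation(citing, cited))       # True: Q3 is shared
print(is_author_level_self_citation("Q1", citing, cited))  # False: Q1 is not an author of the citing article
print(is_author_level_self_citation("Q3", citing, cited))  # True
```

Under the article-level definition, the citation also counts as a self-citation of Q1's article even though Q1 did not write the citing article, which is presumably why the ratios look high when all co-authors are taken into account.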

fnielsen commented 2 years ago

Good point.

Let me make sure I understand you. Is the self-citation count in Scholia the one displayed by https://github.com/WDscholia/scholia/blob/master/scholia/app/templates/work_citations-per-year.sparql ?

My intuition is that a citation is a self-citation if any author (first, last, middle) of the citing work matches any author (first, last, middle) of the cited work. So I suppose that should be the default, but we could extend the graph with multiple colors. As far as I can see, we would then have nine self-citation cases (first citing last, first citing middle, first citing first, middle citing last, ...).
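One way to derive the nine cases is to map each author's series ordinal (Wikidata P1545, 1-based) to a first/middle/last position, as in the Python sketch below. The helper name and the rule that a sole author counts as "first" are assumptions for illustration. Since P1545 values are stored as strings, they need to be converted to integers before comparing; compared as strings, "10" sorts before "9".

```python
from itertools import product

POSITIONS = ("first", "middle", "last")


def author_position(rank, n_authors):
    """Map a 1-based author rank to "first", "middle" or "last".

    Assumption: a sole author (rank 1 of 1) is treated as "first".
    """
    if not 1 <= rank <= n_authors:
        raise ValueError(f"rank {rank} outside 1..{n_authors}")
    if rank == 1:
        return "first"
    if rank == n_authors:
        return "last"
    return "middle"


# All nine combinations of (position in citing work, position in cited work).
SELF_CITATION_CASES = [
    f"{citing} citing {cited}" for citing, cited in product(POSITIONS, POSITIONS)
]

print(author_position(int("10"), 12))  # middle (P1545 string converted to int first)
print(len(SELF_CITATION_CASES))        # 9
print(SELF_CITATION_CASES[:3])         # ['first citing first', 'first citing middle', 'first citing last']
```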

Adafede commented 2 years ago

Oh, thank you for asking for clarification; it was indeed unclear!

The "self-citation" I mentioned was on the article page directly (https://github.com/WDscholia/scholia/blob/master/scholia/app/templates/work_citation-graph.sparql), not on the author page (https://github.com/WDscholia/scholia/blob/master/scholia/app/templates/work_citations-per-year.sparql), but it could, of course, apply to both!

I fully agree with the rest, including the nine colors!

fnielsen commented 2 years ago

I am still confused: the graph at https://scholia.toolforge.org/work/Q41799194#citation-graph-header shows no indication of self-citations. The coloring is based on how "central" the paper is in the citation graph. Unfortunately, that is not explained in the graph.

Adafede commented 2 years ago

It is actually not under the header you mention, but under the one below: https://scholia.toolforge.org/work/Q41799194#citations-per-year-header 😜

Adafede commented 6 months ago

Here is a first attempt (after far too long... sorry):

# tool: scholia
#defaultView:BarChart
PREFIX target: <http://www.wikidata.org/entity/Q20895241>

SELECT ?year (COUNT(?category) AS ?count) ?category WHERE {
  {
    SELECT DISTINCT ?citing_work
           (IF((?coauthor = ?citer) && (?rank1 = "1") && (?rank2 = "1"), "Self citation: First author citing first author",
               IF((?coauthor = ?citer) && (?rank1 = "1") && (?rank2 != "1" && ?rank2 != ?maxRank2), "Self citation: Middle author citing first author",
                  IF((?coauthor = ?citer) && (?rank1 = "1") && (?rank2 = ?maxRank2), "Self citation: Last author citing first author",
                     IF((?coauthor = ?citer) && (?rank1 != "1" && ?rank1 != ?maxRank1) && (?rank2 = "1"), "Self citation: First author citing middle author",
                        IF((?coauthor = ?citer) && (?rank1 != "1" && ?rank1 != ?maxRank1) && (?rank2 != "1" && ?rank2 != ?maxRank2), "Self citation: Middle author citing middle author",
                           IF((?coauthor = ?citer) && (?rank1 != "1" && ?rank1 != ?maxRank1) && (?rank2 = ?maxRank2), "Self citation: Last author citing middle author",
                              IF((?coauthor = ?citer) && (?rank1 = ?maxRank1) && (?rank2 = "1"), "Self citation: First author citing last author",
                                 IF((?coauthor = ?citer) && (?rank1 = ?maxRank1) && (?rank2 != "1" && ?rank2 != ?maxRank2), "Self citation: Middle author citing last author",
                                    IF((?coauthor = ?citer) && (?rank1 = ?maxRank1) && (?rank2 = ?maxRank2), "Self citation: Last author citing last author",
                                       IF((?coauthor != ?citer), "External citation", "Other")
                                    )
                                 )
                              )
                           )
                        )
                     )
                  )
               )
           ) AS ?category)
           (STR(YEAR(?date)) AS ?year)
    WHERE {
      ?work wdt:P50 target: .
      ?work p:P50 ?author_triple .
      ?author_triple ps:P50 ?coauthor.
      ?author_triple pq:P1545 ?rank1.
      ?citing_work wdt:P2860 ?work .
      ?citing_work wdt:P577 ?date .
      ?citing_work p:P50 ?citer_triple .
      ?citer_triple ps:P50 ?citer.
      ?citer_triple pq:P1545 ?rank2.
      {
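        # Highest author position per (cited work, citing work) pair. P1545 values are
        # strings, so take MAX over integers ("10" must beat "9") and convert back with STR
        # so that the comparisons with ?rank1 and ?rank2 above still match.
        SELECT ?work ?citing_work (STR(MAX(xsd:integer(?rank1))) AS ?maxRank1) (STR(MAX(xsd:integer(?rank2))) AS ?maxRank2) WHERE {
          ?work wdt:P50 target: .
          ?work p:P50 ?author_triple .
          ?author_triple pq:P1545 ?rank1.
          ?citing_work wdt:P2860 ?work .
          ?citing_work p:P50 ?citer_triple .
          ?citer_triple pq:P1545 ?rank2.
        } GROUP BY ?work ?citing_work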
      }
    }
  }
}
GROUP BY ?year ?category
ORDER BY DESC(?year)

Adafede commented 4 months ago

@fnielsen I wanted to submit a PR with the above proposal, but when checking it again I found that it artificially inflates the number of citations, because one citation can fall into multiple categories at the same time (worst case: the first, middle, and last authors are the same person). I do not have a good solution in mind to avoid this...

It would require establishing a hierarchy such as "last author citing last author takes precedence over middle author, etc."...
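A precedence hierarchy like this can be sketched in Python: collect every matching position pair for a citing work, then keep only the highest-ranked one. The ordering in `PRECEDENCE` is a hypothetical choice (only the idea that last-author matches outrank middle-author ones comes from the comment above), and `classify_citation` is an illustrative name; any fixed order would remove the double counting.

```python
# Hypothetical precedence order, highest first. Labels read
# "<position in citing work> citing <position in cited work>".
PRECEDENCE = [
    "last citing last",
    "last citing first",
    "first citing last",
    "first citing first",
    "last citing middle",
    "middle citing last",
    "first citing middle",
    "middle citing first",
    "middle citing middle",
]
RANK = {case: i for i, case in enumerate(PRECEDENCE)}


def classify_citation(matches):
    """Collapse all author matches of one citing work into a single label.

    `matches` holds one label per author shared between the citing and the
    cited work; an empty collection means an external citation.
    """
    if not matches:
        return "External citation"
    # Lowest index in PRECEDENCE wins.
    return min(matches, key=RANK.__getitem__)


print(classify_citation([]))  # External citation
print(classify_citation(["middle citing middle", "last citing last"]))  # last citing last
print(classify_citation(["middle citing first", "first citing middle"]))  # first citing middle
```

Because each citing work receives exactly one label, the per-category counts add up to the number of citing works, and the result does not depend on which row happens to come first. In SPARQL, the same effect could presumably be obtained by mapping each category to a number and taking MIN per citing work.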

fnielsen commented 4 months ago

I suppose it is better to have a correct count than a detailed coloring.
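If only a correct count is needed, each citing work can be labeled once, as either self or external. The Python sketch below mimics the shape of the query results, one row per (citing-work author, cited-work author) pair; the function name and the row format are assumptions, and it assumes each citing work has a single publication year.

```python
from collections import Counter, defaultdict


def count_citations_per_year(rows):
    """Count every citing work exactly once per year: self or external.

    `rows` holds (citing_work, year, citer, coauthor) tuples. A citing work
    is a self-citation if *any* of its rows has the same person as citer
    and coauthor.
    """
    is_self = defaultdict(bool)
    year_of = {}
    for citing_work, year, citer, coauthor in rows:
        year_of[citing_work] = year
        is_self[citing_work] |= citer == coauthor
    counts = Counter()
    for citing_work, year in year_of.items():
        label = "Self citation" if is_self[citing_work] else "External citation"
        counts[(year, label)] += 1
    return counts


rows = [
    ("W1", "2020", "A", "A"),  # A cites their own work...
    ("W1", "2020", "B", "A"),  # ...but W1 also has a non-matching pair
    ("W2", "2020", "C", "A"),
    ("W3", "2021", "B", "A"),
    ("W3", "2021", "A", "A"),
]
print(sorted(count_citations_per_year(rows).items()))
# [(('2020', 'External citation'), 1), (('2020', 'Self citation'), 1), (('2021', 'Self citation'), 1)]
```

In the first query above, the non-matching pair (B, A) of W1 yields an extra "External citation" row for the same citing work; here W1 is counted once, as a self-citation, so the total equals the number of citing works.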

Adafede commented 4 months ago

This one is closer, but still not right (the category of each citing work is chosen with SAMPLE, so it is effectively random):

# tool: scholia
#defaultView:BarChart
PREFIX target: <http://www.wikidata.org/entity/Q20895241>

SELECT ?year (COUNT(?category) AS ?count) ?category WHERE {
  {
    SELECT DISTINCT ?citing_work (SAMPLE(?category_) AS ?category) ?year WHERE {
      {
        SELECT DISTINCT ?citing_work
               (IF((?coauthor = ?citer) && (?rank1 = "1") && (?rank2 = "1"), "Self citation: First author citing first author",
                  IF((?coauthor = ?citer) && (?rank1 = "1") && (?rank2 = ?maxRank2), "Self citation: Last author citing first author",
                    IF((?coauthor = ?citer) && (?rank1 = "1") && (?rank2 != "1") && (?rank2 != ?maxRank2), "Self citation: Middle author citing first author",
                       IF((?coauthor = ?citer) && (?rank1 != "1") && (?rank1 != ?maxRank1) && (?rank2 = "1"), "Self citation: First author citing middle author",
                          IF((?coauthor = ?citer) && (?rank1 != "1") && (?rank1 != ?maxRank1) && (?rank2 = ?maxRank2), "Self citation: Last author citing middle author",
                             IF((?coauthor = ?citer) && (?rank1 != "1") && (?rank1 != ?maxRank1) && (?rank2 != "1") && (?rank2 != ?maxRank2), "Self citation: Middle author citing middle author",
                                IF((?coauthor = ?citer) && (?rank1 = ?maxRank1) && (?rank2 = "1"), "Self citation: First author citing last author",
                                  IF((?coauthor = ?citer) && (?rank1 = ?maxRank1) && (?rank2 = ?maxRank2), "Self citation: Last author citing last author",
                                    IF((?coauthor = ?citer) && (?rank1 = ?maxRank1) && (?rank2 != "1") && (?rank2 != ?maxRank2), "Self citation: Middle author citing last author",
                                      IF((?coauthor != ?citer), "External citation", "Other")
                                      )
                                   )
                                )
                             )
                          )
                       )
                    )
                 )
               ) AS ?category_)
               (STR(YEAR(?date)) AS ?year)
               ?coauthor
               ?citer
               ?rank1
               ?rank2
               ?maxRank1
               ?maxRank2
        WHERE {
          ?work wdt:P50 target: .
          ?work (p:P50|p:P2093) ?author_triple .
          ?author_triple (ps:P50|ps:P2093) ?coauthor.
          ?author_triple pq:P1545 ?rank1.
          ?citing_work wdt:P2860 ?work .
          ?citing_work wdt:P577 ?date .
          ?citing_work (p:P50|p:P2093) ?citer_triple .
          ?citer_triple (ps:P50|ps:P2093) ?citer.
          ?citer_triple pq:P1545 ?rank2.
          {
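            # Highest author position per (cited work, citing work) pair, counting both
            # P50 and P2093 authors as in the outer pattern. P1545 values are strings, so
            # take MAX over integers ("10" must beat "9") and convert back with STR.
            SELECT ?work ?citing_work (STR(MAX(xsd:integer(?rank1))) AS ?maxRank1) (STR(MAX(xsd:integer(?rank2))) AS ?maxRank2) WHERE {
              ?work wdt:P50 target: .
              ?work (p:P50|p:P2093) ?author_triple .
              ?author_triple pq:P1545 ?rank1.
              ?citing_work wdt:P2860 ?work .
              ?citing_work (p:P50|p:P2093) ?citer_triple .
              ?citer_triple pq:P1545 ?rank2.
            } GROUP BY ?work ?citing_work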
          }
        }
      }
    } GROUP BY ?citing_work ?year
  }
}
GROUP BY ?year ?category
ORDER BY DESC(?year)