dandaman / whealbiCode

Custom source code, workflows and jupyter notebooks used in the analysis of the wheat genotypes sequenced in the Whealbi Project
https://www.whealbi.eu
GNU General Public License v3.0

Tajima’s π #1

Closed: dandaman closed this issue 5 years ago

dandaman commented 5 years ago

Hi @WandrilleD ,

Is this a mix-up with Tajima's D? I cannot seem to find a specific variant of π that goes by this name...

Best, @dandaman

dandaman commented 5 years ago

I added a markdown comment under each line in question. Please check and remove them afterwards.

I'm certainly no popgen expert. I can only find mentions of Tajima's theta, not pi... Tajima was senior author of a paper on pi for AFLP data, but I doubt this code uses that.

WandrilleD commented 5 years ago

@dandaman

I am no expert either, but I rely on what is written in the paper. Thibault has confirmed the usage of Pi as a metric and hasn't raised concerns about the use of the expression "Tajima’s π" in the manuscript ("For the detection of domestication signals, we computed the nucleotide diversity per site (Tajima’s π) over non-overlapping 1 Mb sliding windows for wild diploid") so I'm content with it. I'll nevertheless ask him specifically.
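For readers outside the field, the quantity quoted from the manuscript can be sketched in a few lines of Python. This is only an illustration, not the code used in this repository: the function names `site_pi` and `windowed_pi`, the `(position, alt_count)` input format, and the assumption that every site in a window is callable (so the window sum is divided by the full window length) are all hypothetical choices made here.

```python
from collections import defaultdict


def site_pi(alt_count, n):
    """Nucleotide diversity contribution of one biallelic site.

    alt_count: number of sampled sequences carrying the alternate allele
    n: total number of sampled sequences (assumed >= 2)
    Returns the fraction of sequence pairs that differ at this site.
    """
    n_pairs = n * (n - 1) / 2
    return alt_count * (n - alt_count) / n_pairs


def windowed_pi(variants, n, window=1_000_000):
    """Per-site nucleotide diversity in non-overlapping windows.

    variants: iterable of (position, alt_count) with 1-based positions
    Assumption: every site in a window is callable, so the summed
    pairwise differences are divided by the full window length.
    Windows without any variant are omitted (their pi would be 0).
    Returns {window_start: pi_per_site}.
    """
    sums = defaultdict(float)
    for pos, alt_count in variants:
        start = (pos - 1) // window * window + 1  # 1-based window start
        sums[start] += site_pi(alt_count, n)
    return {start: total / window for start, total in sorted(sums.items())}


# Toy example: 4 sequences, three variant sites on one chromosome
example = windowed_pi([(10, 2), (500_000, 1), (1_500_000, 3)], n=4)
for start, pi in example.items():
    print(start, pi)
```

In a real analysis the denominator would normally be the number of callable sites per window rather than the nominal window length.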

dandaman commented 5 years ago

> I'll nevertheless ask him specifically.

Yes, please do. :+1:

dandaman commented 5 years ago

@WandrilleD, I've asked my go-to pop-geneticist Tetyana Nosenko. This is what she answered:

> Tajima's pi should mean the same as pi or Nei's pi. This is a historical thing. Some consider Nei and Li, 1979 the first paper to introduce nucleotide diversity (it was abbreviated as H); others refer to Nei and Tajima, 1981 or Tajima, 1983. This software is using a non-standard name for Watterson's theta too.

So IMO, we should refer to it simply as nucleotide diversity or pi to avoid confusion.

What do you think?
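For reference, the two estimators mentioned in the quoted answer are usually written as follows. The symbols are not taken from this thread and are introduced only for illustration: $n$ is the number of sampled sequences, $d_{ij}$ the number of differences between sequences $i$ and $j$, and $S$ the number of segregating sites.

$$
\pi = \frac{1}{\binom{n}{2}} \sum_{i<j} d_{ij},
\qquad
\theta_W = \frac{S}{a_n},
\quad
a_n = \sum_{k=1}^{n-1} \frac{1}{k}
$$

Dividing either quantity by the number of sites considered gives the per-site value. The formula for $\pi$ is the same whichever author's name is attached to it.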

WandrilleD commented 5 years ago

I've asked Thibault and he gave me a similar answer; however, he also assured me that any reviewer would understand what Tajima's pi refers to.

Personally, as someone removed from this specific field, I don't really like having it referred to as just pi, which could mean something else in other fields (the ratio of a circle's circumference to its diameter, the equilibrium distribution in a Markov model of sequence evolution, ...). So I'd much prefer having it referred to as either Tajima's or Nei's pi, which is less ambiguous. As Tajima's pi is already the expression used throughout the MS, I'd advocate for it over Nei's pi.

Does that sound good to you?

dandaman commented 5 years ago

I was not advocating just naming it pi. I'm only trying to reduce the chance of a misinterpretation like mine: that it is a specific variant of nucleotide diversity or, worse, something different from the usual π.

How about leaving it as it is now and adding the citations Tetyana mentioned? I'll gladly take over this task.

WandrilleD commented 5 years ago

OK sure.

dandaman commented 5 years ago

Done