amyko / clapper

Composite Likelihood Approach to Pedigree Reconstruction
GNU General Public License v3.0

unique markers #2

Open DrLaurenCWhite opened 5 years ago

DrLaurenCWhite commented 5 years ago

Hi, I've been trying to run CLAPPER, but I get an error that I cannot figure out how to solve.

"Thre are more than one markers in the same position in filename.tped! Every marker must have a unique position."

I have double-checked that all positions are unique. However, I don't have genetic distances in my tped file: the values in this column are all set to 0 (and #conditiononld in my options file is also set to 0). Could this be the problem? Can you suggest a way around it?

Cheers

amyko commented 5 years ago

Hi,

The problem comes from having the genetic distance as all zero. I suggest using the average recombination rate to estimate the genetic distances and using that in the genetic distance column in the input file. Hope this helps!

Best, Amy
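
For readers landing here with the same error, the suggestion above can be sketched in Python. This is only an illustration, not part of CLAPPER: the function names are hypothetical, the default rate of 1 cM/Mb is a placeholder to replace with a value appropriate for your species, and it assumes the standard PLINK tped column order (chromosome, marker ID, genetic distance, base-pair position, genotypes). Whether CLAPPER expects the column in cM or Morgans should be checked against its documentation.

```python
def bp_to_cm(bp_position, rate_cm_per_mb=1.0):
    """Approximate a genetic position in cM from a base-pair position,
    assuming a constant recombination rate (an assumption, not a real map)."""
    return bp_position * rate_cm_per_mb / 1_000_000


def fill_genetic_distance(tped_lines, rate_cm_per_mb=1.0):
    """Yield tped lines with column 3 replaced by the estimated cM position.

    Assumes columns: chromosome, marker ID, genetic distance, bp position,
    followed by genotypes.
    """
    for line in tped_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        cm = bp_to_cm(int(fields[3]), rate_cm_per_mb)
        fields[2] = f"{cm:.6f}"
        yield " ".join(fields)


if __name__ == "__main__":
    example = ["1 rs1 0 1000000 A A", "1 rs2 0 2500000 A G"]
    for out_line in fill_genetic_distance(example):
        print(out_line)
```

Because the estimate is proportional to the base-pair position, markers with distinct base-pair positions get distinct genetic positions as long as the output keeps enough decimal places.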

DrLaurenCWhite commented 5 years ago

Thanks for the quick reply! And yes, that's really helpful. Cheers

agilly commented 5 years ago

Hi there, I'm having the same issue. I am not familiar with cM distances and recombination. My PLINK files don't have recombination rates, so I have to add them. I fetched them from: http://bochet.gcc.biostat.washington.edu/beagle/genetic_maps/plink.GRCh38.map.zip

and I converted them to a format PLINK likes, using an average recombination rate of 1.2:

for i in {1..22}; do awk '{print $4, "1.2", $3*100}' plink.chr$i.GRCh38.map | sponge plink.chr$i.GRCh38.map; done

This gives a file like this per chromosome (position, recomb_rate, Morgan distance):

55550 1.2 0
82571 1.2 8.0572
88169 1.2 9.2229
285245 1.2 43.9456
629218 1.2 147.815
629241 1.2 147.821
630053 1.2 148.056
632942 1.2 148.889
633147 1.2 148.948
785910 1.2 193.179

I then annotate my file with plink --cm-map plink.chr@.GRCh38.map

I still get the same error. What am I doing wrong?

Thanks,

A

agilly commented 5 years ago

So, it turns out that if you have sequencing data, consecutive variants can be so close together that they end up with the same coordinates in Morgans. I worked around this by removing variants that share the same coordinates. Another way would be to add a small amount of random noise to the duplicates using R.
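
Both workarounds can be sketched in a few lines of Python (rather than R). The function names and the `epsilon` value are hypothetical, and the input is assumed to be a list of (marker ID, genetic position) pairs already sorted by position within one chromosome. Note that the second function uses a small deterministic offset instead of random noise, a deliberate swap so that the marker order cannot be changed by the adjustment.

```python
def drop_duplicate_positions(markers):
    """Keep only the first marker at each genetic position.

    markers: list of (marker_id, position), sorted by position.
    """
    kept = []
    last_pos = None
    for marker_id, pos in markers:
        if pos != last_pos:
            kept.append((marker_id, pos))
            last_pos = pos
    return kept


def nudge_duplicate_positions(markers, epsilon=1e-6):
    """Make positions strictly increasing by shifting each tied marker
    to just after the previous one (deterministic, not random)."""
    out = []
    prev = None
    for marker_id, pos in markers:
        if prev is not None and pos <= prev:
            pos = prev + epsilon
        out.append((marker_id, pos))
        prev = pos
    return out


if __name__ == "__main__":
    markers = [("a", 1.0), ("b", 1.0), ("c", 2.0)]
    print(drop_duplicate_positions(markers))
    print(nudge_duplicate_positions(markers))
```

Dropping duplicates loses markers but leaves the remaining positions untouched; nudging keeps every marker at the cost of a tiny distortion in the map.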

I also ran into another error after that, saying that my coordinates were decreasing. It turns out the chromosome map resets at the start of every chromosome, whereas clapper expects monotonically increasing positions throughout the genome. This is solved by adding the last position of the previous chromosome to every position on the next chromosome.
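
A minimal Python sketch of that offsetting step, under stated assumptions: the function name and the `gap` parameter are hypothetical, and the input is a list of (chromosome, genetic position) pairs grouped by chromosome in genome order, with positions increasing within each chromosome. The optional `gap` is an addition of mine: if a chromosome's first marker sits at position 0, a plain offset would tie it with the last marker of the previous chromosome, so a positive gap keeps the positions strictly increasing.

```python
def make_genome_wide_positions(markers, gap=0.0):
    """Convert per-chromosome genetic positions into positions that keep
    increasing across the whole genome.

    markers: list of (chrom, position), grouped by chromosome in order.
    gap: extra distance inserted between consecutive chromosomes.
    """
    out = []
    offset = 0.0
    current_chrom = None
    last_pos = 0.0
    for chrom, pos in markers:
        if chrom != current_chrom:
            if current_chrom is not None:
                # start the new chromosome where the previous one ended
                offset = last_pos + gap
            current_chrom = chrom
        last_pos = offset + pos
        out.append((chrom, last_pos))
    return out


if __name__ == "__main__":
    markers = [("1", 0.0), ("1", 5.0), ("2", 0.0), ("2", 3.0)]
    print(make_genome_wide_positions(markers, gap=1.0))
```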

I wonder if it would be possible to use chromosome and position instead of cM distance in a future release. Files annotated with cM distance are now becoming rarer and rarer, and recombination maps are hard to find for new builds.