alachins / raisd

RAiSD: software to detect positive selection based on multiple signatures of a selective sweep and SNP vectors

Missing regions in population. #26

Closed by caonetto 3 years ago

caonetto commented 3 years ago

Hi, I have a question about missing data. According to the manual, the default way of handling missing data is to discard SNPs with missing data. What about regions of the genome that have no mapping and therefore no SNPs? I have noticed that these regions show really high scores. Do we need to manually mask regions with no mapping to the reference before running RAiSD?

Thanks for your help and for creating this tool!

alachins commented 3 years ago

Hi, Indeed, such regions will erroneously give high scores. No, you do not need to mask them manually: you can have RAiSD exclude them by providing a file with the positions (start/stop) of these regions using the -X flag. For example, to remove regions [1-10] and [50-100] from chromosome X, the file (tab-delimited, one region per line) would contain:

X	1	10
X	50	100

Best regards, Nikos A.
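For anyone who needs to build such a file programmatically, here is a minimal Python sketch. It assumes you already have a list of mapped (covered) intervals per chromosome and the chromosome lengths; the function names, the output filename `excluded_regions.txt`, and the 1-based inclusive coordinates are illustrative assumptions, not part of RAiSD. Only the tab-delimited chromosome/start/stop layout comes from the answer above.

```python
from collections import defaultdict


def uncovered_regions(covered, chrom_lengths):
    """Return the gaps between covered intervals as (chrom, start, stop).

    Assumption: coordinates are 1-based and inclusive on both ends.
    `covered` is an iterable of (chrom, start, stop) tuples.
    `chrom_lengths` maps chromosome name to its length.
    """
    by_chrom = defaultdict(list)
    for chrom, start, stop in covered:
        by_chrom[chrom].append((start, stop))

    gaps = []
    for chrom, length in chrom_lengths.items():
        next_pos = 1  # first position not yet known to be covered
        for start, stop in sorted(by_chrom[chrom]):
            if start > next_pos:
                gaps.append((chrom, next_pos, start - 1))
            next_pos = max(next_pos, stop + 1)
        if next_pos <= length:
            gaps.append((chrom, next_pos, length))
    return gaps


def format_exclusion_lines(regions):
    """Format regions as tab-delimited lines: chromosome, start, stop."""
    return "".join(f"{chrom}\t{start}\t{stop}\n" for chrom, start, stop in regions)


if __name__ == "__main__":
    # Hypothetical input: chromosome X of length 200 with two mapped blocks.
    covered = [("X", 11, 49), ("X", 101, 200)]
    gaps = uncovered_regions(covered, {"X": 200})
    with open("excluded_regions.txt", "w") as fh:
        fh.write(format_exclusion_lines(gaps))
```

With the hypothetical input above, the gaps are exactly the two regions from the example, [1-10] and [50-100] on chromosome X. The resulting file would then be passed to RAiSD with the -X flag; check the manual for the coordinate convention RAiSD expects before relying on the boundaries.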


-- Nikolaos Alachiotis

caonetto commented 3 years ago

Thanks for clarifying!