LohseLab / gIMble

A genome-wide IM blockwise likelihood estimation toolkit
GNU General Public License v3.0

choosing the region of the genome for analysis #128

XieHongX closed this issue 7 months ago

XieHongX commented 7 months ago

Hi Dominik,

I thought of sending a personal email, but other people might also be interested, so I am posting it here.

I read the gimble paper and enjoyed it, and I would definitely like to apply the method to my own data. Just a quick question before I start the analysis. In your paper you used only the intergenic regions when analysing the butterfly data. Is it a general recommendation to use only intergenic regions when detecting barriers to gene flow? What issues could arise from using all sequences across the genome?

Thank you in advance

Best wishes, Hongxin

DRL commented 7 months ago

Hi Hongxin,

we analysed intergenic regions because these tend to be the ones that are least affected by selection (i.e. "neutral sites").

We also removed all regions that might contain repetitive elements, since reads that map to those might not actually belong there and might distort SNP patterns.

In the Heliconius data for the paper we had sufficient data in intergenic regions to do the analyses. I don't know what you are working on, but make sure this is the case for you as well. If you have samples that are too diverged from the reference, it could be that there are simply not enough reads covering intergenic regions for gimble to make blocks. Then you would need to look at 3rd codon positions in coding regions (which is a bit fiddly).
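To make the filtering idea concrete, here is a minimal sketch of how one might derive candidate intergenic, non-repeat intervals for a single chromosome by subtracting gene and repeat annotations from its full extent. This is not gimble code and not necessarily the pipeline used for the paper; the function names, the toy coordinates, and the half-open 0-based (BED-style) intervals are all assumptions made for illustration.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract_intervals(chrom_length, excluded):
    """Return the parts of [0, chrom_length) not covered by `excluded`."""
    kept = []
    cursor = 0
    for start, end in merge_intervals(excluded):
        # Clamp annotations to the chromosome bounds.
        start = min(max(start, 0), chrom_length)
        end = min(max(end, 0), chrom_length)
        if start > cursor:
            kept.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_length:
        kept.append((cursor, chrom_length))
    return kept


# Hypothetical annotations on a 1000 bp toy chromosome.
genes = [(100, 300), (250, 400), (700, 800)]
repeats = [(450, 500), (790, 850)]

intergenic = subtract_intervals(1000, genes + repeats)
print(intergenic)  # [(0, 100), (400, 450), (500, 700), (850, 1000)]

# Total bp retained; a quick check that enough sequence is left.
print(sum(end - start for start, end in intergenic))  # 500
```

In practice the same subtraction is usually done on BED files with a dedicated interval tool. Summing the retained base pairs is a quick way to check whether enough intergenic sequence remains before going further.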

all the best,

dom

XieHongX commented 7 months ago

Thanks for the explanation! I think our data have enough SNP density. One further question: when excluding gene regions, did you exclude the entire gene region (including CDS, introns, and non-coding transcript regions) or only the CDS sequence? And was an extension outside the gene region also excluded (for example, 1000 bp upstream of the gene)? I think my confusion is whether this step aims only to remove possibly functional loci under selection, or also to remove the surrounding hitchhiking regions?
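For readers comparing the two options raised in this question, the sketch below shows how much sequence each choice would mask on a toy gene. It does not say which option was used in the paper, since the thread does not state that. The coordinates, the `pad_intervals` helper, and the symmetric flank are assumptions: a truly "upstream" flank would need strand information, which is ignored here.

```python
def pad_intervals(intervals, flank, chrom_length):
    """Extend each half-open (start, end) interval by `flank` bp on both
    sides, clamped to [0, chrom_length). Symmetric padding is a
    simplification: it does not distinguish upstream from downstream."""
    return [(max(0, start - flank), min(chrom_length, end + flank))
            for start, end in intervals]


def covered_bp(intervals):
    """Count bases covered by the intervals, counting overlaps once."""
    total = 0
    current_end = 0
    for start, end in sorted(intervals):
        start = max(start, current_end)  # skip the already-counted part
        if end > start:
            total += end - start
            current_end = end
    return total


CHROM_LENGTH = 10_000            # hypothetical chromosome length
cds = [(2000, 2300), (5000, 5400)]   # hypothetical exons of one gene
gene = [(1800, 5600)]                # the whole gene, including introns/UTRs

# Option A: mask only the CDS, no flank.
mask_cds_only = pad_intervals(cds, 0, CHROM_LENGTH)
# Option B: mask the entire gene plus 1000 bp on either side.
mask_gene_flank = pad_intervals(gene, 1000, CHROM_LENGTH)

print(covered_bp(mask_cds_only))    # 700
print(covered_bp(mask_gene_flank))  # 5800
```

The difference between the two masks shows how quickly a flank reduces the sequence left for building blocks, which matters when intergenic coverage is already limited.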