Camethyleabergen closed this issue 6 years ago
You cannot filter C->T mutations from a VCF file using methylKit.
On Tue, Jul 24, 2018 at 3:44 PM Camethyleabergen notifications@github.com wrote:
Hi,
This is definitely not an issue; I'm looking for good practice.
I have a VCF file for some samples, and I would like to use methylKit's ability to filter out C->T mutations. My problem is that my VCF is multi-sample (and, long story short, some samples in my VCF are not in the methylKit object).
I tried to use the VariantAnnotation package to convert the VCF file into a GRanges object, but it seems that the multi-sample information is not taken into account there.
Do you have any good practice for that?
Best,
Hi Altuna,
Thanks for your quick answer. I guess I indeed can't do it directly and have to convert the VCF to a GRanges object; my question was actually how to do so while keeping the multi-sample information. If your point is that methylKit can't filter out potential C->T mutations, I'm a bit concerned, as that is the purpose of the "Filtering CpGs" paragraph in the tutorial you wrote for methylKit.
Best,
You can filter CpGs based on coverage, other quantitative features, and location, as shown in the tutorial.
You can't read VCF files with methylKit; you can read them as GRanges and do whatever filtering GRanges objects allow. Your question seems to have nothing to do with methylKit; it is a general question on how to filter VCF files.
Thanks for your answer. It seems that I completely misunderstood this part of your tutorial, then:
Now, let’s assume we know the locations of C->T mutations. These locations should be removed from the analysis as they do not represent bisulfite treatment associated conversions. Mutation locations are stored in a GRanges object, and we can use that to remove CpGs overlapping with mutations. In order to do overlap operation, we will convert the methylKit object to a GRanges object and do the filtering with %over% function within [ ]. The returned object will still be a methylKit object.
How can I know the locations of the C->T mutations if they don't come from a VCF file in the first place?
Now I get your question. You need to extract the locations of C->T mutations from the VCF and use those to filter methylKit objects, as shown in the tutorial.
Check this thread: https://support.bioconductor.org/p/94451/
You need to use other packages to do what you want. The VariantAnnotation package could also help.
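A minimal sketch of that extraction step with VariantAnnotation, keeping only the samples of interest from a multi-sample VCF. Several things here are assumptions rather than anything stated in the thread: the VCF is the example file shipped with VariantAnnotation (a stand-in for your own path), the genome label is "hg19", "the samples of interest" are simply the first three in the header, and G->A records are kept as well because they are C->T changes seen from the reverse strand.

```r
library(VariantAnnotation)  # readVcf, ScanVcfParam, scanVcfHeader, geno, ref, alt
library(GenomicRanges)      # granges, GRanges

# Stand-in file shipped with VariantAnnotation; replace with your own VCF path.
vcf_file <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")

# Samples that are also in the methylKit object
# (illustrative only: the first three sample names in the VCF header).
keep_samples <- samples(scanVcfHeader(vcf_file))[1:3]

# Read genotypes for the selected samples only.
param <- ScanVcfParam(samples = keep_samples)
vcf <- readVcf(vcf_file, genome = "hg19", param = param)

# Keep biallelic records so that the alternate allele is always allele "1".
vcf <- vcf[elementNROWS(alt(vcf)) == 1]

ref_base <- as.character(ref(vcf))
alt_base <- as.character(unlist(alt(vcf)))

# C->T on the forward strand, or G->A (a C->T change on the reverse strand).
is_ct <- (ref_base == "C" & alt_base == "T") |
  (ref_base == "G" & alt_base == "A")

# TRUE when at least one selected sample carries the alternate allele.
gt <- geno(vcf)$GT
carries_alt <- rowSums(matrix(grepl("1", gt, fixed = TRUE), nrow = nrow(gt))) > 0

# Positions only, without the per-variant metadata columns.
mut <- granges(rowRanges(vcf)[is_ct & carries_alt])
```

Before using `mut` for overlaps, check that its chromosome names follow the same convention as the methylKit object (for example "chr21" versus "21"); otherwise nothing will overlap.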
OK :) All fine, thanks!
One more question, if I may: how can I apply that correction, from the generated GRanges list of mutation positions, to a united methylKit object with different samples? Does that information have to go in the extra metadata columns of my GRanges object? Or should it be done before the unite step?
Should I use a different VCF per sample? Should I use a specific format?
thanks ++
I would apply unite first and then drop every position that has a C->T mutation in any of the samples.
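A minimal sketch of "unite first, then drop", using the example data shipped with methylKit and the filtering idiom from the tutorial. The mutation positions in `mut` are placeholders standing in for the union of C->T positions found in any sample; in practice they would come from the VCF.

```r
library(methylKit)
library(GenomicRanges)

# Example data shipped with methylKit; provides methylRawList.obj (4 samples).
data(methylKit)

# Unite first, with default settings (only CpGs covered in all samples are kept).
meth <- unite(methylRawList.obj, destrand = FALSE)

# Placeholder union of C->T positions seen in any sample.
mut <- GRanges(seqnames = "chr21",
               ranges = IRanges(start = c(9853296, 9853326), width = 1))

# Drop every CpG overlapping a mutation; the result is still a methylBase object.
meth_filtered <- meth[!as(meth, "GRanges") %over% mut, ]
```

With this approach no per-sample information is needed in the GRanges object: a position is removed for all samples as soon as it is listed once.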
Thanks Altuna.
I have two cases in mind where it seems tricky to me:
I'll take the case of SNVs rather than SNPs to explain the following:
In mtDNA (I'm currently working on a project on it, although I know the material itself is tricky): I have around 20 samples for now, but that will grow soon. So far, when filtering the VCF to keep only the C->T mutations, I have about 280 C positions (both strands) with an SNV. Is it really realistic to exclude the other 19 samples from the analysis at a position if only one of them has a C->T SNV there? I would guess (but I'm open to discussion and debate ;)) that an analysis of 9 versus 10 samples (if I have 2 groups) is still relevant?
In nuclear DNA, I don't have much experience so far, but I'm thinking ahead a bit. Let's say we have 50 samples in 2 groups. If an SNV or SNP has a frequency of 0.01 or more, I would have about one chance in two of discarding a position. How can I be confident in the analysis?
Would it be relevant to process and filter the methylKit objects before they are united?
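The "one chance in two" estimate for the nuclear case can be checked with a few lines of arithmetic. This is only a sketch: it assumes samples (and alleles) are independent, and it shows two readings of "frequency", since the post does not say whether 0.01 is a per-sample carrier frequency or an allele frequency in diploid samples.

```r
n_samples <- 50
freq <- 0.01

# Reading 1: freq is the fraction of samples carrying the variant.
# Probability that at least one of the 50 samples carries it.
p_carrier <- 1 - (1 - freq)^n_samples

# Reading 2: freq is an allele frequency, with two independent alleles per sample.
p_allele <- 1 - (1 - freq)^(2 * n_samples)

round(c(p_carrier, p_allele), 3)
# 0.395 0.634
```

Under either reading, a sizeable fraction of such positions would be discarded if a variant in a single sample removes the position for everyone.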
Best,
With the default settings, if a CpG is not covered in all samples, you will not see that CpG in the methylBase object. You can change that behavior with the min.per.group argument in unite; then it might make sense to filter before unite.
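A minimal sketch of the "filter before unite" variant, again on the example data shipped with methylKit. The per-sample mutation list `mut_by_sample` is hypothetical (one placeholder position for the first sample only), and `min.per.group = 1L` is just an illustrative threshold, not a recommendation.

```r
library(methylKit)
library(GenomicRanges)

data(methylKit)  # provides methylRawList.obj (4 samples, 2 groups)

# Hypothetical per-sample mutation sets, named by sample ID.
# Only the first sample has an entry here.
first_id <- getSampleID(methylRawList.obj)[1]
mut_by_sample <- list()
mut_by_sample[[first_id]] <- GRanges("chr21",
                                     IRanges(start = 9853296, width = 1))

# Remove each sample's own mutated positions; samples without an entry
# are left unchanged.
filtered <- lapply(methylRawList.obj, function(obj) {
  mut <- mut_by_sample[[getSampleID(obj)]]
  if (is.null(mut)) return(obj)
  obj[!as(obj, "GRanges") %over% mut, ]
})
filtered <- new("methylRawList", filtered,
                treatment = getTreatment(methylRawList.obj))

# Keep CpGs covered by at least one sample per group; samples that lack
# a position get NA there instead of the position being dropped entirely.
meth <- unite(filtered, destrand = FALSE, min.per.group = 1L)
```

The trade-off is that downstream steps then have to cope with missing values at the positions where some samples were filtered out.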
Thanks Altuna :)