Hi Ryan, I'm getting a very large number of SVs in my VCF (~10^6) after running lumpyexpress on a single sample. SVTyper runs very slowly, so I'm looking to filter my VCF prior to running SVTyper. Do you recommend filtering by SU? Filtering for SU > 20 gives ~20,000 SVs, which is more manageable. Do you think I may be losing a lot of real calls by doing this? Should I focus instead on calls that have both split-read and paired-end read support?

As a side note, do you think running LUMPY jointly on multiple samples would help reduce the number of SVs per sample? I have about 100 samples, but have been running them individually. Best, Andrew

Human?
Yes, human. I should have mentioned in the first post.
Hg38?
Yes, hg38.
hg38 has a lot of contigs.
Try using this exclude file:
http://layerlabweb.s3.amazonaws.com/lumpy/hg38_lcr_rand.bed.gz
I just updated this file, so you may need to redownload.
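For anyone following along, here is a minimal sketch of how the exclude file is typically passed to lumpyexpress via its -x flag; the BAM names below are placeholders, and you may need to decompress the BED first:

```bash
# Fetch and decompress the hg38 exclude regions (low-complexity + random contigs).
wget http://layerlabweb.s3.amazonaws.com/lumpy/hg38_lcr_rand.bed.gz
gunzip hg38_lcr_rand.bed.gz

# Re-run lumpyexpress, skipping the excluded regions with -x.
# sample.bam, sample.splitters.bam, and sample.discordants.bam are placeholders.
lumpyexpress \
    -B sample.bam \
    -S sample.splitters.bam \
    -D sample.discordants.bam \
    -x hg38_lcr_rand.bed \
    -o sample.lumpy.vcf
```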
Thanks, just downloaded. Will run and report back tomorrow.
Hi Ryan, I just ran with the exclude file. I still ended up with 626,907 SVs, which is a lot fewer than 1M but still too many to hand all of them to SVTyper. Let me know if you have any other ideas for reducing the number of SVs found by LUMPY. As I said in my first post, I'm considering filtering by SU, or by SU and SR, since split-read support is potentially more reliable.
How many are BNDs?
Here's the approximate breakdown:
- 196,000 duplications
- 240,000 inversions
- 33,000 deletions
- 158,000 breakends (BND)
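For reference, a breakdown like this can be tallied directly from the SVTYPE INFO tags with standard Unix tools; a minimal sketch (sample.lumpy.vcf is a placeholder name):

```bash
# Count LUMPY calls by SVTYPE (DEL, DUP, INV, BND).
grep -v '^#' sample.lumpy.vcf \
    | grep -o 'SVTYPE=[A-Z]*' \
    | sort | uniq -c | sort -rn
```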
What is your read depth?
Sorry for the slow response. Depending on how it's calculated, read depth is 48 or 51.
At that depth, I would require at least 7 supporting reads; 10 is probably even better.
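A minimal sketch of that filter using bcftools (bcftools isn't mentioned in this thread, so treat the exact expression as an assumption; it relies on LUMPY's INFO/SU tag):

```bash
# Keep calls with at least 10 supporting reads across all samples
# (drop to SU>=7 for a looser cut at ~50x depth).
bcftools view -i 'INFO/SU>=10' sample.lumpy.vcf > sample.lumpy.su10.vcf

# How many calls survive?
grep -vc '^#' sample.lumpy.su10.vcf
```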
Great to know, and thanks for your help. So SU > 10. Should extra weight be given to calls with split-read support?
First off, many good calls will not have split-read support, and false positives can have support of both types. But calls with multiple types of evidence are more convincing. The most convincing to me are those that also show a coverage change.
We are also starting to visualize most of our calls with samplot and SV-plaudit.
https://github.com/ryanlayer/samplot
https://github.com/jbelyeu/SV-plaudit
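Putting that advice into commands, a sketch: select calls carrying both evidence types via LUMPY's PE and SR INFO tags, then inspect individual candidates with samplot for coverage changes (coordinates and file names are placeholders):

```bash
# Calls supported by both paired-end (PE) and split-read (SR) evidence.
bcftools view -i 'INFO/PE>0 && INFO/SR>0' sample.lumpy.vcf > both_evidence.vcf

# Plot one candidate deletion with samplot to check for a coverage change.
samplot plot \
    -n sample1 \
    -b sample.bam \
    -o chr1_1000_2000_DEL.png \
    -c chr1 -s 1000 -e 2000 -t DEL
```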
Can I get coverage changes directly from the LUMPY output VCF, or would I need to look at the BAMs directly with the tools you recommend? I see a BD INFO tag ("amount of BED evidence supporting the variant across all samples"), but it seems none of the VCF entries actually have this tag.
SVTyper can report the depth, but you will need to run it first.
The BD tag is used when you include a BED file of read-depth calls from something like CNVnator.
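A sketch of a basic SVTyper run on the pre-filtered calls (file names are placeholders; see the SVTyper README for the full set of options):

```bash
# Genotype the filtered LUMPY calls. SVTyper adds per-sample genotype and
# depth-related FORMAT fields that can be used for downstream filtering.
svtyper \
    -i sample.lumpy.su10.vcf \
    -B sample.bam \
    -l sample.lib.json \
    > sample.gt.vcf
```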
Great, thank you.
Hello, I am currently filtering the results from LUMPY and SVTyper. My raw sequencing depth is 10x. I first filtered by length (50 bp to 1 Mb) and SV type (INV, DEL, DUP); for one example sample this left 21,659 calls. Next I plan to filter by supporting reads. What would be an appropriate SU threshold at this depth? I also plan to filter on QUAL at the end. Do you have any suggestions for setting these values? Looking forward to your reply, thanks. @ryanlayer
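For what it's worth, a sketch of the combined filter described above as one bcftools expression; the SU and QUAL cutoffs are illustrative placeholders (at 10x, a much lower SU cutoff than the ~50x advice earlier in this thread would apply), and the SVLEN handling assumes LUMPY-style INFO tags:

```bash
# Size 50 bp - 1 Mb, types DEL/DUP/INV, then support and quality cutoffs.
# SU>=4 and QUAL>=20 are placeholders to tune for your own data.
bcftools view -i 'ABS(INFO/SVLEN)>=50 && ABS(INFO/SVLEN)<=1000000 && (INFO/SVTYPE="DEL" || INFO/SVTYPE="DUP" || INFO/SVTYPE="INV") && INFO/SU>=4 && QUAL>=20' sample.gt.vcf > sample.filtered.vcf
```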