arq5x / lumpy-sv

lumpy: a general probabilistic framework for structural variant discovery
MIT License

Arguments too long for lumpy #215

Open GangLiTarheel opened 6 years ago

GangLiTarheel commented 6 years ago

Hi, I am trying to run traditional Lumpy. I have about 10000 samples.

My code looks like the following, with the -pe and -sr options each repeated 10000 times (once per sample):

    lumpy-sv/bin/lumpy \
        -mw 4 \
        -tt 0 \
        -pe ... \
        -sr ... \
        > output.vcf

When I run the code, the Slurm system reports that the argument list is too long for lumpy. Is there any solution to this issue?

Thanks a lot. Best, Gang

ryanlayer commented 6 years ago

Try running your samples in batches of 100 or 1000, using the -P option to propagate the breakpoint probability intervals into your VCF. Once you have your 100 (or 10) per-batch VCFs, use lsort and lmerge from SVTOOLS (https://github.com/hall-lab/svtools) to merge all of those calls into one call set. Once you have your merged call set, use SVTYPER to genotype those calls across your 10000 samples.
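A minimal shell sketch of the batching step, assuming a hypothetical `samples.txt` with one sample per line. The helper `run_in_batches` is a hypothetical name (not part of lumpy or svtools): it splits the list into chunks of at most N lines with `split -l` and runs a command of your choice once per chunk. The command that actually calls lumpy with `-P` on a chunk is left to you, since the `-pe`/`-sr` strings are specific to each sample.

```shell
# run_in_batches LIST BATCH_SIZE WORKDIR CMD [ARGS...]
# Hypothetical helper: splits LIST (one sample per line) into chunk files
# WORKDIR/batch_aa, WORKDIR/batch_ab, ... of at most BATCH_SIZE lines each,
# then runs "CMD ARGS... CHUNK_FILE" once per chunk.
run_in_batches() {
  local list=$1 size=$2 workdir=$3
  shift 3
  mkdir -p "$workdir"
  # split -l writes consecutive groups of BATCH_SIZE lines to separate files
  split -l "$size" "$list" "$workdir/batch_"
  local chunk
  for chunk in "$workdir"/batch_*; do
    # CMD could be, e.g., a hypothetical lumpy_batch.sh that builds the
    # -pe/-sr options for the samples in $chunk and runs lumpy -P on them
    "$@" "$chunk"
  done
}
```

For example, `run_in_batches samples.txt 100 batches ./lumpy_batch.sh` would produce 100 chunk files for 10000 samples, and therefore 100 per-batch VCFs to pass to lsort and lmerge; a batch size of 1000 would give 10.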

Can you share some of your experimental details? 10000 is by far the largest batch I have seen.


GangLiTarheel commented 6 years ago

Sorry for the late reply. It took a little longer than I expected to get the merged call set.

I divided all the samples into 4 batches, ran lumpy on each batch, and then used svtools lsort and lmerge to get the merged call set. However, the last step, using SVTYPER to genotype those calls across all 10000 samples, fails with the error "Argument list too long". Any idea how to solve this issue?

The script looks like:

    svtyper \
        -i lmerge.vcf \
        -B sample1.bam,.....,sample10000.bam \
        -S sample1.splitters.bam,....,sample10000.splitters.bam \
        > svtyper.vcf

Error: svtyper: Argument list too long

I am using SVTYPER (https://github.com/cc2qe/svtyper). With this package, I successfully genotyped the calls from each batch's lumpy result.

We are doing whole-genome sequencing (WGS) analysis, and I think large population cohorts will become more common for WGS analyses. It would be very helpful to add instructions for such cases to the example pipelines.

Best, Gang


ryanlayer commented 6 years ago

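"Argument list too long" is reported by the shell (bash) when the operating system refuses to start a command whose arguments are too large; it is not a lumpy or svtyper error.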
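To see which limit is being hit: `getconf ARG_MAX` reports the total size the system allows for a command's arguments plus its environment. On Linux, each single argument is additionally capped at 131072 bytes (MAX_ARG_STRLEN, with 4 KiB pages), so one comma-joined `-B` list can trigger the error even when the total stays below ARG_MAX. The sketch below estimates the size of such a joined argument; the helper name `check_joined_arg` and its input file (one BAM path per line) are hypothetical.

```shell
# check_joined_arg LIST
# Hypothetical helper: LIST holds one BAM path per line. Prints the size in
# bytes of the comma-joined argument (as it would be passed to -B) and
# returns 1 if it exceeds the per-argument limit (default 131072, Linux
# MAX_ARG_STRLEN on 4 KiB pages; override with MAX_SINGLE_ARG).
check_joined_arg() {
  local bytes limit
  # paste -s -d, joins all lines into "a.bam,b.bam,..."; wc -c counts the
  # bytes, including the trailing newline, which stands in for the NUL
  # terminator the kernel also counts
  bytes=$(paste -s -d, "$1" | wc -c | tr -d ' ')
  limit=${MAX_SINGLE_ARG:-131072}
  if [ "$bytes" -gt "$limit" ]; then
    echo "too long: $bytes bytes > $limit"
    return 1
  fi
  echo "ok: $bytes bytes"
}
```

With 10000 paths of even 20 to 30 characters each, the joined list is already around 250 KB, well past the per-argument cap.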


Run svtyper once per sample instead. The per-sample jobs are independent and can run in parallel, so in your case this could make this step up to 10000x faster:

svtyper -i lmerge.vcf -B sample1.bam > sample1.vcf
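One way to apply this to all samples without writing 10000 commands is sketched below. It assumes a hypothetical `samples.txt` with one BAM path per line; `xargs` reads the paths from that file, so no single command line ever lists all samples, and `-P` runs several single-sample svtyper jobs at once. Only the `-i` and `-B` options from the command above are used; the helper name `genotype_per_sample` and the `SVTYPER` override are illustrative.

```shell
# genotype_per_sample SITES_VCF LIST OUTDIR [JOBS]
# Hypothetical helper: runs "svtyper -i SITES_VCF -B <bam> > OUTDIR/<name>.vcf"
# once for every BAM path listed (one per line) in LIST, with at most JOBS
# (default 4) processes at a time. Set SVTYPER="echo svtyper" for a dry run.
genotype_per_sample() {
  local sites=$1 list=$2 outdir=$3 jobs=${4:-4}
  local svtyper=${SVTYPER:-svtyper}
  mkdir -p "$outdir"
  # xargs takes each line of LIST as one path and substitutes it for {}
  xargs -P "$jobs" -I{} sh -c '
    svtyper_cmd=$1 sites=$2 outdir=$3 bam=$4
    name=$(basename "$bam" .bam)
    # $svtyper_cmd is unquoted on purpose so "echo svtyper" splits into words
    $svtyper_cmd -i "$sites" -B "$bam" > "$outdir/$name.vcf"
  ' sh "$svtyper" "$sites" "$outdir" {} < "$list"
}
```

Usage: `genotype_per_sample lmerge.vcf samples.txt per_sample_vcfs 16`. On a Slurm cluster, a job array with one task per sample is another way to spread the same single-sample command across nodes.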
