genome / pindel

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.
GNU General Public License v3.0

A bug in Pindel: -p does not exist #22

Closed WinterLi1993 closed 8 years ago

WinterLi1993 commented 8 years ago

Hello, when I run "../pindel2vcf -r hs_ref_chr20.fa -R HUMAN_G1K_V2 -d 20100101 -P colontumor_E -e 5" in the demo folder, I get this error: "The pindel file (-p) does not exist".

What went wrong? Pindel version 0.2.5b6, 20150915.

WinterLi1993 commented 8 years ago

@liangkaiye

liangkaiye commented 8 years ago

-p should be lowercase.

WinterLi1993 commented 8 years ago

I have tried both lowercase and uppercase; both give me the same error.

WinterLi1993 commented 8 years ago

I just ran the demo command that comes with this software.

liangkaiye commented 8 years ago

../pindel2vcf -r hs_ref_chr20.fa -R HUMAN_G1K_V2 -d 20100101 -p colontumor_E -v colontumor_E.vcf

Try this one. You need to specify an output file name. No -E.

WinterLi1993 commented 8 years ago

Hi, I just followed the Pindel instructions, and I feel something is wrong in this script. Can you tell me how to use your software to call somatic indels? I have not found any details or pipeline that explain, step by step, which commands to run to get somatic indels. Can you give me some more detailed information?

EWLameijer commented 8 years ago

Kai, Winter,

In the first mail, Winter was talking about pindel2vcf, not Pindel itself.

And for pindel2vcf, -p and -P have different meanings.

-p (lowercase p) is for an individual file, like colo_test_D. The uppercase -P can take a whole bunch of files; using "-P colo_test" processes the files colo_test_D, colo_test_SI, colo_test_TD etc.

That being said, "the pindel file (-p) does not exist" simply means that either the name of the pindel file is misspelled, or the file is not in your current directory (or the path is wrong).
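Given the -p/-P distinction and what the "does not exist" message means, a quick pre-flight check before running pindel2vcf can save a round trip. The sketch below is a hypothetical helper, not part of Pindel or pindel2vcf: `check_single_file` and `files_for_prefix` are made-up names, and the suffix list is an assumption based on the output file names mentioned in this thread (_D, _SI, _TD, _INV, _LI, _BP), not on pindel2vcf's source code.

```python
import os
import sys

# ASSUMPTION: per-variant-type suffixes, taken from the Pindel output files
# mentioned in this thread; pindel2vcf may look for a different set.
ASSUMED_SUFFIXES = ("D", "SI", "TD", "INV", "LI", "BP")


def check_single_file(path):
    """For a -p style argument: return an error message if the file is missing, else None."""
    if not os.path.isfile(path):
        return "pindel file not found: " + os.path.abspath(path)
    return None


def files_for_prefix(prefix, suffixes=ASSUMED_SUFFIXES):
    """For a -P style argument: return the existing <prefix>_<suffix> files, in suffix order."""
    candidates = [prefix + "_" + suffix for suffix in suffixes]
    return [path for path in candidates if os.path.isfile(path)]


if __name__ == "__main__":
    # Hypothetical usage: python check_pindel_inputs.py output/ref
    arg = sys.argv[1]
    error = check_single_file(arg)
    if error is None:
        print(arg, "exists: it can be passed with -p")
    else:
        found = files_for_prefix(arg)
        if found:
            print("usable as a -P prefix, found:", ", ".join(found))
        else:
            print(error)
            print("no <prefix>_<type> files either; check the spelling and your current directory")
```

Running it on "colontumor_E" from the demo folder would show whether that name is a single file (use -p) or a prefix (use -P).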

But overall, I'd recommend you follow the instructions on the website http://gmt.genome.wustl.edu/packages/pindel/quick-start.html

First install Pindel (there may be some prompts for dependencies)

./INSTALL

Create the output directory

mkdir output

Then (stay in the main directory) run

./pindel -f demo/hs_ref_chr20.fa -p demo/COLO-829_20-p_ok.txt -c 20 -o output/ref

Finally, run pindel2vcf:

./pindel2vcf -r demo/hs_ref_chr20.fa -R HUMAN_G1K_V2 -d 20100101 -P output/ref

You will see a ref.vcf appear in the output folder.
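The quick-start steps can also be scripted, which makes it easier to see which step fails. A minimal sketch, assuming it is run from the Pindel main directory after ./INSTALL has finished (the install step is left out because it may prompt for dependencies). `quickstart_commands` and `run_quickstart` are hypothetical names; the arguments are copied verbatim from the commands in Eric's message.

```python
import os
import subprocess


def quickstart_commands():
    """Return the pindel and pindel2vcf quick-start steps as argument lists."""
    pindel = ["./pindel", "-f", "demo/hs_ref_chr20.fa",
              "-p", "demo/COLO-829_20-p_ok.txt", "-c", "20", "-o", "output/ref"]
    to_vcf = ["./pindel2vcf", "-r", "demo/hs_ref_chr20.fa", "-R", "HUMAN_G1K_V2",
              "-d", "20100101", "-P", "output/ref"]
    return [pindel, to_vcf]


def run_quickstart():
    os.makedirs("output", exist_ok=True)  # same effect as `mkdir output`
    for cmd in quickstart_commands():
        print("running:", " ".join(cmd))
        # check=True raises CalledProcessError at the first failing step
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_quickstart()
```

Using the same prefix (output/ref) for pindel's -o and pindel2vcf's -P is what ties the two steps together.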

In general, while we try to make Pindel as easy to use as possible and try to reply as fast as possible to questions, you may want to consult with one of the (other) local bioinformaticians or scientific programmers if Pindel does not seem to work; they may see much faster what the problem is than we can from here.

On calling somatic indels: Pindel by itself just calls indels relative to the reference genome. You need at least two samples (and some additional scripts) to find somatic indels. Perhaps Kai can help here, or we can put something on the website; I'm quite sure there are some scripts around, but Kai probably has them more readily available than I do.

Best regards,

Eric-Wubbo

liangkaiye commented 8 years ago

There is a somatic filtering script in the Pindel package. We added it recently and will provide documentation shortly.

WinterLi1993 commented 8 years ago

@EWLameijer, thank you very much!

Yes, I prepared two samples (a tumor-normal pair), and I called somatic indels with these commands:

Step 1: Prepare normal_tumor.config.txt

simulated_sample_1.bam 250 normal
simulated_sample_2.bam 250 tumor

Step 2: Call somatic indels (with "-N" turned on)

../pindel -i normal_tumor.config.txt -f simulated_reference.fa -o li -c All -r false -N true -L li.log -l false -t false
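The config file from Step 1 holds one sample per line: BAM path, insert size, sample label. A small sketch for writing and sanity-checking such a file, assuming tab-separated fields when writing and any whitespace when reading (the example above uses spaces; Pindel's own documentation is the authority on the exact rules). `write_config` and `read_config` are hypothetical helpers.

```python
def write_config(path, samples):
    """Write (bam_path, insert_size, label) tuples, one sample per line, tab-separated."""
    rows = []
    for bam, insert_size, label in samples:
        size = int(insert_size)
        if size <= 0:
            raise ValueError(f"insert size must be positive: {insert_size!r}")
        if any(ch.isspace() for ch in bam + label):
            raise ValueError(f"whitespace in path or label: {bam!r}, {label!r}")
        rows.append(f"{bam}\t{size}\t{label}\n")
    # Validate everything first so a bad entry never leaves a half-written file.
    with open(path, "w") as fh:
        fh.writelines(rows)


def read_config(path):
    """Parse a config file back into (bam_path, insert_size, label) tuples."""
    samples = []
    with open(path) as fh:
        for lineno, line in enumerate(fh, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != 3:
                raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
            bam, insert_size, label = fields
            samples.append((bam, int(insert_size), label))
    return samples
```

Reading the file back before calling pindel catches the most common mistake: two samples accidentally written on one line, which shows up as six fields instead of three.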

I got these results:

     0 Oct 16 09:23 li_TD
     0 Oct 16 09:23 li_INV
     0 Oct 16 09:23 li_LI
     0 Oct 16 09:23 li_BP
     0 Oct 16 09:23 li_CloseEndMapped
 12371 Oct 16 09:24 li_RP
106403 Oct 16 09:24 li_D
 85719 Oct 16 09:24 li_SI
     0 Oct 16 09:24 li_INT_final
  7750 Oct 16 09:24 li.log

So I have these questions:

  1. What does RP mean?
  2. Are my steps to call somatic indels right?

EWLameijer commented 8 years ago

1) Hmm... Kai can answer this better than I can, but to my understanding RP means 'read pair'; Kai could explain its use more precisely. Note, though, that the insertions, deletions, tandem duplications and inversions are the data you would be more interested in anyway, so I think it is okay to ignore 'RP' for now, especially given your second question.

2) Your steps to call somatic indels don't seem quite right to me. You need a separate script to process the pindel output (or the pindel2vcf output), essentially to look for the indels that are present in the tumor sample (with a certain coverage) but absent from the reference (normal) sample. You can ask Kai (I'm sure he has a script somewhere), write it yourself, or ask your local bioinformatician to write it for you (I'm currently in Cannes, far away from my home computer, so writing and testing a script would be hard for me at the moment).
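The filtering idea in 2) can be sketched as follows. This is not Kai's script or anything shipped with Pindel; it is a minimal illustration that assumes a pindel2vcf VCF containing both samples, whose FORMAT field includes AD as "reference reads,variant reads". The sample column names, the thresholds (min_tumor_alt=5 is an arbitrary placeholder) and the function names are hypothetical.

```python
import sys


def allele_depths(format_field, sample_field):
    """Return the AD values for one sample as ints, or None if AD is missing or non-numeric."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    if "AD" not in keys:
        return None
    idx = keys.index("AD")
    if idx >= len(values):
        return None
    try:
        return [int(x) for x in values[idx].split(",")]
    except ValueError:  # e.g. "." for missing data
        return None


def somatic_candidates(vcf_lines, normal, tumor, min_tumor_alt=5, max_normal_alt=0):
    """Yield VCF records (as column lists) supported in the tumor but not in the normal."""
    normal_col = tumor_col = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        if line.startswith("#CHROM"):
            header = line[1:].split("\t")
            normal_col, tumor_col = header.index(normal), header.index(tumor)
            continue
        if normal_col is None:
            continue  # no header line seen yet
        cols = line.split("\t")
        n_ad = allele_depths(cols[8], cols[normal_col])  # column 9 is FORMAT
        t_ad = allele_depths(cols[8], cols[tumor_col])
        if n_ad is None or t_ad is None or len(n_ad) < 2 or len(t_ad) < 2:
            continue
        # ASSUMPTION: AD is "ref,alt", so index 1 counts variant-supporting reads.
        if t_ad[1] >= min_tumor_alt and n_ad[1] <= max_normal_alt:
            yield cols


if __name__ == "__main__":
    # Hypothetical usage: python somatic_filter.py li.vcf normal tumor
    with open(sys.argv[1]) as fh:
        for cols in somatic_candidates(fh, sys.argv[2], sys.argv[3]):
            print("\t".join(cols))
```

A stricter filter would usually also require enough reference-supporting reads in the normal sample, so that "no variant reads" reflects real coverage rather than a gap.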

Other remarks:

a) The insert size is usually much larger than 250; the person who prepared your library should be able to tell you what it is. If you really don't know, 500 or 600 would be a better guess, especially if your reads are 100-105 bp long.

b) As the reference genome for Pindel, always use the same genome that was used to create the BAM file; 'simulated_reference.fa' does not look like a proper reference (unless, of course, you are running a test on a simulated tumor sample).
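If nobody can tell you the library's insert size, one common approach (not suggested in this thread) is to estimate it from the aligned reads: take the median absolute template length (TLEN, SAM column 9) of properly paired reads. A minimal sketch that reads `samtools view` text output; whether this matches the exact insert-size definition Pindel expects is an assumption worth checking in Pindel's documentation, and `estimate_insert_size` is a hypothetical name.

```python
import statistics
import sys

# SAM FLAG bits (from the SAM specification).
PROPER_PAIR = 0x2
FIRST_IN_PAIR = 0x40
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def estimate_insert_size(sam_lines, max_reads=100000):
    """Median |TLEN| over primary, properly paired, first-in-pair records."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 11:
            continue  # not a complete alignment record
        flag, tlen = int(cols[1]), abs(int(cols[8]))
        if not (flag & PROPER_PAIR) or not (flag & FIRST_IN_PAIR):
            continue  # counting only read 1 avoids counting each pair twice
        if flag & (SECONDARY | SUPPLEMENTARY) or tlen == 0:
            continue
        sizes.append(tlen)
        if len(sizes) >= max_reads:
            break
    if not sizes:
        raise ValueError("no properly paired reads found")
    return statistics.median(sizes)


if __name__ == "__main__":
    print(estimate_insert_size(sys.stdin))
```

For example: samtools view simulated_sample_1.bam | python estimate_insert_size.py (the script name is hypothetical).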

EWLameijer commented 8 years ago

Closing this issue as half a year has gone by without further questions.