browning-lab / hap-ibd

The hap-ibd program detects identity-by-descent segments in phased genotype data.

Empty output files using a uniform recombination map #4

Closed samarth8392 closed 2 years ago

samarth8392 commented 2 years ago

Hi, I am trying to run the software with a phased VCF file using default parameters and a uniform recombination file (1cM/Mb) that looks like this:

Scate-ma1 . 0         0
Scate-ma1 . 1   1000000
Scate-ma1 . 2   2000000
Scate-ma1 . 3   3000000
...
Scate-ma1 . 335 335000000
Scate-ma1 . 336 336000000

but the job runs for 15 seconds and I get no output. The log file looks like this:

Program            :  hap-ibd.jar  [ version 1.0, 23Apr20.f1a ]
Start Time         :  07:02 PM EDT on 10 May 2022
Max Memory         :  9102 MB

Parameters
  gt               :  Scate-ma1_run1.phased.vcf
  map              :  Scate-ma1.unif.recMap
  out              :  onlyScat.Scate-ma1
  min-seed         :  2.0
  max-gap          :  1000
  min-extend       :  1.0
  min-output       :  2.0
  min-markers      :  100
  min-mac          :  2
  nthreads         :  7

Statistics
  samples          :  111
  markers          :  1974369
  IBD segments     :  0
  IBD segs/sample  :  0.0
  HBD segments     :  0
  HBD segs/sample  :  0.000

Wallclock Time:    :  15 seconds
End Time           :  07:02 PM EDT on 10 May 2022

Would you know why that might be the case? Do I need to change other parameters to get some results, or is there something I am doing wrong?

Please let me know if you need more information.

Thanks for your time and assistance.

Best, Samarth
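For anyone reproducing this setup, a uniform map like the one in the post can be generated with a short script. This is a minimal sketch, assuming the four-column PLINK map layout shown above (chromosome, marker ID, cM position, bp position) and a constant rate of 1 cM/Mb. The function name `uniform_map_lines` and its parameters are hypothetical, not part of hap-ibd.

```python
def uniform_map_lines(chrom, length_bp, cm_per_mb=1.0, step_bp=1_000_000):
    """Yield PLINK-format map lines (chrom, '.', cM, bp) at a constant rate.

    Assumption: one map point every `step_bp` bases, plus a final point at
    `length_bp` when the chromosome length is not a multiple of the step.
    """
    def line(bp):
        cm = bp / 1_000_000 * cm_per_mb  # genetic position in cM
        return f"{chrom}\t.\t{cm:.6f}\t{bp}"

    for bp in range(0, length_bp + 1, step_bp):
        yield line(bp)
    if length_bp % step_bp != 0:
        yield line(length_bp)


if __name__ == "__main__":
    # Print a small map covering the first 3 Mb of the contig from the post.
    for map_line in uniform_map_lines("Scate-ma1", 3_000_000):
        print(map_line)
```

A uniform map only approximates the true recombination landscape, so segment lengths reported in cM inherit that approximation.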

browning-lab commented 2 years ago

Hi Samarth,

An absence of detected IBD segments could be due to a high rate of haplotype error, a high rate of genotype error, or an absence of IBD segments that satisfy the length thresholds. If you are analyzing sequence data, removing low-frequency markers (say MAF < 0.3) will make the analysis more robust to genotype error.
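The MAF filter suggested above can be illustrated with a small sketch. It assumes an uncompressed VCF with biallelic records and GT as the first FORMAT field. The helper names `minor_allele_freq` and `filter_by_maf` are hypothetical, and a dedicated VCF tool would normally be used on real data.

```python
def minor_allele_freq(vcf_line):
    """Return the minor allele frequency of one VCF record.

    Assumptions: tab-separated record, GT is the first FORMAT field, and the
    site is biallelic (any non-reference allele is counted as ALT).
    Returns None when no alleles are called.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    alt_count = 0
    called = 0
    for sample in fields[9:]:  # sample columns start at index 9
        genotype = sample.split(":", 1)[0]
        for allele in genotype.replace("/", "|").split("|"):
            if allele == ".":  # missing allele
                continue
            called += 1
            if allele != "0":
                alt_count += 1
    if called == 0:
        return None
    freq = alt_count / called
    return min(freq, 1.0 - freq)


def filter_by_maf(lines, min_maf):
    """Yield header lines unchanged and records whose MAF is >= min_maf."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        maf = minor_allele_freq(line)
        if maf is not None and maf >= min_maf:
            yield line
```

Calling `filter_by_maf(open_vcf_lines, 0.3)` would keep only markers at or above the threshold mentioned in the reply; the threshold itself is a judgment call that depends on the data.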

The default hap-ibd parameters were designed for human input data and an accurate genetic map. You could reduce the min-seed and min-output parameters. Reducing these parameters will increase the power to detect IBD segments, but it will also increase the false positive rate.
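As a rough illustration of the suggestion above, the sketch below assembles a hap-ibd command line with lowered thresholds. It assumes the `name=value` argument style and the parameter names that appear in the log (`gt`, `map`, `out`, `min-seed`, `min-output`). The helper `hap_ibd_command` is hypothetical, and the value 1.0 is only an example, not a recommendation from this thread.

```python
def hap_ibd_command(gt, map_file, out, jar="hap-ibd.jar", **params):
    """Build an argument list for running hap-ibd via `java -jar`.

    Keyword arguments use underscores in Python and are converted to the
    hyphenated names shown in the log (min_seed -> min-seed).
    """
    cmd = ["java", "-jar", jar, f"gt={gt}", f"map={map_file}", f"out={out}"]
    for name, value in params.items():
        cmd.append(f"{name.replace('_', '-')}={value}")
    return cmd


if __name__ == "__main__":
    # Illustrative values only: halve the default 2.0 cM thresholds.
    cmd = hap_ibd_command(
        "Scate-ma1_run1.phased.vcf",
        "Scate-ma1.unif.recMap",
        "onlyScat.Scate-ma1",
        min_seed=1.0,
        min_output=1.0,
    )
    print(" ".join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Trying a few threshold values and comparing segment counts across runs makes the trade-off between power and false positives easier to see.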

Best regards,

Brian

samarth8392 commented 2 years ago

Thank you, Brian, for the prompt reply. When I first phased the data, I ran multiple runs and estimated switch error rates. Across the 111 samples in my dataset, we got switch error rates of ~1%. However, I only filtered SNPs with MAF < 0.05. I will try thinning my SNPs and run with different parameter sets to see if we get any results. I will keep your note about false positives in mind as I explore further, and I will reach out if I have more questions. Again, thank you for your response, and thanks to your team for creating such user-friendly software.

Thanks, Samarth