bcgsc / straglr

Tandem repeat expansion detection or genotyping from long-read alignments

Problem with straglr_compare #26

Closed gspirito closed 8 months ago

gspirito commented 9 months ago

Hi, I would like to analyze WGS long-read data from a trio (proband, mother, and father) with straglr, but I encountered some problems with straglr_compare.

First I ran straglr on the 3 BAM files (aligned with minimap2) like so:

python3 straglr.py sample.bam ref.fasta sample_ID \
--exclude hg38_centromeres_telomeres_UCSC.bed \
--min_support 1 --min_ins_size 10

and got no errors or warnings.

Then I ran 3 instances of straglr_compare:

The first time I ran: python3 ./straglr_compare.py proband.tsv mother.tsv proband_vs_mother.txt and got the proper result

Then I did: python3 ./straglr_compare.py proband.tsv father.tsv proband_vs_father.txt and got this error:

Traceback (most recent call last):
  File "/work/gspirito/software/straglr-master/straglr_compare.py", line 371, in <module>
    main()
  File "/work/gspirito/software/straglr-master/straglr_compare.py", line 359, in main
    vs_controls.append(vs_each_control(test_bed, control_bed, args.pval_cutoff, min_expansion=args.min_expansion, min_support=args.min_support, label=control_result))
  File "/work/gspirito/software/straglr-master/straglr_compare.py", line 147, in vs_each_control
    control_allele = float(control_cols[j-1])
ValueError: could not convert string to float: '.'

Finally I ran: python3 ./straglr_compare.py father.tsv mother.tsv father_vs_mother.txt and got no errors but no output file either.

Am I doing something wrong? Is there a way to perform a 'proband vs mother+father' comparison?

Thanks in advance for the response.

Giovanni

readmanchiu commented 8 months ago

Thanks for your interest in the software, @gspirito. I'm just wondering whether you are using the latest release, v1.4.1, or whether you cloned the repo and are using the straglr_compare.py from the clone. If you are using your v1.4.1 installation, could you try cloning the repo and using the script from there? I've made a few fixes to the script which I haven't put into a release yet.

gspirito commented 8 months ago

Hi, thank you for the response.

I tried both.

This is the error I get with release 1.4.1:

  File "/work/gspirito/software/straglr/straglr_compare.py", line 374, in <module>
    main()
  File "/work/gspirito/software/straglr/straglr_compare.py", line 356, in main
    test_bed = parse_straglr_tsv(args.test, use_size=args.use_size, skip_chroms=args.skip_chroms, no_strand_version=args.no_strand_version)
  File "/work/gspirito/software/straglr/straglr_compare.py", line 65, in parse_straglr_tsv
    allele = '{:.1f}'.format(float(allele) / len(cols[3]))

And this is the error I get with the cloned repo:

Traceback (most recent call last):
  File "/work/gspirito/software/straglr-master/straglr_compare.py", line 371, in <module>
    main()
  File "/work/gspirito/software/straglr-master/straglr_compare.py", line 359, in main
    vs_controls.append(vs_each_control(test_bed, control_bed, args.pval_cutoff, min_expansion=args.min_expansion, min_support=args.min_support, label=control_result))
  File "/work/gspirito/software/straglr-master/straglr_compare.py", line 147, in vs_each_control
    control_allele = float(control_cols[j-1])
ValueError: could not convert string to float: '.'
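The `ValueError` above is raised when `float()` receives the string `'.'`. A minimal sketch of a tolerant conversion follows, assuming that `'.'` marks a missing allele value in the control columns (the thread does not confirm this). `parse_allele` is a hypothetical helper for illustration, not part of straglr_compare.py:

```python
from typing import Optional


def parse_allele(field: str, missing: str = ".") -> Optional[float]:
    """Convert a TSV field to float, returning None for a missing-value marker.

    Hypothetical helper for illustration; not straglr's actual code.
    """
    field = field.strip()
    if field == missing or field == "":
        return None
    return float(field)


# Made-up fields: the second one is the '.' placeholder that broke float().
values = [parse_allele(f) for f in ["12.5", ".", "30"]]
print(values)
```

A caller would then need to decide what to do with `None`, for example skipping that control allele in the comparison.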

readmanchiu commented 8 months ago

This is kind of puzzling, because there is a check in the code to make sure the above error won't happen. I just want to make sure your straglr output has the correct format. Could you please run:

grep chrom <your_straglr_output> | awk '{print NF}'

to see if you have 15 columns in your output
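The same check can be sketched in Python. This assumes, as the `grep chrom` command implies, that the header line of the straglr TSV contains the string `chrom`. `header_field_counts` is a hypothetical helper name, and the sample lines are made up for illustration:

```python
from typing import Iterable, List


def header_field_counts(lines: Iterable[str], keyword: str = "chrom") -> List[int]:
    """Return the number of whitespace-separated fields on each line containing keyword.

    Mirrors `grep chrom file | awk '{print NF}'`: awk's default field
    splitting is on runs of whitespace, which str.split() reproduces.
    """
    return [len(line.split()) for line in lines if keyword in line]


# Made-up lines for illustration only; real column names may differ.
sample = [
    "#chrom\tstart\tend\trepeat_unit\n",
    "chr1\t100\t200\tCAG\n",
]
print(header_field_counts(sample))
```

Passing an open file handle for a straglr output instead of `sample` should report 15 for the header line, going by the column count stated above.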

readmanchiu commented 8 months ago

Hmmm... father.tsv, mother.tsv, and proband.tsv should be the tsv outputs of straglr, while proband_vs_parent.tsv should be the output of straglr_compare.py. What you showed of how you ran straglr_compare.py looks correct, but the lines shown after you ran head on father.tsv look like the output of straglr_compare.py instead of straglr.py. Sorry, I'm confused...

I am just wondering whether your straglr tsv outputs have 15 columns, not the outputs of straglr_compare.py.

Also, I would advise using at least 2 for --min_support when you run straglr.py.

readmanchiu commented 8 months ago

Hi @gspirito, have you been able to resolve this?

readmanchiu commented 8 months ago

I'll assume the issue is resolved and close the ticket. Let me know if there are further issues.