CSB5 / lofreq

LoFreq Star: Sensitive variant calling from sequencing data
http://csb5.github.io/lofreq/

Adding genotype information #117

Open charlesfoster opened 2 years ago

charlesfoster commented 2 years ago

Hi, I'd like to use lofreq for my pipeline, but I require genotype information for downstream commands. I tried using lofreq2_add_sample.py like so:

./lofreq2_add_sample.py -i in.vcf.gz -o out.vcf.gz -b $BAM

However, I get the following error:

Traceback (most recent call last):
  File "./lofreq2_add_sample.py", line 312, in <module>
    main()
  File "./lofreq2_add_sample.py", line 307, in main
    add_plp_to_vcf(args.vcf_in, args.vcf_out, args.bams)
  File "./lofreq2_add_sample.py", line 229, in add_plp_to_vcf
    for row in vcf_reader:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

If I change the read mode for files in the add_plp_to_vcf function from 'rb' to 'r' to try and get around this error, I get:

Traceback (most recent call last):
  File "./lofreq2_add_sample.py", line 312, in <module>
    main()
  File "./lofreq2_add_sample.py", line 307, in main
    add_plp_to_vcf(args.vcf_in, args.vcf_out, args.bams)
  File "./lofreq2_add_sample.py", line 245, in add_plp_to_vcf
    vcf_writer.writerow(row)
  File "/usr/lib/python3.8/gzip.py", line 276, in write
    data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'

Do you have a workaround? My python version is 3.8.6. Thanks.
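Both tracebacks point at the same Python 3 behaviour: the `csv` module works on `str`, while a gzip handle opened with `'rb'`, `'wb'`, or plain `'r'`/`'w'` works on bytes. A minimal sketch of one possible workaround is to open both files in explicit text mode (`'rt'` / `'wt'`). The function name `copy_vcf_rows` is hypothetical and is not part of lofreq2_add_sample.py; it only shows the file-handling pattern, not the pileup logic of the real script.

```python
import csv
import gzip


def copy_vcf_rows(vcf_in, vcf_out):
    """Copy a gzipped VCF row by row using text-mode handles (hypothetical helper)."""
    # 'rt'/'wt' make gzip decode and encode for us, so csv sees str, not bytes.
    with gzip.open(vcf_in, "rt", newline="") as fin, \
            gzip.open(vcf_out, "wt", newline="") as fout:
        # QUOTE_NONE keeps quotes inside header descriptions untouched.
        reader = csv.reader(fin, delimiter="\t",
                            quoting=csv.QUOTE_NONE, quotechar=None)
        writer = csv.writer(fout, delimiter="\t", quoting=csv.QUOTE_NONE,
                            quotechar=None, lineterminator="\n")
        n_rows = 0
        for row in reader:
            writer.writerow(row)
            n_rows += 1
    return n_rows
```

Whether this alone is enough to make the full script run under Python 3 is not confirmed in this thread; other Python 2 idioms in the script may need the same treatment.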

charlesfoster commented 2 years ago

I ended up writing a simple bash script to add 'fake' genotype information to a VCF file, e.g. specifying that my sample is from a virus --> GT=1. This is good enough for my purposes. I can provide the script if it will be of use for anyone else, in the absence of a more robust alternative.
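For readers without the attachment, here is a rough Python sketch of the same idea. It is not the bash script mentioned above, and every name in it (`add_artificial_genotype`, `sample_name`, `genotype`) is an assumption made for illustration. It assumes the input VCF has the eight fixed columns and no FORMAT or sample columns: it adds a `GT` FORMAT header line, appends `FORMAT` and the sample name to the `#CHROM` line, and appends the same constant genotype to every record.

```python
def add_artificial_genotype(lines, sample_name, genotype="1"):
    """Yield VCF lines with a constant, artificial GT added (hypothetical sketch).

    Assumes the input has no FORMAT/sample columns yet.
    """
    fmt_header = '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Meta-information lines pass through unchanged.
            yield line
        elif line.startswith("#CHROM"):
            # Declare GT just before the column header, then add the new columns.
            yield fmt_header
            yield line + "\tFORMAT\t" + sample_name
        elif line:
            # Every variant record gets the same made-up genotype.
            yield line + "\tGT\t" + genotype


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.0",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t10\t.\tA\tG\t50\tPASS\tDP=100;AF=0.5",
    ]
    for out_line in add_artificial_genotype(example, "sample1"):
        print(out_line)
```

The genotype written this way carries no evidence from the reads; it only satisfies downstream tools that insist on a sample column.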

arunvv90 commented 1 year ago

Hi, I am a little late to the party. I am having the same problem. My sample is also a virus. Can you please share the script if you still have it? It would save my day. Another question: did you merge the multiple samples?

charlesfoster commented 1 year ago

Hi @arunvv90, here's the script (with a .txt suffix so I can attach it):

add_artificial_genotype.txt

I run lofreq on individual samples, then run the attached script before potentially merging multiple samples.

arunvv90 commented 1 year ago

Thanks a lot, man. I kept trying different things. Really appreciate your quick response. Does this script add the sample name field to the header, which is required for merging multiple samples?

charlesfoster commented 1 year ago

Yep, the sample name is added to the header. Previously the sample name was only guessed from the infile name, but I just added another flag to allow you to explicitly specify the name. New script attached: add_artificial_genotype.txt

The raw vcf:

image

Modified vcf after running script:

image

arunvv90 commented 1 year ago

Wow! Lightning speed!! I just tested the script and it worked like a charm! I was about to use bcftools reheader to change the file name. Let me test the new script with a custom sample name.

arunvv90 commented 1 year ago

I just tested the sample name feature also. It worked perfectly. Simple & easy solution!!! Thank you very much.

av724@bioram /s/a/v/s/s/l/test> bash add_artificial_genotype.sh -i BCAHV_vibi_indelq_alnq_call.vcf.gz -g 1/1 -n test_samplename -o out3.vcf.gz (npsm)
VCF with artificial genotype written to out3.vcf.gz
av724@bioram /s/a/v/s/s/l/test> ls (npsm)
add_artificial_genotype.sh* BCAHV_vibi_indelq_alnq_call.vcf.gz.tbi out3.vcf.gz.tbi BCAHV_vibi_indelq_alnq_call.vcf.gz out3.vcf.gz
av724@bioram /s/a/v/s/s/l/test> bcftools query -l out3.vcf.gz (npsm)
test_samplename