freeseek / gtc2vcf

Tools to convert Illumina IDAT/BPM/EGT/GTC and Affymetrix CEL/CHP files to VCF
MIT License

Question affy2vcf #33

Closed ghost closed 2 years ago

ghost commented 3 years ago

Hi Giulio,

Quick question: I see that affy2vcf can convert CEL to CHP and CHP to VCF. Are two steps required to get from CEL to VCF? I don't see this requirement in the description, and I know PennCNV goes from CEL to VCF but requires multiple steps. Let me know whether we can go straight from CEL to VCF. Thanks. Brian

ghost commented 3 years ago

Sorry, I'll add what I tried. The command below just printed "Reading AGCC file /Users/brian/Downloads/GW_SNP5_AGCC_CEL_1/NA07022_Op1_011206_VnV_A04_r1.CEL", and no VCF file was output; it only reads the CEL files. I now see that there is an error: "AGCC file /Users/brian/Downloads/GW_SNP5_AGCC_CEL_1/NA06985_Op1_011206_VnV_D10_r1.CEL does not contain multi data type analysis as", and that's it. Nothing is printed after that before the newline, but there should be something for agcc[i]->fn, right? Do these files not have the multi data type? I see in the header that they have affymetrix-calvin-intensity.

bcftools +affy2vcf \
  --no-version -Ou \
  --fasta-ref $ref \
  --calls /Users/brian/output/brlmm-p.calls.txt \
  --confidences /Users/brian/output/brlmm-p.confidences.txt \
  --cel /Users/brian/Downloads/GW_SNP5_AGCC_CEL_1/*.CEL

The reason I didn't include the CSV manifest file is that the code does not allow a csv_file to be passed together with calls or confidences files. Maybe there should be a note about this in the help?

freeseek commented 3 years ago

+affy2vcf can read the content of CEL and CHP files and can convert CHP files to VCF (provided that a .snp-posteriors.txt file is also supplied). The ability to read CEL files exists merely to determine which array was used to generate them and when the array was scanned. The error "does not contain multi data type analysis as" most likely means that you are trying to read a CEL file as if it were a CHP file. You will still need to call genotypes using the APT suite.

I do warn you, though, that for non-Axiom arrays only the SNP 6.0 array is currently supported, as APT does not compute bi-dimensional cluster centers for earlier arrays, and for Axiom arrays I don't know how to generate CHP files using the APT suite. For Axiom arrays, the only option I currently know of is to create .calls.txt/.summary.txt/.snp-posteriors.txt table files from the CEL files and input these into +affy2vcf.

I will add a section to make this clearer, but for now check here and replace --chps with --calls, --confidences, and --summary (CHP files are binary files that contain the same data as these three tables). Also check the examples printed when running bcftools +affy2vcf without any options.
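As a rough way to check which kind of AGCC file you have before handing it to the plugin, the sketch below scans the start of a file for its data type identifier. It is not part of gtc2vcf: the helper name sniff_agcc_type and the 4096-byte probe size are hypothetical. The CEL identifier is the one quoted earlier in this thread, while the CHP identifier "affymetrix-multi-data-type-analysis" and the idea that both are stored as plain ASCII near the start of the file are assumptions you should verify against your own files.

```python
# Data type identifiers expected in AGCC (Calvin) file headers.
# CEL_ID is quoted from this thread; CHP_ID is an assumption.
CEL_ID = b"affymetrix-calvin-intensity"
CHP_ID = b"affymetrix-multi-data-type-analysis"


def sniff_agcc_type(path, probe_bytes=4096):
    """Guess whether an AGCC file is a CEL or a CHP file (hypothetical helper).

    Only the first `probe_bytes` bytes are read, on the assumption that the
    data type identifier sits near the start of the file.
    """
    with open(path, "rb") as handle:
        head = handle.read(probe_bytes)
    # Check the CHP identifier first: a CHP header may also embed the
    # header of the CEL file it was derived from.
    if CHP_ID in head:
        return "CHP"
    if CEL_ID in head:
        return "CEL"
    return "unknown"
```

Calling sniff_agcc_type on each path you intend to pass to the plugin and confirming that the result is "CHP" before using --chps is one way to catch this mix-up early. A result of "CEL" means genotypes still need to be called with APT first.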