eblerjana / pangenie

Pangenome-based genome inference
MIT License

Different samples have different numbers of lines in VCF #70

Open ld9866 opened 6 months ago

ld9866 commented 6 months ago

Dear developer, we recently used PanGenie to genotype 300 individuals, and the number of lines in the resulting VCF files varies between 4.57 million and 4.7 million per sample. Is this acceptable for our downstream analysis? The log files report no errors, but some variant records may be missing from some samples' VCF results.

Best regards!

eblerjana commented 6 months ago

Can you please share more details on the command line used to run PanGenie? The number of variants in the output VCF is the same as in the input VCF provided to PanGenie with option -v. Also, can you share the log?
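One quick way to check this is to count the non-header records in the input VCF and in each per-sample output VCF. The following is only a minimal sketch: the function names `count_vcf_records` and `find_mismatches` are made up for illustration and are not part of PanGenie, and it assumes each record occupies exactly one non-empty line that does not start with `#`.

```python
import gzip


def count_vcf_records(path):
    """Count variant records (non-empty lines not starting with '#').

    Handles plain-text and gzip-compressed VCF files, chosen by the
    '.gz' file extension (an assumption of this sketch).
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(
            1 for line in handle
            if line.strip() and not line.startswith("#")
        )


def find_mismatches(input_vcf, output_vcfs):
    """Return {output_path: record_count} for outputs whose record
    count differs from that of the input VCF."""
    expected = count_vcf_records(input_vcf)
    mismatches = {}
    for path in output_vcfs:
        observed = count_vcf_records(path)
        if observed != expected:
            mismatches[path] = observed
    return mismatches
```

Calling `find_mismatches` with the VCF that was passed via -v and the list of per-sample output VCFs returns an empty dictionary when every output has the same number of records as the input. Any sample listed in the result is worth re-checking, for example for a truncated file or a different input VCF.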

ld9866 commented 6 months ago

On our local machines, the command looks like "PanGenie -f Reference -i <(zcat sample1.1.clean.fq.gz sample1.2.clean.fq.gz) -s sample1 -o sample1 -j 30 -t 30". Some samples, however, were run on a computing cluster, where the fastq.gz files were first decompressed into a single FASTQ file, and the command looks like "PanGenie -f Reference -i sample1.fastq -s sample1 -o sample1 -j 30 -t 30". I found that the sample's _histogram.histo file differs between these two methods, which suggests that the decompressed FASTQ and the zcat stream give different results. I have uploaded both files so that you can help check.

The histogram files from the two methods are here: 1.fastq.sample1_histogram.histo.txt 2.zcat.fastq.sample1.histogram.histo.txt
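Before comparing the histograms, it may help to confirm that both input methods actually deliver the same reads. The sketch below is one possible check, not something PanGenie provides: it hashes the decompressed content of the .gz files in the order given to zcat and compares it with the hash of the single FASTQ file. The function names are hypothetical, and the check assumes the single FASTQ was produced by concatenating the same files in the same order; a different order changes the checksum even when the set of reads is identical.

```python
import gzip
import hashlib


def digest_of_streams(paths, chunk_size=1 << 20):
    """MD5 of the concatenated, decompressed content of the files in
    `paths`, read in the given order. Files ending in '.gz' are
    decompressed on the fly; other files are read as-is."""
    md5 = hashlib.md5()
    for path in paths:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                md5.update(chunk)
    return md5.hexdigest()


def same_reads(gz_paths, fastq_path):
    """True if streaming gz_paths (as zcat would) yields exactly the
    same bytes as the already decompressed fastq_path."""
    return digest_of_streams(gz_paths) == digest_of_streams([fastq_path])
```

For the commands quoted above, this would be called as `same_reads(["sample1.1.clean.fq.gz", "sample1.2.clean.fq.gz"], "sample1.fastq")`. If it returns False, the two runs did not see identical input, which could explain differing histograms.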

ld9866 commented 6 months ago

Do you remember when I asked earlier about missing regions in the middle of a chromosome? https://github.com/eblerjana/pangenie/issues/67 Back then, after I used vcfhub for quality control, all VCF files had the same number of lines. This time, following your recommendation, I did not run vcfhub, to avoid unnecessary operations. I suspect the difference may be due to whether the FASTQ was decompressed beforehand and to sequencing depth. What do you think?