Open chichizhao opened 9 months ago
Hi chichi, RAiSD can handle this file size (I have tested it with input files of up to 65GB). The error you get means that not all SNPs in the same chromosome have the same length. RAiSD reads the expected SNP length from the very first SNP in each chromosome. Best regards, Nikos A.
On Thu, Dec 14, 2023 at 4:20 AM chichi @.***> wrote:
Hi Alachins! I ran into a problem while working with my SNP VCF data.
The question
When I try to use RAiSD to detect sweeps and positively selected sites from a population SNP VCF file produced by the GATK pipeline, it gives the following report. I have read the README carefully, but I still cannot resolve it. Could you please help me figure out what the problem is? Many thanks.
Some information
The VCF file is fairly large, about 6 GB uncompressed. It contains 111 samples, 15 chromosomes (names start with Chr01), and 2 contigs (contig01).
My guess
Is this file too large to handle? The hint in the error message is too confusing for me, so I am leaving this comment for you. I am still working on it; thank you for your response.
Best, chichi
The output information
RAiSD, Raised Accuracy in Sweep Detection
This is version 2.9 (released in August 2020)
Copyright (C) 2017, and GNU GPL'd, by Nikolaos Alachiotis and Pavlos Pavlidis
Contact n.alachiotis/pavlidisp at gmail.com
Command: /home/chichi/softwares/RAiSD/RAiSD -n test -I ../data/merge3_filter_variants_snp.vcf -f
Samples: 111
Format: vcf
var-exp: 1.0
sfs-exp: 1.0
ld-exp: 1.0
A pattern structure of 349525 patterns (max. capacity) and approx. 16 MB memory footprint has been created.
The pattern structure has been resized to 209715 patterns (max. capacity) and approx. 16 MB memory footprint.
ERROR: Wrong SNP size (L) found!
-- Nikolaos Alachiotis
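One way to locate the offending records, following the explanation that RAiSD takes the expected SNP length from the first SNP in each chromosome, is sketched below. This is not RAiSD's own code. It assumes that "SNP length" corresponds to the total number of alleles in the GT entries of a VCF record (so a sample written as `.` instead of `./.` would shorten the record), and that GT is the first entry of each sample field. The function names are hypothetical.

```python
import re


def gt_allele_count(sample_field):
    """Count alleles in the GT entry, assumed to be the first ':'-separated item."""
    gt = sample_field.split(":", 1)[0]
    return len(re.split(r"[/|]", gt))


def find_length_mismatches(lines):
    """Yield (chrom, pos, expected, found) for records whose total allele
    count differs from the first record seen on the same chromosome."""
    expected = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], fields[1]
        # Sample columns start at index 9 in a VCF record.
        size = sum(gt_allele_count(s) for s in fields[9:])
        if chrom not in expected:
            expected[chrom] = size  # first SNP defines the expected length
        elif size != expected[chrom]:
            yield chrom, pos, expected[chrom], size


# Usage (file name taken from the thread):
# with open("test_raisd.vcf") as fh:
#     for hit in find_length_mismatches(fh):
#         print(hit)
```

If the check reports nothing, the assumption about what RAiSD counts as SNP length is probably wrong for this file, and the mismatch has to be looked for elsewhere.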
Hi alachins! Yes, following your suggestion, I checked the VCF file. If I understand correctly, sites like the following should be filtered out, because they have several alternate alleles of different types:
Chr01 430525 . G A,*,T 4577.88 PASS AC=15,1,1;AF=0.043,...
A usable SNP should look like the following one, with one reference allele and one alternate allele across all samples:
Chr01 431994 . C T 31627.50 PASS AC=33;AF=0.176;
The extra variant types in the file come from the GATK calling. My plan for the next step is to keep only the ideal SNPs in the VCF file for the RAiSD analysis. Maybe that will work. Thank you! Best, chichi
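The filtering step described above could be sketched as follows. This is only an illustration of the idea of keeping biallelic SNPs (one single-base REF, one single-base ALT); the function names are hypothetical, and the thread does not establish that this filter alone removes the RAiSD error.

```python
BASES = set("ACGT")


def is_biallelic_snp(line):
    """True if the record has a single-base REF and exactly one single-base ALT."""
    fields = line.split("\t")
    if len(fields) < 5:
        return False
    ref, alt = fields[3], fields[4]
    # A multi-allelic ALT such as "A,*,T" is longer than one character,
    # and a lone "*" (spanning deletion) is not in BASES.
    return ref in BASES and alt in BASES


def filter_biallelic(lines):
    """Yield header lines unchanged and only those records that pass the check."""
    for line in lines:
        if line.startswith("#") or is_biallelic_snp(line):
            yield line


# Usage (file names are placeholders):
# with open("input.vcf") as src, open("biallelic.vcf", "w") as dst:
#     dst.writelines(filter_biallelic(src))
```

Applied to the two example records above, the first (ALT `A,*,T`) would be dropped and the second (ALT `T`) would be kept.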
Hi alachins! I am sorry, but this did not work on my data. Here is my test data; would you please check it?
test_raisd.vcf.gz
Best, chichi
Command: /home/chichi/softwares/RAiSD/RAiSD -n test2 -I test_raisd.vcf -f
Samples: 111
Format: vcf
var-exp: 1.0
sfs-exp: 1.0
ld-exp: 1.0
A pattern structure of 349525 patterns (max. capacity) and approx. 16 MB memory footprint has been created.
The pattern structure has been resized to 209715 patterns (max. capacity) and approx. 16 MB memory footprint.
ERROR: Wrong SNP size (L) found!