alachins / raisd

RAiSD: software to detect positive selection based on multiple signatures of a selective sweep and SNP vectors

Do SNP VCF files need an L (SNP size) parameter? #46

Open chichizhao opened 9 months ago

chichizhao commented 9 months ago

Hi alachins! I ran into a problem while processing my SNP VCF data.

The question

When I try to use RAiSD to detect selective sweeps and positively selected sites in a population SNP VCF file produced by the GATK pipeline, it gives the report shown below. I have read the README carefully, but I still cannot resolve it. Could you please help me figure out what the problem is? Many thanks.

Some information

The VCF file is fairly large, about 6 GB uncompressed. It contains 111 samples, 15 chromosomes (names starting with Chr01), and 2 contigs (congtig01).

My guess

Is this file too large to handle? The error message is confusing to me, so I am leaving this comment for you. I am still working on it; thank you for your response.

Best, chichi

The output

RAiSD, Raised Accuracy in Sweep Detection
This is version 2.9 (released in August 2020)
Copyright (C) 2017, and GNU GPL'd, by Nikolaos Alachiotis and Pavlos Pavlidis
Contact n.alachiotis/pavlidisp at gmail.com

Command: /home/chichi/softwares/RAiSD/RAiSD -n test -I ../data/merge3_filter_variants_snp.vcf -f
Samples: 111
Format: vcf
var-exp: 1.0
sfs-exp: 1.0
ld-exp: 1.0

A pattern structure of 349525 patterns (max. capacity) and approx. 16 MB memory footprint has been created.

The pattern structure has been resized to 209715 patterns (max. capacity) and approx. 16 MB memory footprint.

ERROR: Wrong SNP size (L) found!

alachins commented 9 months ago

Hi chichi,

RAiSD can handle this file size (I have tested it with input files of up to 65 GB). The error you get means that not all SNPs in the same chromosome have the same length. RAiSD reads the expected SNP length from the very first SNP in each chromosome.

Best regards, Nikos A.
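One way to look for the records that trigger this error is to compare every record against the first record of its chromosome. The sketch below is a rough diagnostic, not RAiSD's own logic: it assumes that "SNP length" corresponds to the total number of allele calls in the sample genotype (GT) fields, which the thread does not confirm. The function names `allele_count` and `find_size_mismatches` are made up for illustration.

```python
def allele_count(fields):
    """Count allele calls across all sample GT fields of one VCF record.

    Assumption: this total is what RAiSD treats as the SNP size (L).
    """
    fmt = fields[8].split(":")
    if "GT" not in fmt:
        return None
    gt_idx = fmt.index("GT")
    total = 0
    for sample in fields[9:]:
        gt = sample.split(":")[gt_idx]
        # "0/1" and "0|1" carry two calls; a bare "." or "0" carries one.
        total += len(gt.replace("|", "/").split("/"))
    return total


def find_size_mismatches(lines):
    """Return (chrom, pos, found, expected) for records whose allele count
    differs from the first record seen on the same chromosome."""
    expected = {}
    mismatches = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns to inspect
        chrom, pos = fields[0], int(fields[1])
        n = allele_count(fields)
        if chrom not in expected:
            expected[chrom] = n
        elif n != expected[chrom]:
            mismatches.append((chrom, pos, n, expected[chrom]))
    return mismatches


# Tiny made-up example: the second record has a haploid-style "." call.
demo = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n",
    "Chr01\t100\t.\tG\tA\t50\tPASS\t.\tGT\t0/1\t0/0\n",
    "Chr01\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0/1\t.\n",
    "Chr01\t300\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1/1:7\t0/1:9\n",
]
print(find_size_mismatches(demo))
```

For a real file, the same function can be fed an open file handle, for example `with open(path) as fh: find_size_mismatches(fh)`, since it only iterates over lines.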


chichizhao commented 9 months ago

Hi alachins, following your suggestion, I checked the VCF file. If I understand correctly, records like the following should be filtered out, because they list several alternate alleles:

Chr01 430525 . G A,*,T 4577.88 PASS AC=15,1,1;AF=0.043,...

whereas a SNP record should look like the following, with one reference allele and one alternate allele across all samples:

Chr01 431994 . C T 31627.50 PASS AC=33;AF=0.176;

The file was extracted from the GATK calling output. My plan for the next step is to keep only such well-formed SNPs in the VCF file for the RAiSD analysis. Maybe that will work. Thank you!

Best, chichi
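The filtering idea described here can be sketched in a few lines of Python. This is only an illustration: it keeps header lines and records with exactly one single-base REF and one single-base ALT, and the function names are hypothetical. Nothing in the thread confirms that this filter alone resolves the RAiSD error. The demo records mirror the two positions quoted above, with the trailing columns simplified.

```python
BASES = {"A", "C", "G", "T"}


def is_biallelic_snp(line):
    """True if a VCF record has one single-base REF and one single-base ALT."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return False
    ref, alt = fields[3].upper(), fields[4].upper()
    # Multi-allelic ALT values such as "A,*,T" are not in BASES and are rejected.
    return ref in BASES and alt in BASES


def keep_biallelic_snps(lines):
    """Yield header lines unchanged and only the biallelic SNP records."""
    for line in lines:
        if line.startswith("#") or is_biallelic_snp(line):
            yield line


# Simplified demo records (INFO trimmed to ".", no sample columns).
demo = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "Chr01\t430525\t.\tG\tA,*,T\t4577.88\tPASS\t.\n",
    "Chr01\t431994\t.\tC\tT\t31627.50\tPASS\t.\n",
]
for kept in keep_biallelic_snps(demo):
    print(kept, end="")
```

Because `keep_biallelic_snps` is a generator over lines, it can stream a multi-gigabyte file without loading it into memory, writing each yielded line to a new output file.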

chichizhao commented 9 months ago

Hi alachins, I am sorry, but that does not work on my data. Here is my test data; would you please check it? test_raisd.vcf.gz

Best, chichi

Command: /home/chichi/softwares/RAiSD/RAiSD -n test2 -I test_raisd.vcf -f
Samples: 111
Format: vcf
var-exp: 1.0
sfs-exp: 1.0
ld-exp: 1.0

A pattern structure of 349525 patterns (max. capacity) and approx. 16 MB memory footprint has been created.

The pattern structure has been resized to 209715 patterns (max. capacity) and approx. 16 MB memory footprint.

ERROR: Wrong SNP size (L) found!