Open TakashiKoyama opened 3 years ago
I think
vcfR_to_fasta(paste0("temp/sel_", as.integer(ix), "_", as.integer(ex), ".recode.vcf"), ref = ref, start = as.integer(ix), fa_start = fa_start, fa_end = fa_end, attr_name = attr_name)
in vcf_statistics.R
and
seqinr::write.fasta(sequences = ref[[attr_name]][fa_start:fa_end], names = names(ref), nbchar = 80, file.out = paste0("temp/", fa_start, "_", fa_end, "_out.fa"))
in vcfR_to_fasta.R
are the causes.
Can anyone fix these?
Hello @TakashiKoyama, thank you for submitting the issue. I will take a look into it and keep you updated.
It seems that the issue is with your reference FASTA sequence, would you be so kind to provide me with a snippet of how your reference sequence looks like?
Thank you for your reply @fardokhtsadat . I attached the sequence I focused in my analysis. If you need entire chromosome sequence, please let me know. chr12.ld.txt
Dear @TakashiKoyama, the reason for the issue seems to be that your FASTA header starts with 'chr', which is not supported with the current design of LDJump.
It should work once you remove the 'chr' from the header to have the header start with the chromosome number, like: >12:13482645-14021341
You can run the following command to remove 'chr' from all headers using: sed -i -e 's/chr//g' chr12.ld.txt
Can you try and see whether it works? Thank you.
Best regards, Fardokht
Should I change the chromosome name in the vcf.gz too?
You can try running it first. VCF files usually follow the expected naming convention, but to be sure can you provide me with a snippet of your VCF-file?
OK. head500.vcf.txt This is first 500 lines of my VCF.
It seems that your VCF file also follows the same naming convention. Try changing the Chromosome Name in the VCF file using the same command, and let me know whether it works.
I changed chromosome names in both my reference and vcf but still have another error.
Error in ref[[attr_name]] : recursive indexing failed at level 2
In addition: Warning message:
closing unused connection 3 (temp/1_1000_out.fa)
Can I share whole reference and vcf with you? It might be better to reproduce my problem.
Sure
Thanks! I will send my files to fardokht.fm@gmail.com.
The files are too large to sent you. Do you have file transfer service you usually use?
You can send me a small subset, it does not have to be the whole sequence. The first 2,000 base pairs are enough.
OK. These are first 100 lines of my reference and vcf. head100.vcf.txt ref_head100.fna.txt
Dear @TakashiKoyama, thank you for providing the subsets.
There are two remarks I can make at this stage:
bcftools annotate —set-id +'%CHROM:%POS:%REF:%FIRST_ALT' your_vcf_file.vcf > your_vcf_withreplacedID.vcf
This command will create a unique ID for every entry in your VCF-file. Please try to edit your VCF file and see whether it successfully runs.Thank you for your quick reply.
Unfortunately, I still have error even if I edit my VCF.
> VCF="allbam.BQSR3.filtered.snp.PASS_only.renameChr.vcf.gz"
> REF="GCA_002217815.1_Squ_2.0_chromosome.oriented.renHeader.sed.fna"
> CHR=12
> SeqStart=13482645
> SeqEnd=14021341
> LengthOfSeq=SeqEnd - SeqStart + 1
> LDhatPATH= "/home/samba/Suijitsu_Public/software/LDhat/"
> PhiPATH="/home/samba/Suijitsu_Public/software/PhiPack/Phi"
> LDJump(VCF, chr = CHR, cores = 30, pathLDhat = LDhatPAT,
+ pathPhi = PhiPATH, format = "vcf", refName = REF,
+ lengthofseq = LengthOfSeq, startofseq = SeqStart, endofseq = SeqEnd)
In your working directory, there already exists a folder named "temp". Are you willing to delete it?
1: Yes
2: No
Selection: 1
VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--gzvcf allbam.BQSR3.filtered.snp.PASS_only.renameChr.vcf.gz
--chr 12
--to-bp 13483645
--recode-INFO-all
--out temp/sel_13482645_13483645
--recode
--from-bp 13482645
Using zlib version: 1.2.7
After filtering, kept 16 out of 16 Individuals
Outputting VCF file...
After filtering, kept 6 out of a possible 8398823 Sites
Run Time = 29.00 seconds
Scanning file to determine attributes.
File attributes:
meta lines: 72
header_line: 73
variant count: 6
column count: 25
Meta line 72 read in.
All meta lines processed.
gt matrix initialized.
Character matrix gt created.
Character matrix gt rows: 6
Character matrix gt cols: 25
skip: 0
nrows: 6
row_num: 0
Processed variant: 6
All variants processed
Error in ref[[attr_name]] : recursive indexing failed at level 2
In addition: Warning message:
In for (i in seq_len(nseq)) { :
closing unused connection 3 (temp/1_1000_out.fa)
I attach my edited VCF here. Sorry for making a lot of troubles. head100.vcf.txt
Hi TakashiKoyama, Could you solve the issue? I am getting the same error even with phased vcf file containing the IDs, sounds like something else is the root of the issue.
Hi,
I have a following error when running LDJump. How can I solve this?