PhHermann / LDJump

R Package to Estimate Variable Recombination Rates using Population Genetic Data
MIT License
39 stars 6 forks source link

recursive indexing failed at level 2 #28

Open TakashiKoyama opened 3 years ago

TakashiKoyama commented 3 years ago

Hi,

I have a following error when running LDJump. How can I solve this?

>LDJump(VCF, chr = CHR, cores = 10, pathLDhat = "/home/samba/Suijitsu_Public/software/LDhat/", pathPhi = "/home/samba/Suijitsu_Public/software/PhiPack/", format = "vcf", refName = REF, lengthofseq = LengthOfSeq, startofseq = SeqStart, endofseq = SeqEnd)

VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
    --gzvcf chr12ZW.allbam.BQSR3.filtered.snp.PASS_only.vcf.gz
    --chr chr12
    --to-bp 13483645
    --recode-INFO-all
    --out temp/sel_13482645_13483645
    --recode
    --from-bp 13482645

Using zlib version: 1.2.7
After filtering, kept 10 out of 10 Individuals
Outputting VCF file...
After filtering, kept 6 out of a possible 271740 Sites
Run Time = 1.00 seconds
Scanning file to determine attributes.
File attributes:
  meta lines: 72
  header_line: 73
  variant count: 6
  column count: 19
Meta line 72 read in.
All meta lines processed.
gt matrix initialized.
Character matrix gt created.
  Character matrix gt rows: 6
  Character matrix gt cols: 19
  skip: 0
  nrows: 6
  row_num: 0
Processed variant: 6
All variants processed
Error in ref[[attr_name]] : recursive indexing failed at level 2
In addition: Warning message:
In as.integer(stop) : closing unused connection 3 (temp/1_1000_out.fa)
TakashiKoyama commented 3 years ago

I think

vcfR_to_fasta(paste0("temp/sel_", as.integer(ix), "_", as.integer(ex), ".recode.vcf"), ref = ref, start = as.integer(ix), fa_start = fa_start, fa_end = fa_end, attr_name = attr_name)

in vcf_statistics.R and

seqinr::write.fasta(sequences = ref[[attr_name]][fa_start:fa_end], names = names(ref), nbchar = 80, file.out = paste0("temp/", fa_start, "_", fa_end, "_out.fa"))

in vcfR_to_fasta.R are the causes.

Can anyone fix these?

fardokhtsadat commented 3 years ago

Hello @TakashiKoyama, thank you for submitting the issue. I will take a look into it and keep you updated.

It seems that the issue is with your reference FASTA sequence, would you be so kind to provide me with a snippet of how your reference sequence looks like?

TakashiKoyama commented 3 years ago

Thank you for your reply @fardokhtsadat . I attached the sequence I focused in my analysis. If you need entire chromosome sequence, please let me know. chr12.ld.txt

fardokhtsadat commented 3 years ago

Dear @TakashiKoyama, the reason for the issue seems to be that your FASTA header starts with 'chr', which is not supported with the current design of LDJump.

It should work once you remove the 'chr' from the header to have the header start with the chromosome number, like: >12:13482645-14021341

You can run the following command to remove 'chr' from all headers using: sed -i -e 's/chr//g' chr12.ld.txt

Can you try and see whether it works? Thank you.

Best regards, Fardokht

TakashiKoyama commented 3 years ago

Should I change the chromosome name in the vcf.gz too?

fardokhtsadat commented 3 years ago

You can try running it first. VCF files usually follow the expected naming convention, but to be sure can you provide me with a snippet of your VCF-file?

TakashiKoyama commented 3 years ago

OK. head500.vcf.txt This is first 500 lines of my VCF.

fardokhtsadat commented 3 years ago

It seems that your VCF file also follows the same naming convention. Try changing the Chromosome Name in the VCF file using the same command, and let me know whether it works.

TakashiKoyama commented 3 years ago

I changed chromosome names in both my reference and vcf but still have another error.

Error in ref[[attr_name]] : recursive indexing failed at level 2
In addition: Warning message:
closing unused connection 3 (temp/1_1000_out.fa) 

Can I share whole reference and vcf with you? It might be better to reproduce my problem.

fardokhtsadat commented 3 years ago

Sure

TakashiKoyama commented 3 years ago

Thanks! I will send my files to fardokht.fm@gmail.com.

TakashiKoyama commented 3 years ago

The files are too large to sent you. Do you have file transfer service you usually use?

fardokhtsadat commented 3 years ago

You can send me a small subset, it does not have to be the whole sequence. The first 2,000 base pairs are enough.

TakashiKoyama commented 3 years ago

OK. These are first 100 lines of my reference and vcf. head100.vcf.txt ref_head100.fna.txt

fardokhtsadat commented 3 years ago

Dear @TakashiKoyama, thank you for providing the subsets.

There are two remarks I can make at this stage:

  1. I can run LDJump using your files if I fix the ID columns in your VCF file. If you take a close look on the ID columns, you will find that all entries have the ID column set to '.'. You can fix this issue by running this command: bcftools annotate —set-id +'%CHROM:%POS:%REF:%FIRST_ALT' your_vcf_file.vcf > your_vcf_withreplacedID.vcf This command will create a unique ID for every entry in your VCF-file. Please try to edit your VCF file and see whether it successfully runs.
  2. It seems that your VCF data is unphased, is that correct? If yes, I have to get into contact with my supervisor, as LDJump's models were trained on phased data.
TakashiKoyama commented 3 years ago

Thank you for your quick reply.

  1. I will edit my VCF as you suggested.
  2. Yes, my VCF is unphased.
TakashiKoyama commented 3 years ago

Unfortunately, I still have error even if I edit my VCF.

> VCF="allbam.BQSR3.filtered.snp.PASS_only.renameChr.vcf.gz"
> REF="GCA_002217815.1_Squ_2.0_chromosome.oriented.renHeader.sed.fna"
> CHR=12
> SeqStart=13482645
> SeqEnd=14021341
> LengthOfSeq=SeqEnd - SeqStart + 1
> LDhatPATH= "/home/samba/Suijitsu_Public/software/LDhat/"
> PhiPATH="/home/samba/Suijitsu_Public/software/PhiPack/Phi"
> LDJump(VCF, chr = CHR, cores = 30, pathLDhat = LDhatPAT,
+        pathPhi = PhiPATH, format = "vcf", refName = REF, 
+        lengthofseq = LengthOfSeq, startofseq = SeqStart, endofseq = SeqEnd)
In your working directory, there already exists a folder named "temp". Are you willing to delete it? 

1: Yes
2: No

Selection: 1

VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
    --gzvcf allbam.BQSR3.filtered.snp.PASS_only.renameChr.vcf.gz
    --chr 12
    --to-bp 13483645
    --recode-INFO-all
    --out temp/sel_13482645_13483645
    --recode
    --from-bp 13482645

Using zlib version: 1.2.7
After filtering, kept 16 out of 16 Individuals
Outputting VCF file...
After filtering, kept 6 out of a possible 8398823 Sites
Run Time = 29.00 seconds
Scanning file to determine attributes.
File attributes:
  meta lines: 72
  header_line: 73
  variant count: 6
  column count: 25
Meta line 72 read in.
All meta lines processed.
gt matrix initialized.
Character matrix gt created.
  Character matrix gt rows: 6
  Character matrix gt cols: 25
  skip: 0
  nrows: 6
  row_num: 0
Processed variant: 6
All variants processed
Error in ref[[attr_name]] : recursive indexing failed at level 2
In addition: Warning message:
In for (i in seq_len(nseq)) { :
  closing unused connection 3 (temp/1_1000_out.fa)

I attach my edited VCF here. Sorry for making a lot of troubles. head100.vcf.txt

yrahimi commented 1 year ago

Hi TakashiKoyama, Could you solve the issue? I am getting the same error even with phased vcf file containing the IDs, sounds like something else is the root of the issue.