bmvdgeijn / WASP

WASP: allele-specific pipeline for unbiased read mapping and molecular QTL discovery
Apache License 2.0

find_intersecting_snps.py: WARNING: unable to read from file #93

Open daphniamagna opened 4 years ago

daphniamagna commented 4 years ago

Hello, I am using the mappability filtering pipeline of WASP, but I am running into issues in Step 3 of the pipeline.

I split my original VCF into one VCF per 'chromosome' (in our case, these are contigs). I am using a BAM file produced by GATK's SplitNCigarReads.

When I run Step 3, I get the following message:

starting chromosome 000000F
reading SNPs from file '~/WASP/chrom/000000F.snps.txt.gz'
WARNING: unable to read from file '~/WASP/chrom/000000F.snps.txt.gz', assuming no SNPs for this chromosome

Is this a file-format issue, or did I miss something else?

Thank you!
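The warning means that find_intersecting_snps.py could not open the per-contig SNP file for 000000F. A small check like the one below can help tell a missing file apart from one that exists but is not gzip-readable. It is a minimal sketch that assumes the <snp_dir>/<chrom>.snps.txt.gz naming shown in the warning; check_snp_file is a hypothetical helper, not part of WASP. It expands "~" explicitly, because Python's file functions do not do that themselves (only the shell does).

```python
import gzip
import os


def check_snp_file(snp_dir: str, chrom: str) -> str:
    """Return a one-line diagnosis for <snp_dir>/<chrom>.snps.txt.gz.

    Assumption: the file naming matches the path printed in the WASP warning.
    """
    # open()/gzip.open() treat "~" literally, so expand it here.
    path = os.path.join(os.path.expanduser(snp_dir), chrom + ".snps.txt.gz")
    if not os.path.exists(path):
        return "missing: " + path
    try:
        with gzip.open(path, "rt") as fh:
            first = fh.readline()  # forces decompression of the first block
    except OSError as err:  # e.g. the file exists but is not gzip-compressed
        return "unreadable: %s (%s)" % (path, err)
    if not first:
        return "empty: " + path
    return "ok: %s (first line: %r)" % (path, first.rstrip("\n"))
```

For example, print(check_snp_file("~/WASP/chrom", "000000F")) reports which of the four cases applies to that contig.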

gmcvicker commented 4 years ago

Can you provide the command line that you used? It would also be helpful if you could email me a short version of your SNP input file (e.g. containing the first 1000 lines) to gmcvicker@salk.edu.

Finally, I would suggest pulling the latest changes to WASP from GitHub if you have not already done so. There is a known issue with reading VCF files that lack header lines. It has not been fixed in the latest WASP update, but the warning message is now more informative.

daphniamagna commented 4 years ago

My command line was:

python find_intersecting_snps.py \
          --is_paired_end \
          --is_sorted \
          --snp_dir ~/WASP/chrom \
          --output_dir ~/WASP/step3 \
          ~/NMP1reads_NCigar.bam

The header lines are present in my VCF files, so I don't know why it doesn't work.
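The filename in the warning (000000F.snps.txt.gz) suggests that --snp_dir expects gzipped per-chromosome SNP tables rather than VCF files. Assuming the three-column layout described in the WASP README (position, reference allele, alternative allele, space-separated, one file per chromosome), a per-contig VCF could be converted roughly as follows. vcf_to_wasp_snps is a hypothetical helper, and keeping only biallelic single-nucleotide variants is a choice made for this sketch.

```python
import gzip

BASES = {"A", "C", "G", "T"}


def vcf_to_wasp_snps(vcf_path: str, out_path: str) -> int:
    """Write '<pos> <ref> <alt>' lines for biallelic SNVs; return the count.

    Assumption: WASP's --snp_dir wants <chrom>.snps.txt.gz files with three
    space-separated columns (position, ref allele, alt allele).
    """
    opener = gzip.open if vcf_path.endswith(".gz") else open
    n = 0
    with opener(vcf_path, "rt") as vcf, gzip.open(out_path, "wt") as out:
        for line in vcf:
            if line.startswith("#"):  # skip VCF meta-information and header
                continue
            fields = line.rstrip("\n").split("\t")
            pos, ref, alt = fields[1], fields[3], fields[4]
            # Keep single-base REF with exactly one single-base ALT;
            # indels and multi-allelic sites ("T,G") are skipped.
            if ref in BASES and alt in BASES:
                out.write("%s %s %s\n" % (pos, ref, alt))
                n += 1
    return n
```

For a contig named 000000F, the output would be written as 000000F.snps.txt.gz inside the directory passed to --snp_dir.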

daphniamagna commented 4 years ago

Hello,

The error was caused by not having compiled WASP/snp2h5. That is now fixed, following step 5 of the installation instructions: https://github.com/bmvdgeijn/WASP

However, there is another error. We tried 3 different VCF files containing 1, 2 and 3 individuals, and we always get the same error:

ERROR: chrom.c:124: line did not have at least 2 tokens

What are those tokens? Do we need to pre-process the VCF files somehow?

Have a nice day

gmcvicker commented 4 years ago

Hello,

I believe you are getting the above error when running snp2h5. Is that correct? It looks to me like your chromInfo file (specified by the --chrom argument) is not correctly formatted. This file should contain at least 2 columns: the first column should contain the chromosome name, and the second column should contain the length of the chromosome in bp. An example of a chromInfo file for the human genome is here: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/chromInfo.txt.gz

I hope this helps. Let us know if you still have any difficulties.
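For a contig-level assembly, a chromInfo file in the two-column layout described above (name, length in bp) can be derived from the reference FASTA. The sketch below assumes an uncompressed FASTA and writes tab-separated lines like the UCSC example; fasta_lengths, write_chrom_info and bad_chrom_info_lines are hypothetical helpers. bad_chrom_info_lines splits on whitespace, which may not match exactly how snp2h5 tokenizes lines. If a samtools .fai index already exists for the reference, its first two columns hold the same name and length information.

```python
def fasta_lengths(fasta_path):
    """Return [(name, length_bp), ...] in file order for an uncompressed FASTA."""
    lengths = []
    name, size = None, 0
    with open(fasta_path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    lengths.append((name, size))
                # Sequence name = first whitespace-delimited word of the header.
                name, size = line[1:].split()[0], 0
            elif line:
                size += len(line)  # count bases on sequence lines
    if name is not None:
        lengths.append((name, size))
    return lengths


def write_chrom_info(lengths, out_path):
    """Write one 'name<TAB>length' line per sequence."""
    with open(out_path, "w") as out:
        for name, size in lengths:
            out.write("%s\t%d\n" % (name, size))


def bad_chrom_info_lines(path):
    """Return 1-based numbers of lines lacking a name plus an integer length."""
    bad = []
    with open(path) as fh:
        for i, line in enumerate(fh, start=1):
            tokens = line.split()
            if len(tokens) < 2 or not tokens[1].isdigit():
                bad.append(i)
    return bad
```

For example, write_chrom_info(fasta_lengths("contigs.fa"), "chromInfo.txt") would produce a file to pass to snp2h5 via --chrom ("contigs.fa" is a placeholder name), and bad_chrom_info_lines("chromInfo.txt") should then return an empty list.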