freeseek / score

Tools to work with GWAS-VCF summary statistics files
MIT License

Warning: as option --src-fasta-ref is missing it is impossible to infer which allele is the reference allele #14

Open bbimber opened 4 weeks ago

bbimber commented 4 weeks ago

Hello,

I am trying to run liftover on a moderate-size dataset (500K PacBio-derived SVs) to lift data from the macaque genome to GRCh37. I am running the command below, but it is reporting:

Warning: as option --src-fasta-ref is missing it is impossible to infer which allele is the reference allele at position 1:248821847

However, I believe I am providing "-s" to the command. Is there a different argument I am missing, or is the tool expecting some other kind of input? Thanks for any ideas.

    bcftools +liftover \
        --no-version \
        -Ou \
        --threads 18 \
        -o $VCF_LIFTED \
        $VCF_NORM \
        -- \
        -s <MMul10_Genome_Fasta> \
        -f <GRCH37_FASTA> \
        -c $CHAIN_FILE \
        --reject $UNMAPPED \
        --reject-type z \
        --write-src \
        --fix-tags

Also, I would not have thought 500K SVs is that big a dataset, but this has been running for days (even with 18 threads), which seems rather extreme.

freeseek commented 3 weeks ago

This could be a mistake in how the plugin identifies which variants are symbolic variants. Can you share with me an example that reproduces the issue? Also, using multiple threads will only affect compression/decompression, so most likely you don't need to use that many threads. The liftover step is not multi-threaded. It should run in a few seconds. If it is taking this long, it means you have found a bug in the code.

bbimber commented 3 weeks ago

@freeseek: If you're willing to have a look, I'm happy to share any of this. The input file is ~200 MB; however, it reproduces the warning quite quickly. It also basically hangs after starting the tool, with no obvious work happening (nothing is being written). If I posted the file, would you consider this, or do you want a more minimal input?

freeseek commented 3 weeks ago

That's perfect. If you could send me by email a link to the VCF and a link to the MMul10_Genome_Fasta file, that would be great.
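
For anyone who would rather share a smaller reproducer than the full ~200 MB file: since the warning reportedly appears quickly, a subset containing the header plus the first few thousand records may be enough. The sketch below is one hedged way to build such a subset. The function name `subset_vcf` and the file names in the usage comment are hypothetical, and it assumes the input is an uncompressed, plain-text VCF (a compressed VCF or BCF would need to be converted to text first). There is no guarantee the first records trigger the same behavior as the full file.

```shell
# Hypothetical helper: copy every header line (starting with '#') and the
# first N data records of a plain-text VCF into a new file.
# Assumption: the input is uncompressed text, not bgzipped VCF or BCF.
subset_vcf() {
    in_vcf=$1      # path to the plain-text input VCF
    n_records=$2   # number of data records to keep
    out_vcf=$3     # path of the subset to write

    awk -v n="$n_records" '
        /^#/    { print; next }   # always keep header lines
        n-- > 0 { print; next }   # keep the first N records
                { exit }          # stop reading once N records are written
    ' "$in_vcf" > "$out_vcf"
}

# Example usage (file names are placeholders):
#   subset_vcf input.vcf 5000 repro.vcf
```

The subset can then be passed to the same liftover command in place of the original input to check whether the warning and the hang still occur.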