IHGGM-Aachen / CNVand

A snakemake workflow to analyse and annotate copy number variants
MIT License

AnnotSV Cannot Find Reference Files #1

Closed: CarlosClassen closed this issue 1 month ago

CarlosClassen commented 1 month ago

Description

While running the AnnotSV step in our Snakemake workflow, the process fails with an error indicating that it cannot find the reference file refGene.txt.gz. Below is the specific error message:

/tmp/Annotations_Human/Genes/GRCh38/refGene.txt.gz: No such file or directory

Snakemake Rule

Here is the Snakemake rule that triggers the error:

rule annotsv:
    input:
        os.path.join(config['outdir'], 'cnv', '{sample}', '{sample}_cnv.vcf')
    output:
        os.path.join(config['outdir'], 'annotsv', '{sample}.annotated.tsv')
    log: 
        os.path.join(config['outdir'], 'logs', 'annotsv', '{sample}.log')
    params:
        refGene="config['params']['annotsv']['refGene']",
        extra=config['params']['annotsv']['extra']
    conda:
        "../envs/annotsv.yaml"
    shell:
        """
        wget -O /tmp/Annotations_Human/Genes/GRCh38/refGene.txt.gz {params.refGene} > {log} 2>&1
        AnnotSV -SVinputFile {input} -annotationsDir /tmp -genomeBuild GRCh38 -outputFile {output} >> {log} 2>&1
        """

Steps to Reproduce

  1. Run the Snakemake pipeline with the annotsv rule.
  2. Observe the error message in the log file indicating that refGene.txt.gz is not found.

Expected Behavior

The wget command should download the refGene.txt.gz file to the specified location, and AnnotSV should be able to access it without any issues.

Actual Behavior

The wget command either fails to download the file or places it in an incorrect directory, causing AnnotSV to fail due to the missing reference file.

Possible Solutions

  1. Verify the URL specified in config['params']['annotsv']['refGene'] is correct and accessible.
  2. Ensure that the directory /tmp/Annotations_Human/Genes/GRCh38/ exists before running the wget command.
  3. Check for any network issues or restrictions that might prevent wget from downloading the file.
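
Checks 1 and 2 above can be sketched in Python as follows. This is a minimal sketch, not part of the workflow: the helper names `looks_like_url` and `prepare_download_target` are hypothetical. Two observations motivate it. First, in the rule shown above the `refGene` param is wrapped in double quotes, so `{params.refGene}` would expand to the literal text of the lookup expression rather than to the configured URL. Second, `wget -O` does not create missing parent directories, which would match the "No such file or directory" message.

```python
import os
from urllib.parse import urlparse


def looks_like_url(value):
    """Return True if value parses as an http(s) or ftp URL with a host part."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https", "ftp") and bool(parsed.netloc)


def prepare_download_target(target_path):
    """Create the parent directory of target_path (like `mkdir -p`) and return it."""
    parent = os.path.dirname(target_path)
    os.makedirs(parent, exist_ok=True)
    return parent


if __name__ == "__main__":
    # The quoted params entry from the rule is a plain string, not a URL.
    quoted = "config['params']['annotsv']['refGene']"
    print(looks_like_url(quoted))
```

If the first check fails for the value that reaches the shell command, dropping the surrounding quotes in `params` is probably the first thing to try; the directory helper corresponds to adding `mkdir -p` before the `wget` call.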

To make this pipeline executable behind a firewall in production environments, we should stop downloading the reference file inside the rule. Instead, we should either copy a local file to the location where AnnotSV expects it, or figure out how to pass the correct path as a parameter.
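
The local-copy option could look roughly like this. It is a sketch under stated assumptions: the file has already been downloaded by hand, `stage_reference` is a hypothetical helper, and the `Annotations_Human/Genes/<build>` layout is taken only from the path in the error message above, not from the AnnotSV documentation.

```python
import os
import shutil


def stage_reference(local_file, annotations_root, build="GRCh38"):
    """Copy a pre-downloaded reference file into the directory named in the error message.

    The sub-path Annotations_Human/Genes/<build> is an assumption based on
    the failing path reported in this issue.
    """
    dest_dir = os.path.join(annotations_root, "Annotations_Human", "Genes", build)
    os.makedirs(dest_dir, exist_ok=True)  # make sure the target directory exists
    dest = os.path.join(dest_dir, os.path.basename(local_file))
    shutil.copy2(local_file, dest)  # copy the file contents and metadata
    return dest
```

Calling `stage_reference("/data/refGene.txt.gz", "/tmp")` (the source path here is made up) would place the file at the location the rule currently tries to fill with `wget`, without any network access.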

CarlosClassen commented 1 month ago

Despite my attempts to use the conda-packaged version of AnnotSV with the -annotationsDir parameter, it is still unable to find the refGene.txt file.

Maybe it's better to use a containerized version and mount the annotation directory via Singularity arguments when invoking Snakemake.

CarlosClassen commented 1 month ago

It seems that the refGene.txt file alone was not enough to make AnnotSV work. I have figured it out now: to download the full set of annotation files, as given in the Makefile in the AnnotSV repository, I used this link.

Still, to avoid downloading these large reference files from within the pipeline, I'd suggest adding instructions to the README.md so that users download them themselves, and providing a path option in the config file.

With this now clear, we can also stick with the conda version of the tool, which spares users from needing to be able to run Singularity containers.
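
The suggested path option in the config file could be wired in roughly as below. This is only a sketch: the config key `annotations_dir` and the helper `annotsv_command` are hypothetical, and the AnnotSV flags are the ones already used in the rule above. Failing early on a missing directory gives users a clearer message than the one AnnotSV reports.

```python
import os


def annotsv_command(config, sv_input, output, build="GRCh38"):
    """Build the AnnotSV argument list from a user-supplied annotations path."""
    # 'annotations_dir' is a hypothetical config key, not one the workflow defines.
    annotations_dir = config["params"]["annotsv"]["annotations_dir"]
    if not os.path.isdir(annotations_dir):
        raise FileNotFoundError(
            f"AnnotSV annotations directory not found: {annotations_dir}. "
            "Download the annotation files as described in the README "
            "and set the path in the config file."
        )
    return [
        "AnnotSV",
        "-SVinputFile", sv_input,
        "-annotationsDir", annotations_dir,
        "-genomeBuild", build,
        "-outputFile", output,
    ]
```

Inside the Snakefile the same value could instead be passed through `params` and used in the `shell` block; the function form is shown only because it is easy to test in isolation.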