EichlerLab / pav

Phased assembly variant caller

missing reference? #60

Closed by jeffchen2000 1 month ago

jeffchen2000 commented 2 months ago

Hi Eichlerlab

I got the following error when running this command:

apptainer run --bind ${PWD}:${PWD} library://becklab/pav/pav:latest -c 16

INFO: Using cached image
Building DAG of jobs...
MissingInputException in rule data_align_ref in line 85 of /opt/pav/rules/data.snakefile:
Missing input files for rule data_align_ref:
    output: data/ref/ref.fa.gz, data/ref/ref.fa.gz.fai
    affected files:
        hg38_no_alt/hg38.no_alt.fa.gz

Below is my environment:

$ tree
.
├── assemblies
│   ├── HG00733_22q12_h1.fa.gz
│   ├── HG00733_22q12_h2.fa.gz
│   ├── MANIFEST_20221202_PAV_Example
│   ├── README_20221202_PAV_Example
│   ├── assemblies.tsv
│   └── config.json
├── assemblies.tsv
└── config.json

$ more config.json
{
  "reference": "hg38_no_alt/hg38.no_alt.fa.gz"
}

$ more assemblies.tsv
NAME     HAP1                                HAP2
HG00733  assemblies/HG00733_22q12_h1.fa.gz   assemblies/HG00733_22q12_h2.fa.gz

paudano commented 2 months ago

It's saying it can't find hg38_no_alt/hg38.no_alt.fa.gz, which is the value of "reference" in config.json. Where is the reference file hg38_no_alt/hg38.no_alt.fa.gz located? The path in config.json should be modified to point to the reference FASTA file. If you are looking for the GRCh38_NoALT reference we used for HGSVC and don't have it, it can be found here:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC2/technical/reference/20200513_hg38_NoALT/

No references are packed in PAV containers (Singularity or Docker).
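A quick pre-flight check along these lines can catch the problem before Snakemake builds its DAG. This is only a minimal sketch, not part of PAV: `check_reference` is a hypothetical helper, and it assumes a relative "reference" path is resolved against the directory that holds config.json (the directory PAV is launched from in the command above).

```python
import json
import os


def check_reference(config_path="config.json"):
    """Return (resolved_path, exists) for the "reference" entry in a config file.

    Hypothetical helper for illustration; not part of PAV.
    """
    with open(config_path) as fh:
        config = json.load(fh)

    ref = config.get("reference")
    if ref is None:
        # No "reference" key at all: nothing to resolve.
        return None, False

    # Assumption: relative paths are resolved against the directory
    # containing config.json, i.e. the run directory.
    base = os.path.dirname(os.path.abspath(config_path))
    ref_path = ref if os.path.isabs(ref) else os.path.join(base, ref)

    return ref_path, os.path.isfile(ref_path)
```

Calling `check_reference("config.json")` from the run directory returns the resolved path and whether a file exists there. Note that the command above binds only ${PWD} into the container, so a reference stored outside that directory may also need its own bind path to be visible to PAV.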

jeffchen2000 commented 2 months ago

worked after adding reference--thanks