faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

Missing input file when doing a workflow mapping #246

Closed sadbirder closed 3 years ago

sadbirder commented 3 years ago

Hello Dr. Faircloth.

I get the following error when I try to run the mapping workflow on the tutorial UCE data with this command:

phyluce_workflow --config ~/ucetutorial/config.yml --output ~/ucetutorial/workflow/ --workflow mapping --cores 1

Here's the output message that I got.

Building DAG of jobs...
MissingInputException in line 39 of /home/lucatristaom/anaconda2/envs/phyluce-1.7.1/phyluce/workflows/mapping/Snakefile:
Missing input files for rule copy_and_build_index:
/home/ucetutorial/spades-assemblies/contigs/alligator_mississippiensis.contigs.fasta

I suspected this was due to how I wrote the YAML config file, but since I followed your tutorial closely and am new to phyluce and bioinformatics in general, I cannot work out what I did wrong. Here is what I wrote in the config file:

reads:
    alligator-mississippiensis: /home/ucetutorial/clean-fastq/alligator_mississippiensis/
    gallus-gallus: /home/ucetutorial/clean-fastq/gallus_gallus/
    anolis-carolinensis: /home/ucetutorial/clean-fastq/anolis_carolinensis/
    mus-musculus: /home/ucetutorial/clean-fastq/mus_musculus/

contigs:
    alligator-mississippiensis: /home/ucetutorial/spades-assemblies/contigs/alligator_mississippiensis.contigs.fasta
    gallus-gallus: /home/ucetutorial/spades-assemblies/contigs/gallus_gallus.contigs.fasta
    anolis-carolinensis: /home/ucetutorial/spades-assemblies/contigs/anolis_carolinensis.contigs.fasta
    mus-musculus: /home/ucetutorial/spades-assemblies/contigs/mus_musculus.contigs.fasta

Thanks in advance. Best regards.

brantfaircloth commented 3 years ago

First, reduce everything to just the alligator example, which will simplify things. Then check the paths to the reads and the contigs, and make sure they exist. The reads folder given in the yaml file for alligator should contain:

alligator-mississippiensis-READ1.fastq.gz
alligator-mississippiensis-READ2.fastq.gz

Named exactly the same as above.
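The checks described above can be sketched in Python. This is only an illustration, not part of phyluce: the helper name `check_mapping_config` is hypothetical, and it assumes the config has already been parsed into a dict with the same `reads` and `contigs` sections as the yaml file, and that each reads folder holds files named `<sample>-READ1.fastq.gz` and `<sample>-READ2.fastq.gz` as in the alligator example.

```python
from pathlib import Path


def check_mapping_config(config):
    """Return a list of problems found in a parsed mapping config (a dict).

    Hypothetical helper, not part of phyluce. It assumes the layout shown
    in this thread: a "reads" section mapping sample names to folders and
    a "contigs" section mapping sample names to FASTA files.
    """
    problems = []

    for sample, read_dir in config.get("reads", {}).items():
        read_dir = Path(read_dir).expanduser()
        if not read_dir.is_dir():
            problems.append(f"{sample}: reads folder not found: {read_dir}")
            continue
        # The read files are expected to carry the sample name exactly as
        # it is written in the config, hyphens included.
        for read in ("READ1", "READ2"):
            expected = read_dir / f"{sample}-{read}.fastq.gz"
            if not expected.is_file():
                problems.append(f"{sample}: missing read file: {expected}")

    for sample, contig_path in config.get("contigs", {}).items():
        contig_path = Path(contig_path).expanduser()
        if not contig_path.is_file():
            problems.append(f"{sample}: contigs file not found: {contig_path}")

    return problems
```

Calling `check_mapping_config` on the parsed config and printing each returned string should point at the path or file name that needs fixing before the workflow is run; an empty list means every path and read file name checked out.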

sadbirder commented 3 years ago

Hello again Dr. Faircloth, thanks for your quick response.

I was able to run the mapping workflow after leaving out the Alligator mississippiensis sample, whose reads appear to have been corrupted somehow. I think so because the alligator files under clean-fastq > raw-reads were only 99 bytes, whereas the other read files average about 5 MB.
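For anyone hitting the same thing, a size check like the one I did by hand could be sketched as follows. The function name `find_small_reads` and the 1024-byte threshold are assumptions made for illustration, not anything phyluce provides. A tiny file is only a hint that something went wrong, not proof of corruption.

```python
from pathlib import Path


def find_small_reads(read_dir, min_bytes=1024):
    """Return (path, size) pairs for *.fastq.gz files smaller than min_bytes.

    Hypothetical helper; the default threshold is an arbitrary assumption.
    The folder is searched recursively.
    """
    small = []
    for path in sorted(Path(read_dir).expanduser().rglob("*.fastq.gz")):
        size = path.stat().st_size
        if size < min_bytes:
            small.append((path, size))
    return small
```

Running `find_small_reads` on the clean-fastq folder would list any read files that are far smaller than the rest, such as the 99-byte alligator files.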

Before that, I could only run --workflow mapping on the remaining samples once the read file names matched the sample names in the yaml config file exactly. For example, the Anolis carolinensis files in clean-fastq > raw-reads had to be named exactly like the anolis-carolinensis entry in the yaml file (with hyphens instead of underscores), which I missed when following your tutorial. My paths were also incorrect, so I had that issue as well.

Again, thank you for your time, this has been a great help. Best.

brantfaircloth commented 3 years ago

You're welcome 👍