bcgsc / abyss

Assemble large genomes using short reads
http://www.bcgsc.ca/platform/bioinfo/software/abyss

abyss-pe not working when put within a loop #356

Closed Nicholus-ledbetter closed 3 years ago

Nicholus-ledbetter commented 3 years ago

I'm having trouble running abyss-pe within a simple loop. When I run the command outside of a loop it works fine, but when I try to loop through multiple directories to assemble paired-end fastq files, it stops prematurely and does not produce a -contigs.fa file. I appreciate any help.

Please report

Assembly error

lcoombe commented 3 years ago

Hi @Nicholus-ledbetter,

Could you post the command that did work for you outside the loop? I'd suggest avoiding wildcards in your for-loop, and also using a simpler prefix name (e.g. just RMB2499 or similar, if that is enough to distinguish the runs from one another).

It would also be worth giving the loop a test with the dry-run option (-n with abyss-pe) just to double check that the commands are looking OK. For example, I can see that in the abyss-map command, there are no reads supplied to the command, which would be needed for aligning the reads to the unitigs.
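A minimal sketch of such a dry-run loop is below. The directory layout (one directory per sample, with reads named after the directory) and the k value are assumptions for illustration, not details from the original post; only the -n dry-run option and the name/k/in parameters come from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical layout: one directory per sample, e.g. RMB2499/ containing
# RMB2499_R1_001.fastq and RMB2499_R2_001.fastq. k=96 is a placeholder.

dry_run_sample() {
    local dir="${1%/}"          # strip a trailing slash
    local sample="${dir##*/}"   # the directory name doubles as the run name
    local r1="${sample}_R1_001.fastq"
    local r2="${sample}_R2_001.fastq"
    if command -v abyss-pe >/dev/null 2>&1; then
        # -n is the dry-run option: print the pipeline commands without running them.
        (cd "$dir" && abyss-pe -n name="$sample" k=96 in="$r1 $r2")
    else
        # abyss-pe is not on PATH: just show the command that would be previewed.
        echo "abyss-pe -n name=$sample k=96 in='$r1 $r2'"
    fi
}

for dir in */; do
    [ -d "$dir" ] || continue   # skip when the glob matches nothing
    dry_run_sample "$dir"
done
```

If the read file names show up in every abyss-map and abyss-bloom-dbg line of the preview, dropping -n should run the same commands for real.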

Lauren

sjackman commented 3 years ago

in='*R1_001.fastq *R2_001.fastq'

I could be wrong, but I don't believe using wildcards in in= works.

Nicholus-ledbetter commented 3 years ago

Thanks for the help, guys! Here is the command that does work:

abyss-pe name=x k=96 in='R1_001.fastq R2_001.fastq'

From what I can tell, using a wildcard in "in=" works just fine; the problem seems to arise when I use a variable in the "name" parameter. When I put this in a loop and give every output the same name, the loop works fine.

sjackman commented 3 years ago

Glad to hear that you got it working. I believe name needs to be a valid Makefile variable name, so only alphanumeric characters and underscores.
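If the run names come from directory or file names, one way to follow that suggestion is to sanitize them first. This is only a sketch: safe_name is a hypothetical helper, not part of ABySS, and the example input is made up.

```shell
#!/usr/bin/env bash
# safe_name (hypothetical helper): replace every character that is not a
# letter, digit or underscore with an underscore.
safe_name() {
    # tr -c works on the complement of the listed set; \n is listed so the
    # trailing newline added by printf is left alone.
    printf '%s\n' "$1" | LC_ALL=C tr -c 'A-Za-z0-9_\n' '_'
}

# Example with a made-up sample name containing a dash and dots.
name=$(safe_name "RMB2499-S1.L001")
echo "$name"
```

The result can then be passed as name="$name" while the original, unsanitized file names are still used for in=.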

fpezzini commented 1 year ago

Hello, coming back to this as I am having trouble running abyss 2.3.5 in a loop with paired reads. It seems to be reading sample names in some parts and not in others. An example with only a single sample:

while read f; do abyss-pe name="$f" k=90 B=2G in='"$f".R1.fq.gz "$f".R2.fq.gz'; done < acc

where acc contains the single line sample1, and the files in the folder are sample1.R1.fq.gz and sample1.R2.fq.gz

gives me

abyss-stack-size 65536 abyss-bloom-dbg -k90 -q3  -b2G -j1  "".R1.fq.gz "".R2.fq.gz  > sample1-1.fa
Running with max stack size of 65536 KB: abyss-bloom-dbg -k90 -q3 -b2G -j1 .R1.fq.gz .R2.fq.gz
error: `.R1.fq.gz': No such file or directory
make: *** [/mnt/shared/scratch/fpezzini/apps/conda/bin/abyss-pe.Makefile:555: sample1-1.fa] Error 1
make: *** Deleting file 'sample1-1.fa'

Appreciate any advice (I have already tried varying the quotes). How do I pass paired-end reads as input when running in an array (as well as in a loop)?

Dry-run gives me:

abyss-stack-size 65536 abyss-bloom-dbg -k90 -q3  -b2G -j1  "".R1.fq.gz "".R2.fq.gz  > sample1-1.fa
AdjList    -k90 -m50 --dot sample1-1.fa >sample1-1.dot
abyss-rresolver-short  -b2G -f0.8 -j1 -k90 \
 -h sample1-1-rr --dot -c sample1-1-rr.fa \
-g sample1-1-rr.dot sample1-1.fa sample1-1.dot "".R1.fq.gz "".R2.fq.gz
abyss-filtergraph  --dot   -k90 -g sample1-2.dot1 sample1-1-rr.dot sample1-1-rr.fa >sample1-1-rr.path
MergeContigs --dot   -k90 -g sample1-2.dot -o sample1-2.fa sample1-1-rr.fa sample1-2.dot1 sample1-1-rr.path
PopBubbles  --dot -j1 -k90  -p0.9  -g sample1-3.dot sample1-2.fa sample1-2.dot >sample1-2.path
MergeContigs   -k90 -o sample1-3.fa sample1-2.fa sample1-2.dot sample1-2.path
awk '!/^>/ {x[">" $1]=1; next} {getline s} $1 in x {print $0 "\n" s}' \
    sample1-2.path sample1-1-rr.fa >sample1-indel.fa
ln -sf sample1-3.fa sample1-unitigs.fa
abyss-map   -j1 -l40    "".R1.fq.gz "".R2.fq.gz sample1-3.fa \
    |abyss-fixmate   -l40  -h sample1-3.hist \
    |sort -snk3 -k4 \
    |DistanceEst   -j1 -k90  -l40 -s1000 -n10  -o sample1-3.dist sample1-3.hist
Overlap  --dot   -k90 -g sample1-4.dot -o sample1-4.fa sample1-3.fa sample1-3.dot sample1-3.dist
abyss-stack-size 65536 SimpleGraph   -s1000 -n10  -j1 -k90 -o sample1-4.path1 sample1-4.dot sample1-3.dist
abyss-index  --fai sample1-3.fa
abyss-index  --fai sample1-4.fa
cat sample1-3.fa.fai sample1-4.fa.fai \
    | MergePaths   -j1 -k90 -s1000  -o sample1-4.path2 - sample1-4.path1
PathOverlap --assemble   -k90  sample1-4.dot sample1-4.path2 >sample1-4.path3
cat sample1-3.fa sample1-4.fa \
    | abyss-stack-size 65536 PathConsensus  --dot -k90  -p0.9  -o sample1-5.path -s sample1-5.fa -g sample1-5.dot - sample1-4.dot sample1-4.path3
cat sample1-3.fa sample1-4.fa sample1-5.fa | MergeContigs   -k90 -o sample1-6.fa - sample1-5.dot sample1-5.path
ln -sf sample1-6.fa sample1-contigs.fa
PathOverlap --overlap   -k90 --dot sample1-5.dot sample1-5.path >sample1-6.dot
ln -sf sample1-6.dot sample1-contigs.dot
abyss-map   -j1 -l40    "".R1.fq.gz "".R2.fq.gz sample1-6.fa \
    |abyss-fixmate   -l40  -h sample1-6.hist \
    |sort -snk3 -k4 \
    |DistanceEst   --dot --median -j1 -k90  -l40 -s1000 -n10  -o sample1-6.dist.dot sample1-6.hist
abyss-scaffold    -k90 -s100-5000 -n15-20 -g sample1-6.path.dot  sample1-6.dot sample1-6.dist.dot >sample1-6.path
abyss-stack-size 65536 PathConsensus  --dot -k90  -p0.9  -s sample1-7.fa -g sample1-7.dot -o sample1-7.path sample1-6.fa sample1-6.dot sample1-6.path
cat sample1-6.fa sample1-7.fa \
    | MergeContigs   -k90 -o sample1-8.fa - sample1-7.dot sample1-7.path
ln -sf sample1-8.fa sample1-scaffolds.fa
PathOverlap --overlap   -k90 --dot sample1-7.dot sample1-7.path >sample1-8.dot
ln -sf sample1-8.dot sample1-scaffolds.dot
abyss-fac   sample1-unitigs.fa sample1-contigs.fa sample1-scaffolds.fa |tee sample1-stats.tab
ln -sf sample1-stats.tab sample1-stats
tr '\t' , <sample1-stats.tab >sample1-stats.csv
abyss-tabtomd sample1-stats.tab >sample1-stats.md
pandoc sample1-stats.md >sample1-stats.html

lcoombe commented 1 year ago

Hi @fpezzini,

I believe the issue is related to the use of the quotes above - you can see that the intended read file names aren't being run for the various ABySS steps.

Try:

while read f; do abyss-pe name=${f} k=90 B=2G in="${f}.R1.fq.gz ${f}.R2.fq.gz"; done < acc
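The difference between the two quoting styles can be checked in the shell without running ABySS at all. The snippet below is a small illustration using the sample name from this thread. It appears likely that make, which drives abyss-pe, then treats the literal $f it receives as a reference to an unset make variable, which would explain the "".R1.fq.gz in the log above, though that part is an inference rather than something confirmed in this thread.

```shell
#!/usr/bin/env bash
f=sample1

# Single quotes: the shell passes the text through untouched, so $f is never
# replaced by the sample name.
single='"$f".R1.fq.gz "$f".R2.fq.gz'

# Double quotes: the shell substitutes the value of f before the argument
# is handed to abyss-pe.
double="${f}.R1.fq.gz ${f}.R2.fq.gz"

printf 'single: %s\n' "$single"
printf 'double: %s\n' "$double"
```

The second line printed shows the real file names, sample1.R1.fq.gz and sample1.R2.fq.gz, which is the value the in= parameter needs.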