BrendelGroup / AEGeAn

Integrated toolkit for analysis and evaluation of annotated genomes
http://brendelgroup.github.io/AEGeAn
ISC License

local build failure using singularity #266

Open splaisan opened 9 months ago

splaisan commented 9 months ago

Hi,

I am back trying to build my local database using my own braker3_gdna, name-fixed braker3_gff3 and braker3_protein.fa.

The singularity command does not create the label folder and then complains that it cannot find files it should have written there. Can you please help me solve this? Thanks.

# the wget downloaded sif file is at ${NXF_SINGULARITY_CACHEDIR}/aegean.sif

workdir="$PWD"
cd "${workdir}"

# compare to ont braker results
myasm="/workdir/ont_draft_assembly_softmask.fasta"
myannot="/workdir/braker_fixed.gff3"
myprot="/workdir/braker.aa.fa"
label="Gnm1"
refr="Crei"

# test command
# singularity exec -e -B $(pwd) ${NXF_SINGULARITY_CACHEDIR}/aegean.sif fidibus -h

pfx_cmd="singularity exec -e -B ${workdir}:/workdir ${NXF_SINGULARITY_CACHEDIR}/aegean.sif "

# download reference data
#addcmd=" --relax "
#${pfx_cmd} fidibus --workdir=${workdir} --refr=${refr} ${addcmd} download prep iloci breakdown stats

# prep local data
${pfx_cmd} fidibus --workdir=${workdir} --local --gdna=${myasm} --gff3=${myannot} --prot=${myprot} --label=${label} prep iloci

The output:

${pfx_cmd} fidibus --workdir=${workdir} --local --gdna=${myasm} --gff3=${myannot} --prot=${myprot} --label=${label} prep iloci
[Genome: Gnm1] preprocess genome sequence file
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib64/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.9/site-packages/LocusPocus-0.15.1+216.g4b00f0a-py3.9.egg/EGG-INFO/scripts/fidibus", line 41, in run_build
    db.prep(strict=not args.relax)
  File "/usr/local/lib/python3.9/site-packages/LocusPocus-0.15.1+216.g4b00f0a-py3.9.egg/LocusPocus/genomedb.py", line 225, in prep
    self.preprocess_gdna(logstream=logstream, verify=verify, strict=strict)
  File "/usr/local/lib/python3.9/site-packages/LocusPocus-0.15.1+216.g4b00f0a-py3.9.egg/LocusPocus/genomedb.py", line 301, in preprocess_gdna
    self.preprocess('gdna', logstream, verify, strict)
  File "/usr/local/lib/python3.9/site-packages/LocusPocus-0.15.1+216.g4b00f0a-py3.9.egg/LocusPocus/genomedb.py", line 262, in preprocess
    outstream = open(outfile, 'w')
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/syn_lts/analyses/genome_assembly/euk_asm/Chlamydomonas_reinhardtii/annotation_results/ont_results/aegean_results/Gnm1/Gnm1.gdna.fa'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/fidibus", line 4, in <module>
    __import__('pkg_resources').run_script('LocusPocus==0.15.1+216.g4b00f0a', 'fidibus')
  File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 651, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 1448, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.9/site-packages/LocusPocus-0.15.1+216.g4b00f0a-py3.9.egg/EGG-INFO/scripts/fidibus", line 240, in <module>
    main(get_parser().parse_args())
  File "/usr/local/lib/python3.9/site-packages/LocusPocus-0.15.1+216.g4b00f0a-py3.9.egg/EGG-INFO/scripts/fidibus", line 222, in main
    _ = [p.get() for p in results]
  File "/usr/local/lib/python3.9/site-packages/LocusPocus-0.15.1+216.g4b00f0a-py3.9.egg/EGG-INFO/scripts/fidibus", line 222, in <listcomp>
    _ = [p.get() for p in results]
  File "/usr/lib64/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/syn_lts/analyses/genome_assembly/euk_asm/Chlamydomonas_reinhardtii/annotation_results/ont_results/aegean_results/Gnm1/Gnm1.gdna.fa'

My local files seem OK, though:

-rw-r--r-- 1 u0002316 domain users  12M Jan 11 16:19 braker.aa.fa
-rw-r--r-- 1 u0002316 domain users  43M Jan 15 09:33 braker_fixed.gff3
-rw-r--r-- 1 u0002316 domain users 110M Jan 11 16:19 ont_draft_assembly_softmask.fasta
-rw-r--r-- 1 u0002316 domain users 1.9K Jan 11 16:19 ont_draft_assembly_softmask.fasta.fai

I have the feeling this is a stupid path issue, but I have not managed to fix it.

Thanks for your help.

splaisan commented 9 months ago

CORRECTION

I edited workdir to be a static path and it worked. There were numerous error messages linked to some of the input entries, but it ran to the end.

I also found that, in order to get the Gnm1 folder created, I needed to add 'download' to the local command. This is not mentioned in the example commands in the docs (https://github.com/standage/genhub/blob/master/docs/MANUAL.md):

workdir="$PWD"
cd "${workdir}"

# compare to ont braker results
myasm="/workdir/ont_draft_assembly_softmask.fasta"
myannot="/workdir/braker_fixed.gff3"
myprot="/workdir/braker.aa.fa"
label="Gnm1"
refr="Crei"

# test command
# singularity exec -e -B $(pwd) ${NXF_SINGULARITY_CACHEDIR}/aegean.sif fidibus -h

pfx_cmd="singularity exec -e -B ${workdir}:/workdir ${NXF_SINGULARITY_CACHEDIR}/aegean.sif "

# download reference data
#addcmd=" --relax "
#${pfx_cmd} fidibus --workdir=${workdir} --refr=${refr} ${addcmd} download prep iloci breakdown stats

# prep local data
# adding 'download' to get it to work
${pfx_cmd} fidibus --workdir=/workdir --local --gdna=${myasm} --gff3=${myannot} --prot=${myprot} --label=${label} download prep iloci

# compare to ont braker results
addcmd=" --relax "
${pfx_cmd}  fidibus --workdir=/workdir --numprocs=24 --local \
  --gdna=${myasm} \
  --gff3=${myannot} \
  --prot=${myprot} \
  --label=${label} \
  --refr=${refr} \
  ${addcmd} \
  download prep iloci breakdown stats

I now have a Gnm1 folder with the following files:

total 964M
drwxr-xr-x 2 u0002316 domain users 4.0K Jan 15 10:36 .
drwxr-xr-x 7 u0002316 domain users 4.0K Jan 15 10:39 ..
-rw-r--r-- 1 u0002316 domain users  41M Jan 15 10:36 Gnm1.all.cds.fa
-rw-r--r-- 1 u0002316 domain users  36M Jan 15 10:36 Gnm1.all.mrnas.fa
-rw-r--r-- 1 u0002316 domain users  12M Jan 15 10:36 Gnm1.all.mrnas.gff3
-rw-r--r-- 1 u0002316 domain users  76M Jan 15 10:36 Gnm1.all.pre-mrnas.fa
-rw-r--r-- 1 u0002316 domain users  12M Jan 15 10:35 Gnm1.all.prot.fa
-rw-r--r-- 1 u0002316 domain users  34M Jan 15 10:36 Gnm1.cds.fa
-rw-r--r-- 1 u0002316 domain users 693K Jan 15 10:37 Gnm1.cds.tsv
-rw-r--r-- 1 u0002316 domain users  38M Jan 15 10:36 Gnm1.exons.fa
-rw-r--r-- 1 u0002316 domain users  11M Jan 15 10:37 Gnm1.exons.tsv
-rw-r--r-- 1 u0002316 domain users 460K Jan 15 10:36 Gnm1.filens.tsv
-rw-r--r-- 1 u0002316 domain users 110M Jan 15 10:35 Gnm1.gdna.fa
-rw-r--r-- 1 u0002316 domain users  43M Jan 15 10:35 Gnm1.gff3
-rw-r--r-- 1 u0002316 domain users 393K Jan 15 10:36 Gnm1.ilens.tsv
-rw-r--r-- 1 u0002316 domain users 115M Jan 15 10:36 Gnm1.iloci.fa
-rw-r--r-- 1 u0002316 domain users  48M Jan 15 10:36 Gnm1.iloci.gff3
-rw-r--r-- 1 u0002316 domain users 2.6M Jan 15 10:37 Gnm1.iloci.tsv
-rw-r--r-- 1 u0002316 domain users  38M Jan 15 10:36 Gnm1.ilocus.mrnas.gff3
-rw-r--r-- 1 u0002316 domain users  11M Jan 15 10:36 Gnm1.ilocus.mrnas.temp
-rw-r--r-- 1 u0002316 domain users 436K Jan 15 10:36 Gnm1.ilocus.mrnas.tsv
-rw-r--r-- 1 u0002316 domain users  41M Jan 15 10:36 Gnm1.introns.fa
-rw-r--r-- 1 u0002316 domain users 8.2M Jan 15 10:37 Gnm1.introns.tsv
-rw-r--r-- 1 u0002316 domain users 110M Jan 15 10:36 Gnm1.miloci.fa
-rw-r--r-- 1 u0002316 domain users 2.7M Jan 15 10:36 Gnm1.miloci.gff3
-rw-r--r-- 1 u0002316 domain users 1.8M Jan 15 10:37 Gnm1.miloci.tsv
-rw-r--r-- 1 u0002316 domain users  36M Jan 15 10:36 Gnm1.mrnas.fa
-rw-r--r-- 1 u0002316 domain users  11M Jan 15 10:36 Gnm1.mrnas.gff3
-rw-r--r-- 1 u0002316 domain users  12M Jan 15 10:36 Gnm1.mrnas.temp
-rw-r--r-- 1 u0002316 domain users 669K Jan 15 10:37 Gnm1.mrnas.tsv
-rw-r--r-- 1 u0002316 domain users 176K Jan 15 10:36 Gnm1.mrnas.txt
-rw-r--r-- 1 u0002316 domain users  76M Jan 15 10:36 Gnm1.pre-mrnas.fa
-rw-r--r-- 1 u0002316 domain users 838K Jan 15 10:37 Gnm1.pre-mrnas.tsv
-rw-r--r-- 1 u0002316 domain users 470K Jan 15 10:36 Gnm1.protein2ilocus.repr.tsv
-rw-r--r-- 1 u0002316 domain users 473K Jan 15 10:36 Gnm1.protein2ilocus.tsv
-rw-r--r-- 1 u0002316 domain users    0 Jan 15 10:36 Gnm1.prot.fa
-rw-r--r-- 1 u0002316 domain users 176K Jan 15 10:36 Gnm1.protids.txt
-rw-r--r-- 1 u0002316 domain users 259K Jan 15 10:36 Gnm1.simple-iloci.txt
-rw-r--r-- 1 u0002316 domain users  40M Jan 15 10:36 Gnm1.with-introns.gff3
-rw-r--r-- 1 u0002316 domain users 300K Jan 15 10:35 ilens.temp

What can I do with these files to evaluate my genome assembly and annotation in comparison to the public reference? I looked through the docs but could not find a tutorial on exploring the results, only the list of TSV files and a suggestion to analyze them in R. I also find nothing in these outputs linking to the reference data 'Crei', only my own annotation info, decorated with multiple metrics. Did I miss a step?

Thanks for your help.

vpbrendel commented 9 months ago

The -B flag to singularity binds host system folders for use in the command; see the apptainer documentation. The "not found" error for the file starting with /mnt/syn_lts is likely due to that directory not being in the list of folders bound to the container.
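
To make the bind mapping concrete, here is a minimal sketch of how a host path corresponds to a container path under a bind of the form `-B host_dir:container_dir`. The directory values and the `to_container_path` helper are hypothetical and not part of AEGeAn or singularity; the point is only that a path outside the bound directory has no equivalent inside the container, so every path passed to fidibus should be given in its in-container form.

```shell
#!/usr/bin/env bash
# Sketch only: both directories are illustrative values (assumptions),
# standing in for the two sides of "-B host_dir:container_dir".
host_dir="/data/project"      # hypothetical host directory given to -B
container_dir="/workdir"      # mount point seen inside the container

# Hypothetical helper: print the in-container equivalent of a host path,
# or fail if the path lies outside the bound directory and would
# therefore not be visible inside the container.
to_container_path() {
  local path="$1"
  case "$path" in
    "$host_dir")   printf '%s\n' "$container_dir" ;;
    "$host_dir"/*) printf '%s\n' "${container_dir}${path#"$host_dir"}" ;;
    *)             echo "not bound: $path" >&2; return 1 ;;
  esac
}

# A file under the bound directory maps cleanly.
to_container_path "/data/project/Gnm1/Gnm1.gdna.fa"

# A file elsewhere on the host has no in-container path.
to_container_path "/mnt/other/file.fa" || echo "path is outside the bind"
```

Under this reading, passing a host path such as `$PWD` to `--workdir` while only `/workdir` is bound would ask the tool to write to a location it cannot see, which matches the error reported above.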

Re the need for download: yes, that is needed to set up the data in the working directory.

For usage, please see https://github.com/BrendelGroup/iLoci_SLB22NARGB and https://doi.org/10.1093/nargab/lqac013

Volker