Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

symlink inconsistent when running singularity with --workingdir='/data' #627

Closed splaisan closed 10 months ago

splaisan commented 1 year ago

Thanks for the great tool, I am almost able to run it now!

I ran the Singularity container with an extra --workingdir='/data' to avoid filling my small HOME folder. When I looked into the resulting folder, I noticed a broken symlink whose target starts with /data and points to GeneMark-EP/genemark.gtf.

The link is as follows:

traingenes.gtf -> /data/GeneMark-EP/genemark.gtf

whereas it should be:

traingenes.gtf -> GeneMark-EP/genemark.gtf

The /data/ prefix was kept, probably because a step that strips the 'workingdir' from the link target (something like a basename call) is missing.

Note that the file the link should point to, genemark.gtf, is present in the GeneMark-EP folder.
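
A possible host-side workaround, sketched under assumptions: the absolute target /data/... resolves inside the container, where the working directory is mounted at /data, but not on the host, so the link can be re-created with a target relative to the working directory. The helper name `relink_relative` is hypothetical and not part of BRAKER, and the sketch assumes the link sits directly in the directory that was bound to /data. It runs on a throwaway directory that mimics the layout above rather than on real output.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper (not part of BRAKER): if the symlink's target starts
# with the container-side prefix, re-create the link with that prefix removed,
# so the target becomes relative to the link's own directory.
# Assumes the link lives directly in the directory that was mounted at $prefix.
relink_relative() {
  local link="$1" prefix="$2" target
  target="$(readlink "$link")"
  case "$target" in
    "$prefix"/*) ln -sfn "${target#"$prefix"/}" "$link" ;;
    *) : ;;  # target is not under the prefix; leave the link untouched
  esac
}

# Demo in a temporary directory that mimics the reported layout.
demo="$(mktemp -d)"
mkdir "$demo/GeneMark-EP"
touch "$demo/GeneMark-EP/genemark.gtf"
ln -s /data/GeneMark-EP/genemark.gtf "$demo/traingenes.gtf"

relink_relative "$demo/traingenes.gtf" /data
readlink "$demo/traingenes.gtf"
```

On real output, the equivalent call would be `relink_relative traingenes.gtf /data`, run from the host directory that was bound to /data.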

Run command:

singularity exec   \
  -B ${PWD}:/data   \
  -B ${ORTHODB}:/orthodb   \
  ${SIFFILES}/braker3.sif braker.pl  \
  --species="lama_glama"   \
  --genome="/data/CM052816.1.fa"   \
  --prot_seq="/orthodb/Vertebrata.fa"   \
  --workingdir="/data"   \
  --threads="${thr}"

Somewhere in the stdout:

Called from: /data
Cmd: /opt/ETP/bin/gmes/ProtHint/bin/prothint.py --threads=24 --geneMarkGtf /data/GeneMark-ES/genemark.gtf /data/genome.fa /data/proteins.fa

Resulting folder structure:

drwxr-xr-x 9 u0002316 domain users 4.0K May  8 11:46 .
drwxr-xr-x 3 u0002316 domain users 4.0K May  8 11:26 ..
-rw-r--r-- 1 u0002316 domain users  112 May  8 11:43 aa2nonred.stdout
-rw-r--r-- 1 u0002316 domain users  19K May  8 11:46 braker.log
-rw-r--r-- 1 u0002316 domain users  99M May  8 10:34 CM052816.1.fa
-rw-r--r-- 1 u0002316 domain users   30 May  8 10:35 CM052816.1.fa.fai
-rw-r--r-- 1 u0002316 domain users  160 May  8 11:10 cmd.log
drwxr-xr-x 2 u0002316 domain users 4.0K May  8 11:13 diamond
-rw-r--r-- 1 u0002316 domain users    0 May  8 11:43 downsample_traingenes.log
-rw-r--r-- 1 u0002316 domain users    0 May  8 11:43 ensure_min_n_training_genes.stdout
drwxr-xr-x 2 u0002316 domain users 4.0K May  8 11:46 errors
-rw-r--r-- 1 u0002316 domain users    0 May  8 11:43 etrain.bad.lst
-rw-r--r-- 1 u0002316 domain users 170K May  8 11:16 evidence.gff
-rw-r--r-- 1 u0002316 domain users  369 May  8 11:43 filterGenemark.stdout
-rw-r--r-- 1 u0002316 domain users 1.5K May  8 11:43 firstetraining.stdout
-rw-r--r-- 1 u0002316 domain users 1.5M May  8 11:46 firsttest.stdout
-rw-r--r-- 1 u0002316 domain users 1.5K May  8 11:43 gbFilterEtraining.stdout
-rw-r--r-- 1 u0002316 domain users   72 May  8 10:46 gc_content.out
drwxr-xr-x 6 u0002316 domain users 4.0K May  8 11:43 GeneMark-EP        #####
-rw-r--r-- 1 u0002316 domain users 5.5K May  8 11:43 GeneMark-EP.stdout
drwxr-xr-x 6 u0002316 domain users 4.0K May  8 11:10 GeneMark-ES
-rw-r--r-- 1 u0002316 domain users 5.1K May  8 11:10 GeneMark-ES.stdout
-rw-r--r-- 1 u0002316 domain users 123K May  8 11:16 genemark_evidence.gff
-rw-r--r-- 1 u0002316 domain users 211K May  8 11:16 genemark_hintsfile.gff
-rw-r--r-- 1 u0002316 domain users 429K May  8 11:11 gene_stat.yaml
-rw-r--r-- 1 u0002316 domain users  99M May  8 10:46 genome.fa
-rw-r--r-- 1 u0002316 domain users   22 May  8 10:46 genome_header.map
-rw-r--r-- 1 u0002316 domain users 2.2M May  8 11:43 good_genes.lst
-rw-r--r-- 1 u0002316 domain users 691K May  8 11:16 hintsfile.gff
-rw-r--r-- 1 u0002316 domain users  33K May  8 11:43 nonred.loci.lst
-rw-r--r-- 1 u0002316 domain users 8.6M May  8 11:14 nuc.fasta
-rw-r--r-- 1 u0002316 domain users  18K May  8 12:13 optimize_augustus.stdout
-rw-r--r-- 1 u0002316 domain users 5.5G May  8 10:47 proteins.fa
-rw-r--r-- 1 u0002316 domain users 691K May  8 11:16 prothint_augustus.gff
-rw-r--r-- 1 u0002316 domain users 227K May  8 11:16 prothint.gff
-rw-r--r-- 1 u0002316 domain users 2.1M May  8 11:11 seed_proteins.faa
drwxr-xr-x 2 u0002316 domain users 440K May  8 11:16 Spaln
drwxr-xr-x 2 u0002316 domain users 4.0K May  8 10:46 species
drwxr-xr-x 2 u0002316 domain users 4.0K May  8 11:46 tmp_opt_lama_glama
-rw-r--r-- 1 u0002316 domain users 367K May  8 11:16 top_chains.gff
-rw-r--r-- 1 u0002316 domain users  20M May  8 11:43 train.ff.gb
-rw-r--r-- 1 u0002316 domain users  20M May  8 11:43 train.f.gb
-rw-r--r-- 1 u0002316 domain users  20M May  8 11:43 train.gb
-rw-r--r-- 1 u0002316 domain users 5.1M May  8 11:43 train.gb.test
-rw-r--r-- 1 u0002316 domain users  15M May  8 11:43 train.gb.train
-rw-r--r-- 1 u0002316 domain users 5.0M May  8 11:43 train.gb.train.test
-rw-r--r-- 1 u0002316 domain users 9.2M May  8 11:43 train.gb.train.train
-rw-r--r-- 1 u0002316 domain users 428K May  8 11:43 traingenes.good.fa
-rw-r--r-- 1 u0002316 domain users 2.0M May  8 11:43 traingenes.good.gtf
-rw-r--r-- 1 u0002316 domain users 427K May  8 11:43 traingenes.good.nr.fa
lrwxrwxrwx 1 u0002316 domain users   30 May  8 11:43 traingenes.gtf -> /data/GeneMark-EP/genemark.gtf     #######
-rw-r--r-- 1 u0002316 domain users 1.7K May  8 10:47 what-to-cite.txt
## note, the run is not finished yet

KatharinaHoff commented 10 months ago

I am addressing this in commit https://github.com/Gaius-Augustus/BRAKER/commit/4c36c137f5201d85b851d490c525c7f1d5704388

Thank you for reporting it.