NBISweden / pipelines-nextflow

A set of workflows written in Nextflow for Genome Annotation.
GNU General Public License v3.0

Test data for Abinitio training insufficient. #40

Closed mahesh-panchal closed 4 years ago

mahesh-panchal commented 4 years ago

Trying to use the test profile for the Abinitio workflow resulted in:

Error executing process > 'abinitio_training:gbk2augustus (Make Augustus training set: codingGeneFeatures.filter.longest_cds.complete.good_distance_blast-filtered)'

Caused by:
  Missing output file(s) `codingGeneFeatures.filter.longest_cds.complete.good_distance_blast-filtered.gbk.train` expected by process `abinitio_training:gbk2augustus (Make Augustus training set: codingGeneFeatures.filter.longest_cds.complete.good_distance_blast-filtered)`

Command executed:

  randomSplit.pl codingGeneFeatures.filter.longest_cds.complete.good_distance_blast-filtered.gbk 10

Command exit status:
  0

Command output:
  size 10 is greater than the number of genes in file
  codingGeneFeatures.filter.longest_cds.complete.good_distance_blast-filtered.gbk. Aborting.

Command error:
  WARNING: Skipping mount /var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
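
The "size 10 is greater than the number of genes in file" message can be checked before the split is attempted. The following is only a minimal sketch: it assumes that each training gene is stored as one GenBank record beginning with a `LOCUS` line, and that the split needs the test-set size to be strictly smaller than the number of records. The function names `count_loci` and `can_split` are hypothetical helpers, not part of Augustus or this pipeline.

```shell
#!/usr/bin/env bash

# Count GenBank records by counting lines that start with "LOCUS".
# Assumption: one LOCUS record per training gene.
count_loci() {
    local gbk="$1"
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -c '^LOCUS' "$gbk" || true
}

# Succeed only if the requested test-set size is strictly smaller than
# the number of records, so at least one gene is left for training.
can_split() {
    local gbk="$1" test_size="$2" n
    n="$(count_loci "$gbk")"
    [ "$test_size" -lt "$n" ]
}
```

For instance, `can_split some_file.gbk 10 || echo "too few genes for a test set of 10"` would flag the situation reported above without running the split.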

The workflow was run with:

#! /usr/bin/env bash

NXF_SCRIPT=/proj/snic2019-8-350/pipelines-nextflow/AbinitioTraining/AbinitioTraining.nf
nextflow run -profile nbis,singularity,test $NXF_SCRIPT -process.clusterOptions '-A snic2019-8-350'

Juke34 commented 4 years ago

If we change the value of the locus distance parameter, it should be fine. Could we set special values for the test?

mahesh-panchal commented 4 years ago

Yes, what should the value be? I can make a quick patch.

Juke34 commented 4 years ago

I think it is 3000 now; when it was 1000, the test was fine.

mahesh-panchal commented 4 years ago

Thanks. I was wondering what had changed that it didn't work any more. Making a patch for it.

mahesh-panchal commented 4 years ago

I'm still getting the same error testing this.

mahesh-panchal commented 4 years ago

That's the branch: https://github.com/NBISweden/pipelines-nextflow/tree/Update_abinitio_test

Juke34 commented 4 years ago

It was actually 500 bp before.
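
To illustrate why a larger locus distance leaves fewer genes for training, here is a rough sketch of a distance filter. It is not the implementation used by the pipeline; it assumes a hypothetical three-column, tab-separated input (sequence name, gene start, gene end) and drops every gene that lies closer than the threshold to a neighbouring gene on the same sequence. The function name `filter_by_distance` is made up for this example.

```shell
#!/usr/bin/env bash

# Keep only genes whose gap to both neighbours on the same sequence is
# at least min_dist. Input columns (assumed): seqid, start, end.
filter_by_distance() {
    local min_dist="$1" genes="$2"
    sort -t$'\t' -k1,1 -k2,2n "$genes" | awk -F'\t' -v d="$min_dist" '
        # Print the previous gene if it was never flagged as too close.
        function emit() { if (have && ok) print prev }
        {
            cur_ok = 1
            # Too close to the previous gene: flag both genes.
            if (have && $1 == chrom && $2 - prev_end < d) { ok = 0; cur_ok = 0 }
            emit()
            # Track the furthest end seen so far on this sequence.
            prev_end = (have && $1 == chrom && prev_end > $3) ? prev_end : $3
            prev = $0; chrom = $1; ok = cur_ok; have = 1
        }
        END { emit() }'
}
```

With genes at 100-200, 700-900 and 5000-6000 on one sequence, a threshold of 500 keeps all three, while 3000 keeps only the last one, which is the kind of shrinkage that can leave a small test data set with too few genes.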

Brent-Saylor-Canopy commented 1 year ago

I'm having this issue on a ~800 Mb genome for which MAKER found over 35k gene models.

This is the error:

executor >  local (12)
[29/c3785c] process > ABINITIO_TRAINING:SPLIT_MAKER_EVIDENCE (CGC14_rnd1.all.maker.chrOnly)                                                [100%] 1 of 1 ✔
[ba/79b60e] process > ABINITIO_TRAINING:MODEL_SELECTION_BY_AED (mrna)                                                                      [100%] 1 of 1 ✔
[89/6c7d82] process > ABINITIO_TRAINING:RETAIN_LONGEST_ISOFORM (codingGeneFeatures.filter)                                                 [100%] 1 of 1 ✔
[06/7be81e] process > ABINITIO_TRAINING:REMOVE_INCOMPLETE_GENE_MODELS (codingGeneFeatures.filter.longest_cds)                              [100%] 1 of 1 ✔
[1e/cd182d] process > ABINITIO_TRAINING:FILTER_BY_LOCUS_DISTANCE (codingGeneFeatures.filter.longest_cds.complete)                          [100%] 1 of 1 ✔
[b2/bc82ee] process > ABINITIO_TRAINING:EXTRACT_PROTEIN_SEQUENCE (codingGeneFeatures.filter.longest_cds.complete.good_distance)            [100%] 1 of 1 ✔
[cf/dbe649] process > ABINITIO_TRAINING:BLAST_MAKEBLASTDB (codingGeneFeatures.filter.longest_cds.complete.good_distance_proteins.fasta)    [100%] 1 of 1 ✔
[cc/aa5422] process > ABINITIO_TRAINING:BLAST_RECURSIVE (codingGeneFeatures.filter.longest_cds.complete.good_distance_proteins)            [100%] 1 of 1 ✔
[a8/5f139c] process > ABINITIO_TRAINING:GFF_FILTER_BY_BLAST (codingGeneFeatures.filter.longest_cds.complete.good_distance)                 [100%] 1 of 1 ✔
[b5/2b6951] process > ABINITIO_TRAINING:GFF2GBK (codingGeneFeatures.filter.longest_cds.complete.good_distance_blast-filtered)              [100%] 1 of 1 ✔
[1d/c98c87] process > ABINITIO_TRAINING:GBK2AUGUSTUS (codingGeneFeatures.filter.longest_cds.complete.good_distance_blast-filtered)         [  0%] 0 of 1
[-        ] process > ABINITIO_TRAINING:AUGUSTUS_TRAINING                                                                                  -
[01/0d2ae0] process > ABINITIO_TRAINING:CONVERT_GFF2ZFF (codingGeneFeatures.filter.longest_cds.complete.good_distance_blast-filtered.gff3) [  0%] 0 of 1
[-        ] process > ABINITIO_TRAINING:SNAP_TRAINING                                                                                      -
Error executing process > 'ABINITIO_TRAINING:GBK2AUGUSTUS (codingGeneFeatures.filter.longest_cds.complete.good_distance_blast-filtered)'

Caused by:
  Missing output file(s) `codingGeneFeatures.filter.longest_cds.complete.good_distance_blast-filtered.gbk.train` expected by process `ABINITIO_TRAINING:GBK2AUGUSTUS (codingGeneFeatures.filter.longest_cds.complete.good_distance_blast-filtered)`

Command executed:

  randomSplit.pl codingGeneFeatures.filter.longest_cds.complete.good_distance_blast-filtered.gbk 100

  cat <<-END_VERSIONS > versions.yml
  "ABINITIO_TRAINING:GBK2AUGUSTUS":
      augustus: $( augustus | sed '1!d; s/.*(//; s/).*//' )
  END_VERSIONS

Command exit status:
  0

Command output:
  size 100 is greater than the number of genes in file
  codingGeneFeatures.filter.longest_cds.complete.good_distance_blast-filtered.gbk. Aborting.

Work dir:
  /data/Maker_annotation/NBISweden_CGC14_rnd1/work/1d/c98c87d1fcadbe4c172086eb004108

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

        The workflow completed unsuccessfully.
        Please read over the error message. If you are unable to solve it, please
        post an issue at https://github.com/NBISweden/pipelines-nextflow/issues
        where we will do our best to help.

WARN: Killing running tasks (1)

I'm not sure how to track down this error. I'd appreciate any ideas that might help.
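
One possible way to narrow this down is to count how many gene models survive each filtering step, since the message says fewer than 100 genes reach the split. The sketch below assumes the intermediate files are GFF3 with `gene` in the third column; `count_genes` and `report_gene_counts` are hypothetical helper names, and the location of the intermediate files under the work directory is not shown in the log above.

```shell
#!/usr/bin/env bash

# Count "gene" features (column 3) in a GFF3 file, skipping comment lines.
count_genes() {
    awk -F'\t' '!/^#/ && $3 == "gene" { n++ } END { print n + 0 }' "$1"
}

# Print "<count><TAB><file>" for each GFF3 file given, in the order supplied.
report_gene_counts() {
    local f
    for f in "$@"; do
        printf '%s\t%s\n' "$(count_genes "$f")" "$f"
    done
}
```

Calling `report_gene_counts` on the outputs of the steps listed in the log, in pipeline order, should show which step drops the count from tens of thousands to below the requested test size.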