NBISweden / GAAS

Genome Assembly and Annotation Service code
GNU General Public License v3.0

File copy warning with gaas_maker_merge_outputs_from_datastore.pl #46

Closed nylander closed 3 years ago

nylander commented 3 years ago

The following warning messages were observed:

$ ~/git/NBIS/GAAS/annotation/tools/maker/gaas_maker_merge_outputs_from_datastore.pl \
   -i genome.maker.output_mixabinitio_abinitio_pacbio/ \
   -o genome.maker.output_mixabinitio_abinitio_pacbio_output_processed

[...]
Now save a copy of the Maker option files ...
Copy failed: No such file or directory genome.maker.output_mixabinitio_abinitio_pacbio_output_processed/maker_opts.ctl
Copy failed: No such file or directory genome.maker.output_mixabinitio_abinitio_pacbio_output_processed/maker_exe.ctl
Copy failed: No such file or directory genome.maker.output_mixabinitio_abinitio_pacbio_output_processed/maker_evm.ctl
Copy failed: No such file or directory genome.maker.output_mixabinitio_abinitio_pacbio_output_processed/maker_bopts.ctl

Now protecting the maker_annotation.gff annotation by making it readable only...

Now performing the statistics of the annotation file genome.maker.output_mixabinitio_abinitio_pacbio_output_processed/maker_annotation.gff...
WARNING get_longest_cds_level2: NO exon or cds to select the longest l2 for evm-000115f-processed-gene-1.0 l1 ! We will take one randomly ! @

There are possibly two kinds of errors observed here. The first is the failure to copy the control files. This is addressed in pull request #47.

The second is the warning from get_longest_cds_level2. This has not yet been addressed.

One issue related to the error with paths and folders is that the script searches for Maker output folders ending in maker.output (line #59), but the case I was given has folders ending in something else.
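To illustrate the folder-name mismatch, here is a minimal Python sketch (the GAAS script itself is Perl, and this is not the change made in the pull request). The strict pattern mirrors the "ends in maker.output" behaviour described above; the relaxed pattern and the function name `is_maker_output_dir` are assumptions made up for this example.

```python
import re

# Strict pattern: the folder name must end in "maker.output"
# (the behaviour described above).
STRICT = re.compile(r"maker\.output/?$")

# Relaxed pattern (hypothetical, not the actual GAAS fix):
# accept any suffix after "maker.output".
RELAXED = re.compile(r"maker\.output[^/]*/?$")


def is_maker_output_dir(name, relaxed=False):
    """Return True if the folder name looks like a Maker output folder."""
    pattern = RELAXED if relaxed else STRICT
    return pattern.search(name) is not None


name = "genome.maker.output_mixabinitio_abinitio_pacbio/"
print(is_maker_output_dir(name))                # strict check
print(is_maker_output_dir(name, relaxed=True))  # relaxed check
```

With the strict check, the folder name from the command above is rejected because of its `_mixabinitio_abinitio_pacbio` suffix; the relaxed check accepts it.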

Juke34 commented 3 years ago

The statistics are computed by AGAT, so the

WARNING get_longest_cds_level2: NO exon or cds to select the longest l2 for evm-000115f-processed-gene-1.0 l1 ! We will take one randomly ! @

message comes from AGAT. It computes the statistics twice: once with all isoforms, and a second time keeping only the longest isoform of each gene. To decide which isoform is the longest we look first at the CDS; if no CDS is present (non-coding gene) we look at the exon features; if there are none either, we throw a warning because that is not normal. The message means there are records that are not gene models (the records added by mixabinitio are probably just part/match_part features). There is nothing to do except correct the annotation file.

nylander commented 3 years ago

Closing this issue after having addressed the GAAS-part of the observed warnings in pull request #47.