Closed jyw-atgithub closed 6 months ago
I would not expect GALBA to accept a gff file containing an annotation as input. GALBA is supposed to produce a new annotation (in gtf or gff3 format), not take one as input. Indeed, we often do not screen for hashtag lines (but I do not see how that is relevant, because you should not run GALBA with gff input from NCBI). You may provide additional hints to AUGUSTUS in gff format; however, you would need to postprocess the NCBI annotation for that either way, because the AUGUSTUS hints format is its own flavor of gff. It expects exact types in the third column and specific features in the last column, and the hints file must be compatible with the extrinsic.cfg file. I'd advise against it unless you know what you're doing. You can read about hints files for AUGUSTUS e.g. in https://math-inf.uni-greifswald.de/storages/uni-greifswald/fakultaet/mnf/mathinf/stanke/augustus_wrp.pdf (or in the AUGUSTUS tutorials).
The whitespace warning can be avoided by removing the whitespace from the FASTA headers beforehand, but it is not dangerous.
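If you do want to clean the headers, here is a minimal Python sketch of one way to do it. It is not part of GALBA; the function names are made up for illustration, and it assumes that the first whitespace-delimited token of each header is a unique sequence ID, so everything after it can be dropped.

```python
def clean_fasta_headers(lines):
    """Yield FASTA lines, keeping only the first token of each header.

    Hypothetical helper, not a GALBA function. Assumes the first
    whitespace-delimited token of every header is a unique ID.
    """
    for line in lines:
        if line.startswith(">"):
            # Split the header on any whitespace and keep the first token.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            yield ">" + name + "\n"
        else:
            # Sequence lines are passed through unchanged.
            yield line


def clean_fasta_file(in_path, out_path):
    """Write a copy of in_path to out_path with whitespace-free headers."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(clean_fasta_headers(src))
```

For example, clean_fasta_file("C01_final.fasta", "C01_final.clean.fasta") would write a cleaned copy (the output file name is just a placeholder), which you could then pass to --genome. Check that the shortened names are still unique before using the result.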
Accuracy will be higher if you use more protein donors. The warning that coverage will be ignored basically says that your input is suboptimal. If you have no more proteins, use what you have and ignore the warning.
I would advise against the --crf flag unless you already have the HMM training results and know how to compare the two sets of results to make a decision. CRF training is not always better and increases training time a lot.
The matrix warning affects only a few proteins in your data set. It is probably safe to ignore.
Hello, I downloaded the RefSeq annotations from NCBI (GCF_016920845.1) and used GALBA with Singularity. At the beginning, GALBA kept reporting an error as follows.
After working on it for a while, I found that GALBA does not accept any header lines or lines starting with "#". Is this normal? I could not find this in the manual.
Second, should I ignore or fix the following warnings?
The command was:
singularity exec /home/jenyuw/Software/galba.sif galba.pl --genome=${final_genome}/C01_final.fasta --species=Phytichthys_chirus --prot_seq=${ref}/GCF_016920845/protein.faa --hints=${ref}/GCF_016920845/genomic2.gff --workingdir=${annotation} --threads 30 --crf
The environment is: singularity-ce version 3.11.0-jammy, GALBA v1.0.11, Ubuntu 22.04.2 LTS
Thank you!