Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

Problems in RUNNING GENEMARK-EX #716

Closed ChuanzhengWei closed 11 months ago

ChuanzhengWei commented 11 months ago

Dear BRAKER team, I got an error when using BRAKER3, run from a Singularity container, for gene prediction. I am not sure whether the problem lies with gmetp.pl. My input is a protein file and a BAM file produced by aligning with HISAT2. This is my command:

singularity exec /public/home/weichuanzheng/software/singularity/braker3/braker3.sif braker.pl --JAVA_PATH=/public/home/weichuanzheng/software/jdk/bin --threads=8 --species=s349 \
    --genome=/public/home/weichuanzheng/project/08.pangenome/03.mask/s349/s349.nextpolish.fasta.masked \
    --prot_seq=/public/home/weichuanzheng/project/11.Sorghum_genome/06.prot/structure_annotation1.fasta \
    --bam=/public/home/weichuanzheng/project/08.pangenome/03.mask/s349/bamfile/SRR23260553.sorted.bam,............

The error report follows. In 'braker.log':

# Wed Dec  6 10:37:36 2023: sorting RNA-Seq BAM files
# Wed Dec  6 12:42:05 2023: Running gmetp.pl
/usr/bin/perl /opt/ETP/bin/gmetp.pl --cfg /public/home/weichuanzheng/project/08.pangenome/03.mask/s349/annotation/braker/GeneMark-ETP/etp_config.yaml --workdir /public/home/weichuanzheng/project/08.pangenome/03.mask/s349/annotation/braker/GeneMark-ETP --bam /public/home/weichuanzheng/project/08.pangenome/03.mask/s349/annotation/braker/GeneMark-ETP/etp_data/ --cores 8 --softmask  1>/public/home/weichuanzheng/project/08.pangenome/03.mask/s349/annotation/braker/errors/GeneMark-ETP.stdout 2>/public/home/weichuanzheng/project/08.pangenome/03.mask/s349/annotation/braker/errors/GeneMark-ETP.stderr

At the end of 'GeneMark-ETP.stderr':

WARNING: 'ptg000031l_np1212' does not match any sequence in the fasta file. Maybe the two files do not belong together.
error
error, file/folder not found: transcripts_merged.fasta.gff

In 'GeneMark-ETP.stdout':

GeneMarkS: error on last system call, error code 256
Abort program!!!

I would appreciate any suggestions.
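The warning in 'GeneMark-ETP.stderr' says that a sequence name does not match any sequence in the FASTA file. One possible cause, not confirmed by the log, is that the names referenced by another input differ from the FASTA sequence IDs. Below is a minimal sketch of such a comparison. The function names and the example data are hypothetical; the referenced names would have to be collected separately, for example from the @SQ lines of a BAM header.

```python
def fasta_ids(fasta_lines):
    """Return the set of sequence IDs, i.e. the first whitespace-delimited
    token of every FASTA header line."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                ids.add(header.split()[0])
    return ids


def missing_names(referenced_names, fasta_lines):
    """Return, sorted, the referenced names that have no matching FASTA ID."""
    known = fasta_ids(fasta_lines)
    return sorted(set(referenced_names) - known)


# Hypothetical example data, not taken from the actual input files.
example_fasta = [">ptg000031l_np1212 len=1000", "ACGT", ">ptg000002l", "GGCC"]
example_refs = ["ptg000031l_np1212", "ptg000002l", "ptg000099l"]

print(missing_names(example_refs, example_fasta))
```

An empty result would mean every referenced name has a FASTA record, in which case the cause lies elsewhere.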

KatharinaHoff commented 11 months ago

What is inside your protein file? Is it an OrthoDB partition? You have to provide a protein file with a large degree of redundancy in protein space, i.e. the proteins should not come from one species only.

ChuanzhengWei commented 11 months ago

My protein file includes sequences from 60 varieties of sorghum, one variety of rice, and one variety of maize. Initially, I faced a problem that did not seem to stem from the protein file itself. After renaming and shortening the headers of the sequences in the genome file, I successfully generated the braker.gff file.
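The header renaming mentioned above can be sketched as follows. This is only an illustration, assuming that short sequential names are acceptable; the function name, the `seq` prefix, and the example headers are hypothetical, and this is not the exact procedure that was used. Any RNA-Seq alignments would need to refer to the renamed sequences, so the reads should be aligned against the renamed genome.

```python
def simplify_headers(fasta_lines, prefix="seq"):
    """Replace every FASTA header with a short sequential name.

    Returns the rewritten lines and a dict mapping each new name to the
    original header text, so the renaming can be traced back later.
    """
    new_lines = []
    mapping = {}
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            old_header = line[1:].strip()
            new_name = f"{prefix}{len(mapping) + 1}"
            mapping[new_name] = old_header
            new_lines.append(">" + new_name)
        else:
            # Sequence lines are passed through unchanged.
            new_lines.append(line)
    return new_lines, mapping


# Hypothetical input headers for illustration only.
example = [">ptg000031l_np1212 some description", "ACGT", ">ptg000002l_np1212", "GG"]
renamed, name_map = simplify_headers(example)
print(renamed)
print(name_map)
```

Keeping the returned mapping makes it possible to translate coordinates in the final annotation back to the original sequence names.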

However, I've encountered a new challenge: the generated GFF file does not contain UTRs (Untranslated Regions). I think this issue might be related to the limitations of the container environment, as I am running BRAKER through Singularity due to the lack of root privileges on my system.

Given these constraints, could you please advise on how I might obtain a GFF file that includes UTRs? Any guidance or suggestions you can offer would be greatly appreciated, as this is a critical component of my project.

Thank you in advance for your time and assistance. I look forward to your valuable input.

Best regards

ChuanzhengWei commented 11 months ago

This is the version I'm using:

singularity exec braker3.sif braker.pl --version
braker.pl version 3.0.3

KatharinaHoff commented 11 months ago

See https://github.com/Gaius-Augustus/BRAKER/issues/587

ChuanzhengWei commented 11 months ago

I did not find GeneMark-ETP/rnaseq/stringtie/transcripts_merged.gff. Does that mean I need to run stringtie again to obtain a new GFF file, and then merge that stringtie output with braker.gtf using stringtie2utr.py?

KatharinaHoff commented 11 months ago

Yes, you need to run stringtie. The script is not connected to BRAKER, yet.

ChuanzhengWei commented 11 months ago

Thank you. I successfully obtained a GTF file containing UTRs using stringtie2utr.py, but I've encountered a new issue: several 5' UTR or 3' UTR entries are generated for the same gene, like this:

    Chr01   stringtie2utr   five_prime_UTR  36899   36899   1000    -       .       transcript_id "g4.t2"; gene_id "g4";
    Chr01   stringtie2utr   five_prime_UTR  37358   37440   1000    -       .       transcript_id "g4.t2"; gene_id "g4";
    Chr01   stringtie2utr   five_prime_UTR  41705   41825   1000    -       .       transcript_id "g4.t2"; gene_id "g4";
    Chr01   stringtie2utr   five_prime_UTR  42029   42456   1000    -       .       transcript_id "g4.t2"; gene_id "g4"

I want to know if this situation is normal.

KatharinaHoff commented 11 months ago

This is not necessarily wrong. UTRs can be spliced.
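To see how many UTR segments each transcript carries, the UTR lines of a GTF file can be grouped by transcript. The following is a minimal sketch, assuming tab-separated GTF lines with a `transcript_id "..."` attribute and the feature names `five_prime_UTR` and `three_prime_UTR` as in the output quoted in this thread; the function name is hypothetical.

```python
import re
from collections import defaultdict

TRANSCRIPT_RE = re.compile(r'transcript_id "([^"]+)"')
UTR_FEATURES = ("five_prime_UTR", "three_prime_UTR")


def utr_segments(gtf_lines):
    """Group UTR intervals by (transcript_id, feature type).

    Returns a dict whose values are lists of (start, end) tuples sorted by
    start coordinate. Lines that are not UTR features are ignored.
    """
    segments = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] not in UTR_FEATURES:
            continue
        match = TRANSCRIPT_RE.search(fields[8])
        if match is None:
            continue
        key = (match.group(1), fields[2])
        segments[key].append((int(fields[3]), int(fields[4])))
    return {key: sorted(value) for key, value in segments.items()}


# The four lines quoted in this thread, written with tab separators.
ATTRS = 'transcript_id "g4.t2"; gene_id "g4";'
example_gtf = [
    f"Chr01\tstringtie2utr\tfive_prime_UTR\t36899\t36899\t1000\t-\t.\t{ATTRS}",
    f"Chr01\tstringtie2utr\tfive_prime_UTR\t37358\t37440\t1000\t-\t.\t{ATTRS}",
    f"Chr01\tstringtie2utr\tfive_prime_UTR\t41705\t41825\t1000\t-\t.\t{ATTRS}",
    f"Chr01\tstringtie2utr\tfive_prime_UTR\t42029\t42456\t1000\t-\t.\t{ATTRS}",
]

grouped = utr_segments(example_gtf)
for (transcript, feature), intervals in grouped.items():
    print(transcript, feature, len(intervals), intervals)
```

Several non-overlapping intervals for one transcript, as here, are what a 5' UTR spanning multiple exons would look like.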
