Closed ChuanzhengWei closed 11 months ago
What is inside your protein file? Is it an OrthoDB partition? You have to provide a protein file with a large degree of redundancy in protein space, i.e., the proteins should not come from only one species.
On Thu, Dec 7, 2023 at 4:03 AM ChuanzhengWei @.***> wrote:
Dear BRAKER team, I got an error when using BRAKER3, run from a Singularity container, for gene prediction. I am not sure whether there is a problem with gmetp.pl during the run. My input consists of a protein file and a BAM file aligned with HISAT2. This is my input command:
singularity exec /public/home/weichuanzheng/software/singularity/braker3/braker3.sif braker.pl --JAVA_PATH=/public/home/weichuanzheng/software/jdk/bin --threads=8 --species=s349 \
    --genome=/public/home/weichuanzheng/project/08.pangenome/03.mask/s349/s349.nextpolish.fasta.masked \
    --prot_seq=/public/home/weichuanzheng/project/11.Sorghum_genome/06.prot/structure_annotation1.fasta \
    --bam==/public/home/weichuanzheng/project/08.pangenome/03.mask/s349/bamfile/SRR23260553.sorted.bam,............
The following is the specific content of the error report: In 'braker.log':
Wed Dec 6 10:37:36 2023: sorting RNA-Seq BAM files
Wed Dec 6 12:42:05 2023: Running gmetp.pl
/usr/bin/perl /opt/ETP/bin/gmetp.pl --cfg /public/home/weichuanzheng/project/08.pangenome/03.mask/s349/annotation/braker/GeneMark-ETP/etp_config.yaml --workdir /public/home/weichuanzheng/project/08.pangenome/03.mask/s349/annotation/braker/GeneMark-ETP --bam /public/home/weichuanzheng/project/08.pangenome/03.mask/s349/annotation/braker/GeneMark-ETP/etp_data/ --cores 8 --softmask 1>/public/home/weichuanzheng/project/08.pangenome/03.mask/s349/annotation/braker/errors/GeneMark-ETP.stdout 2>/public/home/weichuanzheng/project/08.pangenome/03.mask/s349/annotation/braker/errors/GeneMark-ETP.stderr
At the end of 'GeneMark-ETP.stderr':
WARNING: 'ptg000031l_np1212' does not match any sequence in the fasta file. Maybe the two files do not belong together.
error error, file/folder not found: transcripts_merged.fasta.gff
In 'GeneMark-ETP.stdout':
GeneMarkS: error on last system call, error code 256
Abort program!!!
I would appreciate any suggestions.
My protein file includes sequences from 60 varieties of sorghum, one variety of rice, and one variety of maize. Initially, I faced a problem that did not seem to stem from the protein file itself. After renaming and shortening the headers of the sequences in the genome file, I successfully generated the braker.gff file.
However, I've encountered a new challenge: the generated GFF file does not contain UTRs (Untranslated Regions). I think this issue might be related to the limitations of the container environment, as I am running BRAKER through Singularity due to the lack of root privileges on my system.
Given these constraints, could you please advise on how I might obtain a GFF file that includes UTRs? Any guidance or suggestions you can offer would be greatly appreciated, as this is a critical component of my project.
Thank you in advance for your time and assistance. I look forward to your valuable input.
Best regards
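For readers who hit the same header mismatch, the renaming step mentioned above could look roughly like the sketch below. The exact renaming used in this thread is not stated, so the naming scheme (a hypothetical `contig` prefix plus a running number) and the function name `shorten_fasta_headers` are assumptions made for illustration only. Any BAM file used afterwards has to refer to the same sequence names as the renamed genome.

```python
def shorten_fasta_headers(lines, prefix="contig"):
    """Replace every FASTA header with a short ID.

    Returns (new_lines, mapping), where mapping maps the first
    whitespace-delimited token of each old header to its new ID.
    The 'contig<N>' scheme is a hypothetical choice, not the one
    used in the thread.
    """
    new_lines = []
    mapping = {}
    counter = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            counter += 1
            header = line[1:].strip()
            old_id = header.split()[0] if header else ""
            new_id = f"{prefix}{counter}"
            mapping[old_id] = new_id
            new_lines.append(">" + new_id)
        else:
            # Sequence lines are passed through unchanged.
            new_lines.append(line)
    return new_lines, mapping


if __name__ == "__main__":
    example = [
        ">ptg000031l_np1212 some description",
        "ACGTACGT",
        ">ptg000032l_np1212",
        "GGCCGGCC",
    ]
    renamed, table = shorten_fasta_headers(example)
    print("\n".join(renamed))
    for old, new in table.items():
        print(old, "->", new)
```

Keeping the old-to-new table makes it possible to translate coordinates in the final annotation back to the original sequence names.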
This is the version I'm using:
singularity exec braker3.sif braker.pl --version
braker.pl version 3.0.3
I did not find GeneMark-ETP/rnaseq/stringtie/transcripts_merged.gff. Does that mean I need to run stringtie again to obtain a new GFF file, and then merge the stringtie.gff and braker.gtf files with stringtie2utr.py?
See #587
Yes, you need to run stringtie. The script is not connected to BRAKER, yet.
Thank you. I successfully obtained a GTF file containing UTRs using stringtie2utr.py, but I've encountered a new issue: multiple 5' UTR or 3' UTR records are generated for the same transcript, like this:
Chr01 stringtie2utr five_prime_UTR 36899 36899 1000 - . transcript_id "g4.t2"; gene_id "g4";
Chr01 stringtie2utr five_prime_UTR 37358 37440 1000 - . transcript_id "g4.t2"; gene_id "g4";
Chr01 stringtie2utr five_prime_UTR 41705 41825 1000 - . transcript_id "g4.t2"; gene_id "g4";
Chr01 stringtie2utr five_prime_UTR 42029 42456 1000 - . transcript_id "g4.t2"; gene_id "g4"
I want to know if this situation is normal.
This is not necessarily wrong. UTRs can be spliced.
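One way to sanity-check such records is to group the UTR rows per transcript and confirm that the segments do not overlap, which is what a spliced UTR would look like. The following is a minimal sketch, not part of BRAKER or stringtie2utr.py; the function names `utr_segments` and `has_overlap` are hypothetical, and the parser assumes the first eight GTF columns contain no internal whitespace.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def utr_segments(gtf_lines):
    """Group UTR rows by (transcript_id, feature) into sorted (start, end) lists."""
    groups = defaultdict(list)
    for line in gtf_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on any whitespace; the 9th field keeps the attribute string.
        fields = line.split(None, 8)
        if len(fields) < 9 or not fields[2].endswith("UTR"):
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        groups[(match.group(1), fields[2])].append((int(fields[3]), int(fields[4])))
    return {key: sorted(segs) for key, segs in groups.items()}


def has_overlap(segments):
    """Return True if any two sorted (start, end) segments overlap."""
    return any(cur[0] <= prev[1] for prev, cur in zip(segments, segments[1:]))


if __name__ == "__main__":
    rows = [
        'Chr01 stringtie2utr five_prime_UTR 36899 36899 1000 - . transcript_id "g4.t2"; gene_id "g4";',
        'Chr01 stringtie2utr five_prime_UTR 37358 37440 1000 - . transcript_id "g4.t2"; gene_id "g4";',
        'Chr01 stringtie2utr five_prime_UTR 41705 41825 1000 - . transcript_id "g4.t2"; gene_id "g4";',
        'Chr01 stringtie2utr five_prime_UTR 42029 42456 1000 - . transcript_id "g4.t2"; gene_id "g4"',
    ]
    for (tid, feature), segs in utr_segments(rows).items():
        print(tid, feature, segs, "overlap:", has_overlap(segs))
```

Non-overlapping segments are consistent with a multi-exon UTR, but they do not by themselves prove that the UTR model is correct.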