Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

No results from the BRAKER with lots of RNA-seq data #379

Closed Huangyizhong closed 3 years ago

Huangyizhong commented 3 years ago

Hi there,

BRAKER is a good tool for genome annotation, and I have used it to annotate a new genome, but I have run into some problems. Could you help me? My command is as follows:

    ${braker}/braker.pl --genome=${genome}/DRC_softmasked.fa \
    --prot_seq=${protein}/six_protein.fasta \
    --hints=${RNA}/D17-abdominalfat.hints.gff,${RNA}/D17-backfat.hints.gff,${RNA}/D17-blood.hints.gff,${RNA}/D17-brain.hints.gff,${RNA}/D17-breast.hints.gff,${RNA}/D17-cecum.hints.gff,${RNA}/D17-duodenum.hints.gff,${RNA}/D17-endometrium.hints.gff,${RNA}/D17-heart.hints.gff,${RNA}/D17-hypophysis.hints.gff,${RNA}/D17-hypothalamus.hints.gff,${RNA}/D17-kidney.hints.gff,${RNA}/D17-liver.hints.gff,${RNA}/D17-longissimus.hints.gff,${RNA}/D17-lung.hints.gff,${RNA}/D17-ovary.hints.gff,${RNA}/D17-skin.hints.gff,${RNA}/D17-spinalcord.hints.gff,${RNA}/D17-spleen.hints.gff,${RNA}/D17-stomach.hints.gff,${RNA}/D17-thyroid.hints.gff,${RNA}/D192-blood.hints.gff,${RNA}/D192-brain.hints.gff,${RNA}/D192-cecum.hints.gff,${RNA}/D192-duodenum.hints.gff,${RNA}/D192-heart.hints.gff,${RNA}/D192-hypophysis.hints.gff,${RNA}/D192-hypothalamus.hints.gff,${RNA}/D192-kidney.hints.gff,${RNA}/D192-liver.hints.gff,${RNA}/D192-longissimus.hints.gff,${RNA}/D192-lung.hints.gff,${RNA}/D192-pancreas.hints.gff,${RNA}/D192-redtestis.hints.gff,${RNA}/D192-skin.hints.gff,${RNA}/D192-spinalcord.hints.gff,${RNA}/D192-spleen.hints.gff,${RNA}/D192-stomach.hints.gff \
    --etpmode \
    --cores 8 \
    --softmasking \
    --workingdir=$wd \
    --PROTHINT_PATH=/home/goldenpigs217/softwares/ProtHint-2.6.0/bin

I have two questions:

1. I ran BRAKER on a single chromosome and it worked well. But when I ran it on the whole genome, there were no results and no error message, and a very large number of AUGUSTUS files appeared. Is there a problem with my command? The genome is 2.5 Gb, and the input is 38 Illumina RNA-seq data sets and 166M of protein data.

2. I would also like to add PacBio (PB) data to the model. How can I do that? I have run IsoSeq3 on the raw data and obtained sample.collapsed.gff.

Any suggestions? Thanks so much!

Sincerely,
Yizhong Huang
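The `--hints` argument in the command above lists 38 files by hand, which is easy to get wrong. As a minimal sketch, assuming all hint files sit in one directory and end in `.hints.gff` (as they do in the command), the comma-separated list could be built with a small helper. The function name `join_hints` is hypothetical and not part of BRAKER.

```shell
# Hypothetical helper: join every *.hints.gff file in a directory into one
# comma-separated string, the form in which --hints is given above.
join_hints() {
    local dir="$1"
    local list=""
    local f
    for f in "$dir"/*.hints.gff; do
        # If nothing matches, the pattern stays unexpanded; skip it.
        [ -e "$f" ] || continue
        # Prepend a comma only when the list already has an entry.
        list="${list:+$list,}$f"
    done
    printf '%s\n' "$list"
}

# Usage sketch (RNA is the directory variable from the original command):
#   hints_arg=$(join_hints "$RNA")
#   ${braker}/braker.pl ... --hints="$hints_arg" ...
```

The files are joined in the shell's glob order, so the result is reproducible between runs on the same directory.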

KatharinaHoff commented 3 years ago

How many sequences are in your genome fasta file?
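One quick way to answer this is to count the header lines in the FASTA file, since every record starts with a line beginning with `>`. A minimal sketch follows; the function name `count_fasta_records` is only illustrative.

```shell
# Count the records in a FASTA file by counting lines that start with ">".
count_fasta_records() {
    grep -c '^>' "$1"
}

# Usage sketch (path taken from the command in the question):
#   count_fasta_records "${genome}/DRC_softmasked.fa"
```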

To the second question: that's a delicate problem. You can in principle add hints from PB data. However, running blat2hints directly on a psl mapping of assembled PB data sometimes does not work very well, so I recommend caution. You could, e.g., run AUGUSTUS with and without PB hints (the latter is already done by BRAKER) and check in a browser which gene set looks better.

Your genome is big, and your library names make me guess that you might be dealing with a mammal. If that's the case, I also recommend a run with human parameters instead of re-training. Again: compare the results and check what looks better. The human parameters of AUGUSTUS are of very high quality. BRAKER cannot fine-tune some of the parameters that have been adjusted to mammals in the human parameter set of AUGUSTUS.
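Both suggestions above come down to comparing two gene sets. Before opening a genome browser, a crude first check is to count the predicted genes in each output file. The sketch below assumes tab-separated GFF/GTF output in which each gene has a line with `gene` in the third column; the function name and the file names in the usage comment are hypothetical. It is not a substitute for the visual inspection recommended above.

```shell
# Count lines whose third (feature) column is exactly "gene"
# in a tab-separated GFF/GTF file.
count_genes() {
    awk -F'\t' '$3 == "gene" { n++ } END { print n + 0 }' "$1"
}

# Usage sketch (file names are placeholders for the two runs being compared):
#   echo "without PB hints: $(count_genes run_without_pb.gtf)"
#   echo "with PB hints:    $(count_genes run_with_pb.gtf)"
```

A large difference in gene counts between the two runs is only a signal of where to look; which set is better still has to be judged from the gene models themselves.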


Huangyizhong commented 3 years ago

Thanks so much! There are 21 chromosomes in the genome fasta file. Yes, I think the lack of results may be due to the large amount of data. Is there another way to do it?

KatharinaHoff commented 3 years ago

You can edit the following line in braker.pl: [screenshot not preserved]

Increase the chunksize substantially. With a larger chunksize, the genome is split into fewer jobs, which produces fewer files.
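To see why a larger chunksize means fewer jobs, here is a simplified model, not a reading of what braker.pl actually does: a sequence is covered by windows of a fixed size that overlap by a fixed amount, and each window becomes one job. The function name `chunk_count` and all numbers in the usage comment are hypothetical.

```shell
# Number of overlapping windows needed to cover one sequence.
# Arguments: sequence length, chunk size, overlap (all in bp; chunk > overlap).
chunk_count() {
    local len="$1" chunk="$2" overlap="$3"
    # Each new window advances by the chunk size minus the overlap.
    local step=$((chunk - overlap))
    if [ "$len" -le "$chunk" ]; then
        echo 1
    else
        # One window for the first chunk, then ceil((len - chunk) / step) more.
        echo $(( 1 + (len - chunk + step - 1) / step ))
    fi
}

# Usage sketch with made-up values: a 120 Mb chromosome, 0.5 Mb overlap,
# first with 3 Mb chunks and then with 30 Mb chunks.
#   chunk_count 120000000 3000000 500000
#   chunk_count 120000000 30000000 500000
```

Under this model the job count for a chromosome falls roughly in proportion to the increase in chunk size, which is why a substantial increase is needed to reduce the number of files noticeably.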


Huangyizhong commented 3 years ago

OK, I will try it later. Thanks so much for your help!