PASApipeline / PASApipeline

PASA software

The BUSCO result from the PASA combination is undesired #189

Closed Huangyizhong closed 3 years ago

Huangyizhong commented 3 years ago

Hi there,

I have asked a lot of questions about PASA these days, and I am sorry for that. I used BUSCO to evaluate the cds.fa file from the PB data, and the BUSCO score was 91%. I want to use PASA to combine the RNA data, but something is going wrong: the BUSCO result was only 8.5%, and I was so sad. The RNA processing steps are listed below.

1. PB data from the Isoseq pipeline
2. Genome-guided Illumina data (HISAT2 + StringTie + TACO)
3. De novo assembly with Trinity:

    /home/softwares/trinityrnaseq-v2.12.0/Trinity --seqType fq --max_memory 900G \
      --left ${sample_dir}/sample1_P1.fastq.gz,${sample_dir}/sample2_P1.fastq.gz \
      --right {sample_dir}/sample1_P2.fastq.gz,${sample_dir}/sample2_P1.fastq.gz \
      --CPU 80 \
      --no_parallel_norm_stats

4. PASA combination:

    cat Trinity.fasta sample.collapsed.rep.fa > transcripts.fasta
    /usr/local/src/PASApipeline/misc_utilities/accession_extractor.pl < Trinity.fasta > tdn.accs
    for i in `seq 1 18`; do
      /home/softwares/PASApipeline.v2.4.1/Launch_PASA_pipeline.pl \
        -c sqlite.confs/alignAssembly.config \
        -C -R -g sample_softmasked_chr${i}.fa \
        -t transcripts.fasta \
        --TDN tdn.accs \
        --trans_gtf chr${i}_assembly.gtf \
        --ALIGNERS blat \
        --CPU 10
    done

5. Combining sample_mydb_pasa.sqlite.pasa_assemblies.gff3 and sample_mydb_pasa.sqlite.assemblies.fasta from all the chromosomes, then running TransDecoder to obtain pasa_merge_assemblies.fasta.transdecoder.genome.gff3 and pasa_merge_assemblies.fasta.transdecoder.pep.fa

Then I used BUSCO to evaluate pasa_merge_assemblies.fasta.transdecoder.pep.fa. The BUSCO score is 8.5%.

Any suggestions for dealing with this? Sorry to disturb you so many times!

Sincerely,
Yizhong Huang

brianjohnhaas commented 3 years ago

Hi,

I think there might be a couple problems here. First, when you run PASA, run it once using all the data instead of running each chromosome separately.
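
As a rough sketch of what a single whole-genome run could look like, the helper below concatenates the per-chromosome FASTA files and prints one PASA command built from the flags already used in the question. The function names, the `RUN` variable, and the combined file name `sample_softmasked_genome.fa` are hypothetical. `--trans_gtf` is left out because it would need a genome-wide GTF rather than the per-chromosome files.

```shell
#!/usr/bin/env bash
# Sketch: one PASA run over the whole genome instead of 18 per-chromosome runs.
# File names are assumptions based on the names used in the question.

# Concatenate sample_softmasked_chr1.fa .. chrN.fa into a single genome FASTA.
combine_chromosomes() {
    local n_chr="$1" out="$2" i
    : > "$out"
    for i in $(seq 1 "$n_chr"); do
        cat "sample_softmasked_chr${i}.fa" >> "$out" || return 1
    done
}

# Print the single PASA command; set RUN=1 to execute it instead of printing.
pasa_whole_genome() {
    local genome="$1" transcripts="$2"
    local cmd=(/home/softwares/PASApipeline.v2.4.1/Launch_PASA_pipeline.pl
        -c sqlite.confs/alignAssembly.config
        -C -R
        -g "$genome"
        -t "$transcripts"
        --TDN tdn.accs
        --ALIGNERS blat
        --CPU 10)
    if [ "${RUN:-0}" = "1" ]; then
        "${cmd[@]}"
    else
        echo "${cmd[*]}"
    fi
}
```

For example: `combine_chromosomes 18 sample_softmasked_genome.fa && pasa_whole_genome sample_softmasked_genome.fa transcripts.fasta`.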

If you do want to run each chromosome separately for some reason (even though I don't recommend it), you'd need to create separate pasa sqlite databases for each - and so each would need to have a separate alignAssembly.config file with a different database name.
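
If separate per-chromosome runs are kept, the configs could be generated along these lines. This is a minimal sketch that assumes the template config contains a line starting with `DATABASE=`; the function name, output directory, and database file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write one alignAssembly config per chromosome, each pointing at its
# own SQLite database so that runs cannot overwrite each other's results.
# Assumes the template has a line starting with "DATABASE=".

make_chr_configs() {
    local template="$1" n_chr="$2" outdir="$3" db_dir="$4" i
    mkdir -p "$outdir" "$db_dir" || return 1
    for i in $(seq 1 "$n_chr"); do
        # Rewrite only the DATABASE= line; every other setting is copied as-is.
        sed "s|^DATABASE=.*|DATABASE=${db_dir}/sample_chr${i}_pasa.sqlite|" \
            "$template" > "${outdir}/alignAssembly.chr${i}.config" || return 1
    done
}
```

For example, `make_chr_configs sqlite.confs/alignAssembly.config 18 sqlite.confs/per_chr "$PWD/pasa_dbs"`, and then pass `-c sqlite.confs/per_chr/alignAssembly.chr${i}.config` inside the loop.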

If it turns out that the above wasn't actually a problem for you, then you might run BUSCO using the pasa assemblies transcriptome fasta file as input and not the .pep file. You might try doing this first before rerunning pasa just to see if it already accounts for the unexpectedly low busco results.
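
A sketch of that check, assuming a BUSCO v5-style command line (`-i`, `-l`, `-m`, `-o`, `-c`). The function name, the `RUN` variable, and the output name `busco_pasa_transcripts` are hypothetical, and the lineage dataset has to be supplied because the species is not stated in the thread.

```shell
#!/usr/bin/env bash
# Sketch: build a BUSCO command that scores the PASA assemblies as transcripts
# (transcriptome mode) rather than scoring the TransDecoder .pep file.

busco_transcriptome_cmd() {
    local fasta="${1:-}" lineage="${2:-}" cpus="${3:-10}"
    if [ -z "$fasta" ] || [ -z "$lineage" ]; then
        echo "usage: busco_transcriptome_cmd <fasta> <lineage> [cpus]" >&2
        return 1
    fi
    local cmd=(busco -i "$fasta" -l "$lineage" -m transcriptome
               -o busco_pasa_transcripts -c "$cpus")
    # Print by default; set RUN=1 to actually launch BUSCO.
    if [ "${RUN:-0}" = "1" ]; then
        "${cmd[@]}"
    else
        echo "${cmd[*]}"
    fi
}
```

For example, `busco_transcriptome_cmd pasa_merge_assemblies.fasta <lineage_dataset>` prints the command for the merged assemblies file named in the question.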

hope this helps

On Mon, May 17, 2021 at 11:09 AM Yizhong Huang wrote:


--

Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas http://broad.mit.edu/~bhaas