CompSynBioLab-KoreaUniv / FunGAP

FunGAP: fungal Genome Annotation Pipeline

Missing gene in long-read assembly #66

Closed by fungal-spore 3 years ago

fungal-spore commented 3 years ago

Hello, I've found something that concerns me. We sequenced an isolate with both PacBio and Illumina (the latter for polishing), but we also decided to assemble the Illumina reads on their own just to see what we got. After running FunGAP on both assemblies, I found that an important gene was predicted in the short-read assembly but NOT in the long-read assembly. A BLAST search against the long-read assembly shows that the gene is present on one of the major chromosome-sized contigs, but it wasn't called during gene prediction. Any idea why this might be the case?

mbnmbn00 commented 3 years ago

That is interesting.

It is possible that the region in question is not correct in your long-read assembly. For example, there could be a stop codon in the middle of the gene, so that the gene is either not called or is excluded during filtering. (Is the sequence match 100% identical?)
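A minimal sketch of how one might test this idea by hand, assuming you have already extracted the spliced coding sequence (introns removed, in frame, forward strand) of the gene from each assembly. The function names and the toy sequences are hypothetical, and this is not part of FunGAP. It lists in-frame stop codons that occur before the final codon and the positions where two equal-length sequences differ.

```python
# Illustrative only: helper names and toy sequences are assumptions, not FunGAP code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stops(cds):
    """Return 0-based nucleotide offsets of in-frame stop codons before the final codon."""
    seq = cds.upper()
    # Start of the last complete codon; everything before it should be stop-free.
    last_codon_start = len(seq) - len(seq) % 3 - 3
    return [i for i in range(0, last_codon_start, 3) if seq[i:i + 3] in STOP_CODONS]


def mismatches(a, b):
    """Return 0-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences differ in length; align them first")
    return [i for i, (x, y) in enumerate(zip(a.upper(), b.upper())) if x != y]


# Toy example: a single C->T change turns CAA into a premature TAA stop.
short_read_cds = "ATGGCTCAAGGTTAA"
long_read_cds = "ATGGCTTAAGGTTAA"

print(internal_stops(short_read_cds))             # []
print(internal_stops(long_read_cds))              # [6]
print(mismatches(short_read_cds, long_read_cds))  # [6]
```

If the two sequences differ in length (for example because of an indel, which is a common long-read error), they need to be aligned first; an indel can also shift the reading frame and produce a stop codon downstream.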

I can look into what exactly happened. You can send me the fungap_out directories from the two runs. I need to know which gene predictor called that gene in the Illumina assembly.

Also, you may want to look at the RNA-seq support for those regions. You can load the assemblies and BAM files in IGV.

choilab commented 3 years ago

I agree with BN’s comments.

  1. Polish the long-read assembly with MiSeq reads.
  2. Check the quality of the long-read assembly at the nucleotide level with BUSCO.
