GoekeLab / bambu

Reference-guided transcript discovery and quantification for long read RNA-Seq data
GNU General Public License v3.0

question about transcriptome/genome #413

Open huawen-poppy opened 5 months ago

huawen-poppy commented 5 months ago

Hi. Thank you for your nice tool!

I have a poorly annotated transcriptome file for my species, Aiptasia. The file lacks isoform-level annotation; all the isoforms have been removed. Is it possible to use bambu to find all the isoforms and quantify their expression in my DRS data? Or do I need to use the genome file for a reference-based assembly, in which case the identified genes could differ from the genes in my transcriptome?

Thank you very much!

andredsim commented 5 months ago

Hi,

You have a few options here if you would like to do transcript discovery and quantification:

  1. You can run bambu in de novo mode, where you provide only the BAM file and the genome FASTA file. This uses a pretrained model to perform de novo transcript discovery. https://github.com/GoekeLab/bambu?tab=readme-ov-file#De-novo-transcript-discovery
  2. The above uses a model trained on human data. If you have similar data for another organism with better-quality annotations, you could train a new model and perform de novo transcript discovery with it. https://github.com/GoekeLab/bambu?tab=readme-ov-file#Training-a-model-on-another-speciesdataset-and-applying-it
  3. You can first run transcript prediction software to generate annotations to provide to bambu. The more correct annotations it predicts, the better bambu will perform at predicting the remaining annotations. The existence of genes not found in the transcriptome won't significantly hinder transcript discovery. Once bambu has run, you can filter out the predictions from the transcript prediction software that did not receive full-length read support.

I do want to add a caveat to all of the above, however. Bambu was designed to work best on organisms with medium- to high-quality annotations, as it uses the annotations to train the transcript discovery model, and we have not yet thoroughly evaluated the capabilities of the de novo mode. You may need to play around with the NDR parameter (between 0 and 1) to get a sensitivity/precision that suits your needs, as this cannot be calibrated in de novo mode. As a first attempt, if using method 1 or 2, try an NDR of 0.5; if using method 3, leave it blank and let bambu predict an NDR first.
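As a rough illustration, methods 1 and 3 might look something like the sketch below. All file paths are hypothetical placeholders, and the "at least one full-length read" cutoff in the last step is an assumption rather than a recommended threshold; please check the argument names against the README for your installed bambu version.

```r
library(bambu)
library(SummarizedExperiment)

# Hypothetical input paths; replace with your own files.
bam.file <- "aiptasia_drs.bam"
genome.file <- "aiptasia_genome.fa"

# Method 1: de novo mode. No annotations are provided, so the pretrained
# model is used and the NDR has to be chosen manually (0.5 as a first try).
# With quant = FALSE only the discovered annotations are returned.
denovo.annotations <- bambu(
    reads = bam.file,
    annotations = NULL,
    genome = genome.file,
    NDR = 0.5,
    quant = FALSE
)

# Method 3: provide annotations predicted by another tool (hypothetical GTF)
# and leave NDR unset so that bambu predicts one itself.
predicted <- prepareAnnotations("predicted_transcripts.gtf")
se <- bambu(
    reads = bam.file,
    annotations = predicted,
    genome = genome.file
)

# Keep only transcripts with full-length read support in at least one sample.
# The "> 0" cutoff is an assumption for illustration.
full.length <- assays(se)$fullLengthCounts
se.supported <- se[rowSums(full.length) > 0, ]
```

Method 2 follows the same pattern as method 1, with a model trained on the better-annotated organism passed in as described in the README section linked above.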

Let me know how this goes as I would be very interested in this use case and how we can improve it for future versions of bambu.

Kind Regards, Andre Sim

huawen-poppy commented 5 months ago

Hi Andre,

Thank you for your prompt and detailed response! I appreciate the options you provided for transcript discovery and quantification using Bambu. I will explore these options and play around with the NDR parameter, taking into consideration the specifics of my dataset. I'll keep you updated on my progress and any observations I make during the process.

Cheers, Huawen