GoekeLab / bambu

Reference-guided transcript discovery and quantification for long read RNA-Seq data
GNU General Public License v3.0

NDR and novelTranscript columns do not exist in mcols(se) #426

Closed: abcdtree closed this issue 1 month ago

abcdtree commented 2 months ago

Hi,

I am running bambu to discover novel transcripts. However, after running bambu, the line of code for filtering novel transcripts did not work: se.novel = se[mcols(se)$novelTranscript,]

I checked the structure of mcols(se), which is used for the filtering, as shown below:

mcols(se)
DataFrame with 229580 rows and 4 columns
                               TXNAME             GENEID      txid          eqClassById
                          <character>        <character> <integer>        <IntegerList>

There is no novelTranscript column. I tried adjusting the NDR parameter, but I found that mcols(se)$NDR also returns NULL.
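One way to confirm up front which metadata columns are missing, before attempting the filter, is sketched below. This is only an illustration: missing_mcols is a hypothetical helper (not part of bambu), and the data frame is a toy stand-in for mcols(se) built from the column names shown above. With a real object one would pass mcols(se) instead.

```r
# Hypothetical helper (not part of bambu): return the expected metadata
# columns that are absent from a DataFrame or data.frame.
missing_mcols <- function(md, expected = c("NDR", "novelTranscript")) {
  setdiff(expected, colnames(md))
}

# Toy stand-in for mcols(se), using the four column names reported above.
md <- data.frame(
  TXNAME = "tx1",
  GENEID = "gene1",
  txid = 1L,
  eqClassById = I(list(1L))
)

missing <- missing_mcols(md)
if (length(missing) > 0) {
  message("Missing metadata columns: ", paste(missing, collapse = ", "))
}
```

Checking column names first avoids the silent NULL that the $ operator returns for a column that does not exist.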

This is the bambu command I ran. Did I miss any parameter? se <- bambu(reads = bfl, annotations = bambuAnnotations, genome = ref.file, NDR = 1, quant = FALSE)

Thanks

Josh

andredsim commented 2 months ago

Hi Josh,

Have you perhaps saved the object as a gtf, reloaded it, and then used prepareAnnotations on it? Unfortunately, once the new annotations are saved as a gtf, some of the metadata is lost and can't be recovered. To keep this information, it is better to save the output se object as an rds file with saveRDS() and load it back in with readRDS().
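The RDS round trip suggested here could look like the following minimal sketch. It uses only base R; the data frame is a made-up stand-in for the bambu output object, and the temporary file path is an assumption for illustration. Note that base R reads .rds files with readRDS().

```r
# Made-up stand-in for the object returned by bambu().
se_toy <- data.frame(
  TXNAME = c("tx1", "tx2"),
  NDR = c(0.1, 0.9),
  novelTranscript = c(FALSE, TRUE)
)

# Save to an .rds file and read it back with readRDS().
rds_path <- tempfile(fileext = ".rds")
saveRDS(se_toy, rds_path)
se_restored <- readRDS(rds_path)

# Inspect which columns are present after the round trip.
print(colnames(se_restored))
```

Because saveRDS() serializes the R object itself rather than exporting it to a text format, the extra columns are kept along with everything else.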

If this isn't the issue, perhaps you can try running the test data (see the installation section of the documentation) to see whether you get the NDR and novelTranscript columns in the output. If you still do not get them with your data, let me know.

Finally, about the input to your annotations parameter, bambuAnnotations: is this just the reference gtf file loaded with prepareAnnotations, or is it the gtf file from bambu's output?

Kind Regards, Andre Sim

abcdtree commented 1 month ago

Thanks Andre for your patient reply.

bambuAnnotations is just the reference gtf file (human GRCh38 genome) that I loaded. It is not the extended gtf from bambu's output.

gtf.file <- "/home/*/gencode.v35.annotation.gtf"
bambuAnnotations <- prepareAnnotations(gtf.file)
se <- bambu(reads = bfl, annotations = bambuAnnotations, genome = ref.file, NDR = 1, quant = FALSE)

Am I misunderstanding the whole process?

Thanks again,

Josh

abcdtree commented 1 month ago

@andredsim Hi Andre, I ran the test dataset and it gives all the novel transcript information. I suspect that the gtf annotation I used may have a problem. I will find another one and test again. Thanks for your help.

Josh

andredsim commented 1 month ago

I hope a new gtf file works for you. If it still does not work, feel free to reopen this issue, as we want the prepareAnnotations() function to be robust to different types of gtf files.